"rh0dium" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Hi all,
>
> I have a file which I need to parse and I need to be able to break it
> down by sections.  I know it's possible but I can't seem to figure this
> out.
>
>     The sections are broken by <> with one or more keywords in the <>.
> But how do I say that <SECTIONn> stops at the start of the next
> <SECTIONm>?
>

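See the working example below - the comments and the definition of dataLine
show how this is done.

The trick is negative lookahead: dataLine is defined as any line that is *not*
the start of another secLabel and *not* the end of the string.  This is needed
because pyparsing's recursive descent parser does no lookahead on its own, so
ZeroOrMore(dataLine) would otherwise consume the next section label as data.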
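
To make the "stop at the next label" rule concrete outside of pyparsing, here
is a minimal plain-Python sketch of the same idea: a line belongs to the
current section unless it looks like a section label.  The regular expression
and the helper name split_sections are illustrative assumptions, not part of
pyparsing or of the example in this message.

```python
import re

# Hypothetical helper for illustration only (not part of pyparsing).
# A label is "<", one or more alphabetic words, ">", alone on a line.
LABEL = re.compile(r"^<\s*([A-Za-z]+(?:\s+[A-Za-z]+)*)\s*>\s*$")

def split_sections(text):
    sections = []
    for line in text.splitlines():
        m = LABEL.match(line)
        if m:
            # a new label ends the previous section and starts the next one
            sections.append([" ".join(m.group(1).split())])
        elif sections:
            # anything that is not a label is data for the current section
            sections[-1].append(line.strip())
    return sections

sample = "<SYSLIB>\nSys Data\nSys-Data\n<SOME SECTION>\nData\n<NET>\n"
print(split_sections(sample))
```

The check for a label comes first on every line, which is the hand-written
counterpart of putting ~secLabel at the front of dataLine.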

-- Paul

data="""<SYSLIB>
Sys Data
Sys-Data
asdkData
Data
<LOGLVS>
Data
Data
Data
Data
<SOME SECTION>
Data
Data
Data
Data
<NETLIST>
Data
Data
Data
Data
<NET>
"""

from pyparsing import *

# basic pyparsing version
secLabel = (Suppress("<") + OneOrMore(Word(alphas)) + Suppress(">") +
            LineEnd().suppress())
# need to indicate which entries are *not* valid datalines - next secLabel,
# or end of string
dataLine = ~secLabel + ~StringEnd() + restOfLine + LineEnd().suppress()

# a data section is a section label, followed by zero or more data lines
section = Group(secLabel + ZeroOrMore(dataLine))

# a config data contains one or more sections
configData = OneOrMore(section)

# parse the input data and print the results
res = configData.parseString(data)
print(res)

# prints:
# [['SYSLIB', 'Sys Data', 'Sys-Data', 'asdkData', 'Data'], ['LOGLVS',
# 'Data', 'Data', 'Data', 'Data'], ['SOME', 'SECTION', 'Data', 'Data', 'Data',
# 'Data'], ['NETLIST', 'Data', 'Data', 'Data', 'Data'], ['NET']]


# enhanced version, constructing a ParseResults with dict-like access
# (reuses previous expression definitions)

# combine multiword keys into a single string
# - want <SOME SECTION> to return 'SOME SECTION', not
# 'SOME', 'SECTION'
def joinKeyWords(s,l,t):
    return " ".join(t)
secLabel.setParseAction(joinKeyWords)
section = Group(secLabel + ZeroOrMore(dataLine))
configData = Dict(OneOrMore(section))

# parse the input data, and access the results by section name
res = configData.parseString(data)
print(res)
print(res["SYSLIB"])
print(res["SOME SECTION"])
print(list(res.keys()))


# prints:
#[['SYSLIB', 'Sys Data', 'Sys-Data', 'asdkData', 'Data'], ['LOGLVS', 'Data',
#'Data', 'Data', 'Data'], ['SOME SECTION', 'Data', 'Data', 'Data', 'Data'],
#['NETLIST', 'Data', 'Data', 'Data', 'Data'], ['NET']]
#['Sys Data', 'Sys-Data', 'asdkData', 'Data']
#['Data', 'Data', 'Data', 'Data']
#['LOGLVS', 'NET', 'NETLIST', 'SYSLIB', 'SOME SECTION']
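
The dict-like access can be pictured, roughly, as keying each group by its
first token.  The sketch below does that by hand on a few of the printed
groups; it is only an analogy in plain Python, and pyparsing's Dict may treat
edge cases (such as the empty NET section) differently.

```python
# Plain-Python analogy only: key each group by its first token.
groups = [
    ['SYSLIB', 'Sys Data', 'Sys-Data', 'asdkData', 'Data'],
    ['SOME SECTION', 'Data', 'Data', 'Data', 'Data'],
    ['NET'],
]
by_name = {g[0]: g[1:] for g in groups}

print(by_name["SYSLIB"])
print(by_name["SOME SECTION"])
print(by_name["NET"])
```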



-- 
http://mail.python.org/mailman/listinfo/python-list
