"rh0dium" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Hi all, > > I have a file which I need to parse and I need to be able to break it > down by sections. I know it's possible but I can't seem to figure this > out. > > The sections are broken by <> with one or more keywords in the <>. > But how do I say that <SECTIONn> stops at the start of the next > <SECTIONm>? >
See the attached working example - the comments and definition of
dataLine show how this is done.  This is something of a trick in
pyparsing, but it is a basic characteristic of the pyparsing recursive
descent parser.

-- Paul


data="""<SYSLIB>
Sys Data
Sys-Data
asdkData
Data
<LOGLVS>
Data
Data
Data
Data
<SOME SECTION>
Data
Data
Data
Data
<NETLIST>
Data
Data
Data
Data
<NET>
"""

from pyparsing import *

# basic pyparsing version
secLabel = Suppress("<") + OneOrMore(Word(alphas)) + Suppress(">") + LineEnd().suppress()

# need to indicate which entries are *not* valid datalines - next secLabel, or end of string
dataLine = ~secLabel + ~StringEnd() + restOfLine + LineEnd().suppress()

# a data section is a section label, followed by zero or more data lines
section = Group(secLabel + ZeroOrMore(dataLine))

# a config data contains one or more sections
configData = OneOrMore(section)

# parse the input data and print the results
res = configData.parseString(data)
print res

# prints:
# [['SYSLIB', 'Sys Data', 'Sys-Data', 'asdkData', 'Data'], ['LOGLVS', 'Data', 'Data', 'Data', 'Data'], ['SOME', 'SECTION', 'Data', 'Data', 'Data', 'Data'], ['NETLIST', 'Data', 'Data', 'Data', 'Data'], ['NET']]


# enhanced version, constructing a ParseResults with dict-like access
# (reuses previous expression definitions)

# combine multiword keys into a single string
# - want <SOME SECTION> to return 'SOME SECTION', not
#   'SOME', 'SECTION'
def joinKeyWords(s,l,t):
    return " ".join(t)
secLabel.setParseAction(joinKeyWords)

section = Group(secLabel + ZeroOrMore(dataLine))
configData = Dict(OneOrMore(section))

# parse the input data, and access the results by section name
res = configData.parseString(data)
print res
print res["SYSLIB"]
print res["SOME SECTION"]
print res.keys()

# prints:
#[['SYSLIB', 'Sys Data', 'Sys-Data', 'asdkData', 'Data'], ['LOGLVS', 'Data', 'Data', 'Data', 'Data'], ['SOME SECTION', 'Data', 'Data', 'Data', 'Data'], ['NETLIST', 'Data', 'Data', 'Data', 'Data'], ['NET']]
#['Sys Data', 'Sys-Data', 'asdkData', 'Data']
#['Data', 'Data', 'Data', 'Data']
#['LOGLVS', 'NET', 'NETLIST', 'SYSLIB', 'SOME SECTION']
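
For comparison, the rule that dataLine encodes ("a data line is any line that is not the next section label") can also be sketched without pyparsing. The following is only an illustration using the standard re module, not the pyparsing approach itself; the names SECTION_LABEL and split_sections are made up for this sketch, and it assumes that labels sit on their own line and contain only alphabetic keywords, as in the sample data.

```python
import re

# Hypothetical pattern (not part of pyparsing): a line holding only
# <KEYWORD> or <KEYWORD KEYWORD ...>, with alphabetic keywords.
SECTION_LABEL = re.compile(r"^<\s*([A-Za-z]+(?:\s+[A-Za-z]+)*)\s*>\s*$")


def split_sections(text):
    """Return a dict mapping each section label to its list of data lines."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = SECTION_LABEL.match(line)
        if match:
            # A label line ends the previous section and opens a new one.
            # Multi-word labels are joined with single spaces.
            label = " ".join(match.group(1).split())
            current = sections.setdefault(label, [])
        elif current is not None and line.strip():
            # Any non-blank line that is not a label belongs to the open section.
            current.append(line.strip())
    return sections


sample = """<SYSLIB>
Sys Data
Sys-Data
<SOME SECTION>
Data
<NET>
"""

result = split_sections(sample)
print(result)
# {'SYSLIB': ['Sys Data', 'Sys-Data'], 'SOME SECTION': ['Data'], 'NET': []}
```

The check for a label comes first for the same reason that ~secLabel comes first in dataLine: a section only "stops" because the next label is refused as data.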