"the.theorist" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Hey, I'm trying my hand and pyparsing a log file (named l.log): > FIRSTLINE > > PROPERTY1 DATA1 > PROPERTY2 DATA2 > > PROPERTYS LIST > ID1 data1 > ID2 data2 > > ID1 data11 > ID2 data12 > > SECTION > > So I wrote up a small bit of code (named p.py): > from pyparsing import * > import sys > > toplevel = Forward() > > firstLine = Word('FIRSTLINE') > property = (Word('PROPERTY1') + Word(alphanums)) ^ (Word('PROPERTY2') > + Word(alphanums)) > > id = (Word('ID1') + Word(alphanums)) ^ (Word('ID2') + > Word(alphanums)) > plist = Word('PROPERTYS LIST') + ZeroOrMore( id ) > > toplevel << firstLine > toplevel << OneOrMore( property ) > toplevel << plist > > par = toplevel > > print toplevel.parseFile(sys.argv[1]) > > The problem is that I get the following error: <snip> > Is this a fundamental error, or is it just me? (I haven't yet tried > simpleparse) >
It's you.

Well, let's focus on the behavior and not the individual. There are two
major misconceptions that you have here:
1. Confusing "Word" for "Literal"
2. Confusing "<<" Forward assignment for some sort of C++ streaming
   operator.

What puzzles me is that in some places, you correctly use the Word class,
as in Word(alphanums), to indicate a "word" as a contiguous set of
characters found in the string alphanums. You also correctly use '+' to
build up the id and plist expressions, but then you use "<<" successively
in what looks like streaming into the toplevel variable. Each "<<"
replaces the Forward's contents rather than appending to them, so only
the last one would take effect.

When your grammar includes Word("FIRSTLINE"), you are actually saying you
want to match a "word" composed of one or more letters found in the
string "FIRSTLINE" - this would match not only FIRSTLINE, but also FIRST,
LINE, LIRST, FINE, LIST, FIST, FLINTSTRINE, well, you get the idea. Just
the way Word(alphanums) matches DATA1, DATA2, data1, data2, data11, and
data12. What you really want here is the class Literal, as in
Literal("FIRSTLINE").

As for toplevel, there is no reason here to use Forward() - reserve use
of this class for recursive structures, such as lists composed of lists,
etc. toplevel is simply the sequence of a firstLine, OneOrMore
properties, and a plist, which is just the plain old:

    toplevel = firstLine + OneOrMore(property) + plist

Lastly, if you'll peruse the documentation that comes with pyparsing,
you'll also find the Group class. This class is very helpful in imparting
some structure to the returned set of tokens.

Here is a before/after version of your program, that has some more
successful results.

-- Paul


data = """FIRSTLINE

PROPERTY1 DATA1
PROPERTY2 DATA2

PROPERTYS LIST
ID1 data1
ID2 data2

ID1 data11
ID2 data12

SECTION
"""

from pyparsing import *
import sys

#~ toplevel = Forward()

#~ firstLine = Word('FIRSTLINE')
firstLine = Literal('FIRSTLINE')

#~ property = (Word('PROPERTY1') + Word(alphanums)) ^ (Word('PROPERTY2') + Word(alphanums))
property = (Literal('PROPERTY1') + Word(alphanums)) ^ (Literal('PROPERTY2') + Word(alphanums))

#~ id = (Word('ID1') + Word(alphanums)) ^ (Word('ID2') + Word(alphanums))
id = (Literal('ID1') + Word(alphanums)) ^ (Literal('ID2') + Word(alphanums))

#~ plist = Word('PROPERTYS LIST') + ZeroOrMore( id )
plist = Literal('PROPERTYS LIST') + ZeroOrMore( id )

#~ toplevel << firstLine
#~ toplevel << OneOrMore( property )
#~ toplevel << plist
toplevel = firstLine + OneOrMore( property ) + plist

par = toplevel

# print(...) with a single argument works under both Python 2 and Python 3
print(par.parseString(data))

# add Groups, to give structure to results, rather than just returning a
# flat list of strings
plist = Literal('PROPERTYS LIST') + ZeroOrMore( Group(id) )
toplevel = firstLine + Group(OneOrMore(Group(property))) + Group(plist)

par = toplevel

print(par.parseString(data))

-- 
http://mail.python.org/mailman/listinfo/python-list
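
P.S. For contrast, here is a minimal sketch of the kind of recursive
grammar that Forward is meant for: lists composed of lists. The
parenthesized list syntax and the names nested, item, LPAR and RPAR are
made up for illustration and have nothing to do with the log file. The
point is only that a single "<<" gives the placeholder its body.

```python
from pyparsing import Forward, Group, Suppress, Word, ZeroOrMore, alphanums

# Hypothetical toy grammar (not part of the log file format):
# a list is "(" + zero or more items + ")",
# and an item is either a plain word or another list.
LPAR, RPAR = Suppress("("), Suppress(")")

nested = Forward()                 # placeholder; its body is supplied below
item = Word(alphanums) | nested    # refers to nested before it has a body

# One "<<" defines what the placeholder stands for. Group() keeps each
# parenthesized list as its own sub-list in the results.
nested << Group(LPAR + ZeroOrMore(item) + RPAR)

result = nested.parseString("(a (b c) (d (e)))").asList()
print(result)   # prints [['a', ['b', 'c'], ['d', ['e']]]]
```

Note that the right-hand side of "<<" is one complete expression. If it
were built with "|", it would need its own parentheses, since Python
binds "<<" more tightly than "|".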