On 2007-11-03, Paul McGuire <[EMAIL PROTECTED]> wrote: > On Nov 3, 12:33 am, "Just Another Victim of the Ambient Morality" ><[EMAIL PROTECTED]> wrote: >> It has recursion in it but that's not sufficient to call it a recursive >> descent parser any more than having a recursive implementation of the >> factorial function. The important part is what it recurses through... > ><snip> > >> In my opinion, the rule set I mentioned in my original post: >> >> grammar = OneOrMore(Word(alphas)) + Literal('end') >> >> ...should be translated (internally) to something like this: >> >> word = Word(alphas) >> grammar = Forward() >> grammar << ((word + grammar) | (word + Literal(end))) >> >> This allows a recursive search to find the string correct without any >> need for "backtracking," if I understand what you mean by that. Couldn't >> pyparsing have done something like this? >> > > Dear JAVotAM - > > This was a great post! (Although I'm not sure the comment in the > first paragraph was entirely fair - I know the difference between > recursive string parsing and recursive multiplication.) > > You really got me thinking more about how this recursion actually > behaves, especially with respect to elements such as OneOrMore. Your > original question is really quite common, and up until now, my stock > answer has been to use negative lookahead. The expression you posted > is the first time I've seen this solution, and it *does* work. > > I was all set to write a cocky response on why your expression > wouldn't work. I've seen it many times before, where people (usually > coming from EBNF experience) implement listOfXs = OneOrMore(x) as: > > listOfXs = Forward() > listOfXs << ( x + listOfXs | x )
> > Actually, what they usually write is: > > listOfXs << ( listOfXs + x ) > > but this sends pyparsing into a recursive tailspin. > > So I fired up SciTE and copy/pasted your code into the editor and ran > it, and it worked just fine - this was a shock! I studied this for a > few minutes, and then realized what was happening. First of all, I > misread what you posted. You posted this: > > grammar << ((word + grammar) | (word + Literal(end))) > > which works. I *thought* you posted this: > > grammar << ((word + grammar) | word) + Literal(end) > > which doesn't work. In fact this behaves the same as your original > post, except it iterates over the input string's words recursively, > vs. repetition ins a for-loop, as is done by OneOrMore. > > So going back to your working version, I had to see why it works. > Initially, the first term in the MatchFirst (the '|' operator creates > MatchFirst instances) is just the same, and by grammar referencing > itself, it just goes word by word through the input trying to find a > match. I'll try to visualize the steps: > > level "First Second Third end" > 1 word grammar > 2 word grammar > 3 word grammar > 4 word grammar <- fails! > 4 word end <- fails! > (therefore level 3 grammar fails!) > 3 word end <-- success!!! > > grammar has 2 options: to match a word followed by a grammar, or to > match a word followed by 'end'. At 4 levels deep into the Forward's > recursion, the first option fails, because after parsing "end" as the > initial word, there is no more text to try to match against grammar. > Level 4's Forward then also tries to match a word followed by 'end', > but this fails for the same reason. So at this point, the 4th level > Forward fails to match either of its options, so it throws its > exception back up to level 3, indicating that the first alternative, > word followed by grammar, failed. Level 3 then moves on to see if > word followed by the literal 'end' matches, and it does - success! > > Here's where I am stuck now. In the original grammar that you posted, > which you want to render into this behavior, grammar is defined as: > > grammar = OneOrMore(Word(alphas)) + Literal('end') Is there not an ambiguity in the grammar? In EBNF: goal --> WORD { WORD } END WORD is '[a-zA-Z]+' END is 'end' I think it is fine that PyParsing can't guess what the composer of that grammar meant. -- Neil Cerutti -- http://mail.python.org/mailman/listinfo/python-list