On Nov 3, 12:33 am, "Just Another Victim of the Ambient Morality" <[EMAIL PROTECTED]> wrote: > It has recursion in it but that's not sufficient to call it a recursive > descent parser any more than having a recursive implementation of the > factorial function. The important part is what it recurses through...
<snip> > In my opinion, the rule set I mentioned in my original post: > > grammar = OneOrMore(Word(alphas)) + Literal('end') > > ...should be translated (internally) to something like this: > > word = Word(alphas) > grammar = Forward() > grammar << ((word + grammar) | (word + Literal(end))) > > This allows a recursive search to find the string correct without any > need for "backtracking," if I understand what you mean by that. Couldn't > pyparsing have done something like this? > Dear JAVotAM - This was a great post! (Although I'm not sure the comment in the first paragraph was entirely fair - I know the difference between recursive string parsing and recursive multiplication.) You really got me thinking more about how this recursion actually behaves, especially with respect to elements such as OneOrMore. Your original question is really quite common, and up until now, my stock answer has been to use negative lookahead. The expression you posted is the first time I've seen this solution, and it *does* work. I was all set to write a cocky response on why your expression wouldn't work. I've seen it many times before, where people (usually coming from EBNF experience) implement listOfXs = OneOrMore(x) as: listOfXs = Forward() listOfXs << ( x + listOfXs | x ) Actually, what they usually write is: listOfXs << ( listOfXs + x ) but this sends pyparsing into a recursive tailspin. So I fired up SciTE and copy/pasted your code into the editor and ran it, and it worked just fine - this was a shock! I studied this for a few minutes, and then realized what was happening. First of all, I misread what you posted. You posted this: grammar << ((word + grammar) | (word + Literal(end))) which works. I *thought* you posted this: grammar << ((word + grammar) | word) + Literal(end) which doesn't work. In fact this behaves the same as your original post, except it iterates over the input string's words recursively, vs. repetition ins a for-loop, as is done by OneOrMore. So going back to your working version, I had to see why it works. Initially, the first term in the MatchFirst (the '|' operator creates MatchFirst instances) is just the same, and by grammar referencing itself, it just goes word by word through the input trying to find a match. I'll try to visualize the steps: level "First Second Third end" 1 word grammar 2 word grammar 3 word grammar 4 word grammar <- fails! 4 word end <- fails! (therefore level 3 grammar fails!) 3 word end <-- success!!! grammar has 2 options: to match a word followed by a grammar, or to match a word followed by 'end'. At 4 levels deep into the Forward's recursion, the first option fails, because after parsing "end" as the initial word, there is no more text to try to match against grammar. Level 4's Forward then also tries to match a word followed by 'end', but this fails for the same reason. So at this point, the 4th level Forward fails to match either of its options, so it throws its exception back up to level 3, indicating that the first alternative, word followed by grammar, failed. Level 3 then moves on to see if word followed by the literal 'end' matches, and it does - success! Here's where I am stuck now. In the original grammar that you posted, which you want to render into this behavior, grammar is defined as: grammar = OneOrMore(Word(alphas)) + Literal('end') Unfortunately, when the OneOrMore gets constructed, it does not have any visibility beyond knowing what is to be repeated. Again, here is the data structure that is being built: - And - OneOrMore - Word(alphas) - Literal('end') Only at the level of the And is there any awareness that the OneOrMore is followed by anything, let alone by something which could be misinterpreted as something matching the OneOrMore's repetition expression. Can you suggest a way I could generalize this, so that OneOrMore stops matching before it gets to 'end'? -- Paul -- http://mail.python.org/mailman/listinfo/python-list