On 2007-11-04, Just Another Victim of the Ambient Morality <[EMAIL PROTECTED]> wrote: >> Consider writing a recursive decent parser by hand to parse >> the language '[ab]+b'. >> >> goal --> ab_list 'b' >> ab_list --> 'a' list_tail >> ab_list --> 'b' list_tail >> list_tail --> 'a' list_tail >> list_tail --> 'b' list_tail >> list_tail --> null >> >> >> The above has the exact same bug (and probably some others--I'm >> sorry unable to test it just now) as the PyParsing solution. >> >> The error is in the grammar. It might be fixed by specifying that >> 'b' must be followed by EOF, and then it could be coded by using >> more than one character of lookahead. > > I don't exactly understand the syntax you used to describe the > productions of your recursive descent parser so not only did I not follow it > but I couldn't make out the rest of your post. Could you explain in a > little more detail? The last part that points to 'null' is especially > confusing...
It's the BNF spelling of goal --> ab_list 'b' ab_list --> ab { ab } ab --> 'a' | 'b' The null is to say that list_tail can match nothing, i.e, an empty string. Then, in the Parser class, every method (except for match, which is used as a central place to consume characters) corresponds to one of the productions in the BNF. Breaking things down into BNF-based productions often makes implementation, debugging and code generation easier. PyParsing saves me that stop, since I can often directly implement the EBNF using PyParsing. > As demonstrated earlier, it's not just the grammar. There are > situations that are unambiguous that pyparsing can't parse > simply and there's no reason for it. Yes, many parser generators have many more limitations than just the requirement of an unambiguous grammar. > Besides, ambiguous grammars are a fact of life and some of us > need to parse them. It's usually okay, too. Consider a > previous example: > > grammar = OneOrMore(Word(alphas)) + Literal('end') > > While you may consider this inherently ambiguous, it's usually > not. That is to say, as long as it is rare that 'end' is used > not at the end of the string, this will simply parse and, yet, > pyparsing will consistently fail to parse it... I believe there's no cure for the confusion you're having except for implementing a parser for your proposed grammar. Alternatively, try implementing your grammar in one of your other favorite parser generators. -- Neil Cerutti -- http://mail.python.org/mailman/listinfo/python-list