Oh, the pyparsing rendition of your initial pat expression would be something like:
    import pyparsing as pp
    pat = pp.Combine( pp.oneOf("$ 0x 0X") + pp.Word(pp.hexnums,max=8) )

Combine is needed to ensure that the leading $, 0x, or 0X is immediately followed, with no intervening whitespace, by 1-8 (and no more than 8) hex digits. Otherwise, pyparsing is pretty tolerant of whitespace cropping up wherever.

As for some of your other syntaxes:

I'm not sure what "Vre" means.

I found that "Alternative" needs to support both longest-match ("greedy") and first-match ("non-greedy") behavior, so I provided Or and MatchFirst, respectively. They are also definable using the '^' and '|' operators, again respectively.

Finally, I ran into Literal("this") | Literal("that") | Literal("other") so often that I just added a helper function oneOf that would take the string "this that other" and build the right expression out of it. This too is non-trivial, as you have to take care that some short literals may mask longer ones in the list, as in oneOf("< = > <= >= !="). Just replacing this directly with Literal("<") | Literal("=") | ... would prevent any matching of the ">=" or "<=" literals, since "<" or ">" would always match first. You could replace with the Or (^) form, but this exhaustively checks all alternatives all the time, a regrettable run-time performance penalty. Pyparsing's implementation of oneOf leaves the literals in the given order, unless a duplicate is given, or an earlier literal masks a later one - in that case, the longer literal is moved ahead of the shorter.

I implemented Optional as a wrapper-type class, as opposed to the .optional() method that you have given - I'd say there are tradeoffs either way, just making the comparison.

Your "repeated" or "times" seem to map roughly to pyparsing's OneOrMore and ZeroOrMore.

Any thought how a recursive grammar might look?

I don't find 'Interval' to be very easy on the eyes. In this case, I stole^H^H^H^H^H borrowed the re form of "[A-Za-z0-9]", providing a helper function named srange ("s" is for "string") such that srange("[a-fA-F]") would return the string "abcdefABCDEF".
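To make the masking issue with short literals concrete, here is a small sketch comparing the three ways of writing the comparison-operator alternative. The variable names (naive, longest, helper) are made up for illustration, and it uses the classic pyparsing names shown above:

```python
import pyparsing as pp

ops = "< = > <= >= !="
literals = [pp.Literal(s) for s in ops.split()]

# MatchFirst (the '|' form): alternatives are tried in the order given,
# so "<" wins before "<=" ever gets a chance.
naive = pp.MatchFirst(literals)

# Or (the '^' form): every alternative is tried and the longest match wins.
longest = pp.Or(literals)

# oneOf: builds the alternative from a string and moves masked literals
# ("<=", ">=") ahead of the shorter ones that would hide them.
helper = pp.oneOf(ops)

print(naive.parseString("<=")[0])    # → <
print(longest.parseString("<=")[0])  # → <=
print(helper.parseString("<=")[0])   # → <=
```

The naive form does not raise an error on "<=". It silently matches just the "<" and leaves the "=" unparsed, which is what makes this kind of bug easy to miss.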
The other end of this process has to do with how the calling program will process the parsed results. Once a grammar gets too deeply nested, or has too many Optional elements, just returning a simple list or nested list of tokens isn't enough. Pyparsing returns ParseResults objects, which can be accessed as a list, dictionary, or object with attributes (provided individual fields have been given names at grammar definition time). I *have* had some complaints about ParseResults ("ParseResults are evil"), but the named access is a life-saver for complex grammars.

(Simple case: say the first token for your hex number is an optional sign. Without names, you can't just access field 2, say, of the expression; you have to first test to see if the sign was provided or not, and then access field 2 or 3 accordingly. On the other hand, if you had given field 2 a name, your parser would be more robust, even if you later changed your grammar to include other elements, such as a leading, um, currency symbol or something.)

Just some fodder for your reverb considerations...

-- Paul
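
P.S. A minimal sketch of the named-field point, assuming a hypothetical signed variant of the hex expression. The names sign, number, and hexnum are made up for illustration and are not part of your grammar or of pyparsing:

```python
import pyparsing as pp

# Hypothetical grammar: an optional sign followed by the hex number.
sign = pp.Optional(pp.oneOf("+ -"))
number = pp.Combine(pp.oneOf("$ 0x 0X") + pp.Word(pp.hexnums, max=8))
hexnum = sign("sign") + number("number")

for text in ("0x1F", "-0x1F"):
    result = hexnum.parseString(text)
    # By position, the number is at index 0 or 1 depending on the sign.
    # By name, it is always result["number"] (or result.number).
    print(result.asList(), result["number"], result.get("sign", "+"))
```

The positional index of the number shifts when the sign is present, while the named lookup stays the same, and get() supplies a default for the sign when it was omitted.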