I found some clues on lexing using the re module in Python in an article by Martin Löwis.
http://www.python.org/community/sigs/retired/parser-sig/towards-standard/

He writes:

[...]
A scanner based on regular expressions is usually implemented as an alternative of all token definitions. For XPath, a fragment of this expressions looks like this:

    (?P<Number>\\d+(\\.\\d*)?|\\.\\d+)|
    (?P<VariableReference>\\$""" + QName + """)|
    (?P<NCName>"""+NCName+""")|
    (?P<QName>"""+QName+""")|
    (?P<LPAREN>\\()|

Here, each alternative in the regular expression defines a named group. Scanning proceeds in the following steps:

1. Given the complete input, match the regular expression with the beginning of the input.
2. Find out which alternative matched.
[...]

Item 2 is where I get stuck. There doesn't seem to be an obvious way to do it, which I understand is a bad thing in Python. Whatever source code originally went with the article is not linked from the above page, so I don't know what Martin did.

Here's what I came up with (with a trivial example regex):

    import re

    r = re.compile('(?P<x>x+)|(?P<a>a+)')
    m = r.match('aaxaxx')
    if m:
        for k in r.groupindex:
            # Find the token type: only the alternative that matched is not None.
            if m.group(k) is not None:
                token = (k, m.group(k))
                break

I wish I could do something obvious instead, like m.name().

-- 
Neil Cerutti
After finding no qualified candidates for the position of principal, the school board is pleased to announce the appointment of David Steele to the post. --Philip Streifer
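
For what it's worth, the re module does seem to offer something close to the wished-for m.name(): match objects have a lastgroup attribute holding the name of the last matched named group. The following is only a minimal sketch, assuming every top-level alternative is a named group with no named groups nested inside it; the tokenize() helper and its error handling are hypothetical and not taken from Martin's article.

```python
import re

# Same trivial pattern as in the post: one named group per token type.
TOKEN_RE = re.compile(r'(?P<x>x+)|(?P<a>a+)')


def tokenize(text):
    """Yield (token_type, lexeme) pairs (hypothetical helper for illustration)."""
    pos = 0
    while pos < len(text):
        # Step 1: match the pattern at the current position in the input.
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(
                f"unexpected character {text[pos]!r} at position {pos}")
        # Step 2: lastgroup names the alternative that matched.
        yield m.lastgroup, m.group()
        pos = m.end()


tokens = list(tokenize('aaxaxx'))
print(tokens)
```

This avoids looping over r.groupindex for each token. If an alternative contained a nested named group, lastgroup could report a different name than expected, so the sketch only holds under the assumption stated above.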