Nobody wrote:
What I want: a tokeniser generator which can take a lex-style grammar (not necessarily lex syntax, but a set of token specifications defined by REs, BNF, or whatever), generate a DFA, then run the DFA on sequences of bytes. It must allow the syntax to be defined at run-time.
You might find my Plex package useful: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/ It was written some time ago, so it doesn't know about the new bytes type yet, but it shouldn't be hard to adapt it for that if you need to.
What I don't want: anything written by someone who doesn't understand the field (i.e. anything which doesn't use a DFA).
Plex uses a DFA. -- Greg -- http://mail.python.org/mailman/listinfo/python-list