Nobody wrote:

What I want: a tokeniser generator which can take a lex-style grammar (not
necessarily lex syntax, but a set of token specifications defined by
REs, BNF, or whatever), generate a DFA, then run the DFA on sequences of
bytes. It must allow the syntax to be defined at run-time.

You might find my Plex package useful:

http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/

It was written some time ago, so it doesn't know about
the new bytes type yet, but it shouldn't be hard to
adapt it for that if you need to.

What I don't want: anything written by someone who doesn't understand the
field (i.e. anything which doesn't use a DFA).

Plex uses a DFA.

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to