Ellery Newcomer wrote:
On 06/18/2010 06:36 AM, Ben Hanson wrote:
Reading your original post again and thinking about this a bit more...
If someone can help me get up to speed with TMP in D, I could probably
put
together a proof of concept pretty quickly. Aside from the D syntax,
it is all
in the design (which is why I would like to discuss it in more
detail). For
anyone who is interested, here is how I go about DFA production
currently:
- Bog standard tokenisation of regex tokens
- Normalise each regex charset and enumerate in a map
- build a regex syntax tree with leaf nodes storing the enumerated
charset id
- Intersect all the charsets and build a vector of equivalent charsets
- Perform the followset to DFA transform as described in the Dragon book
For my lexer generator, I build a lookup table for each character to an
equivalence class. For my wild card matcher, I leave the
representation as
characters and do a linear lookup of chars to 'next' state. The lexer
equivalence class method (a la flex) is more efficient as you just
have two
lookups per character with no linear searches.
I'm thinking I could look at getting as far as the regex syntax tree
and then
we could discuss DFA representation (it sounds like you want to
generate code
directly - how do you do that?! This is why I mentioned the .NET
bytecode stuff
- I imagine there is some overlap there, but I don't know enough to be
sure.
Can you output source code at compilation time and have D compile it,
or do you
use a load of recursion like C++ TMP?
And I forgot to mention - boost.spirit uses my lexertl library.
Regards,
Ben
I've always understood that dfas have an exponential lower bound
relative to .. something .. but I don't believe I have ever actually
used a dfa generator (or written one, sadly enough). What has your
experience been on dfa compile time?
I think only backtracking engines (not DFAs) have the exponential run
time problem. Regular expression grammar is itself context free (so it's
parsable efficiently), the NFA and NFA->DFA algorithms are polynomial,
and the generated DFA makes one transition per input character.
Andrei