Ellery Newcomer wrote:
On 06/18/2010 06:36 AM, Ben Hanson wrote:
Reading your original post again and thinking about this a bit more...

If someone can help me get up to speed with TMP in D, I could probably put together a proof of concept pretty quickly. Aside from the D syntax, it is all in the design (which is why I would like to discuss it in more detail). For anyone who is interested, here is how I go about DFA production currently:

- Bog standard tokenisation of regex tokens
- Normalise each regex charset and enumerate in a map
- build a regex syntax tree with leaf nodes storing the enumerated charset id
- Intersect all the charsets and build a vector of equivalent charsets
- Perform the followset to DFA transform as described in the Dragon book

For my lexer generator, I build a lookup table for each character to an
equivalence class. For my wild card matcher, I leave the representation as
characters and do a linear lookup of chars to 'next' state. The lexer
equivalence class method (a la flex) is more efficient as you just have two
lookups per character with no linear searches.

I'm thinking I could look at getting as far as the regex syntax tree and then we could discuss DFA representation (it sounds like you want to generate code directly - how do you do that?! This is why I mentioned the .NET bytecode stuff - I imagine there is some overlap there, but I don't know enough to be sure. Can you output source code at compilation time and have D compile it, or do you
use a load of recursion like C++ TMP?

And I forgot to mention - boost.spirit uses my lexertl library.

Regards,

Ben

I've always understood that dfas have an exponential lower bound relative to .. something .. but I don't believe I have ever actually used a dfa generator (or written one, sadly enough). What has your experience been on dfa compile time?

I think only backtracking engines (not DFAs) have the exponential run time problem. Regular expression grammar is itself context free (so it's parsable efficiently), the NFA and NFA->DFA algorithms are polynomial, and the generated DFA makes one transition per input character.


Andrei

Reply via email to