Reading your original post again and thinking about this a bit more... If someone can help me get up to speed with TMP in D, I could probably put together a proof of concept pretty quickly. Aside from the D syntax, it is all in the design (which is why I would like to discuss it in more detail). For anyone who is interested, here is how I go about DFA production currently:
- Bog standard tokenisation of regex tokens - Normalise each regex charset and enumerate in a map - build a regex syntax tree with leaf nodes storing the enumerated charset id - Intersect all the charsets and build a vector of equivalent charsets - Perform the followset to DFA transform as described in the Dragon book For my lexer generator, I build a lookup table for each character to an equivalence class. For my wild card matcher, I leave the representation as characters and do a linear lookup of chars to 'next' state. The lexer equivalence class method (a la flex) is more efficient as you just have two lookups per character with no linear searches. I'm thinking I could look at getting as far as the regex syntax tree and then we could discuss DFA representation (it sounds like you want to generate code directly - how do you do that?! This is why I mentioned the .NET bytecode stuff - I imagine there is some overlap there, but I don't know enough to be sure. Can you output source code at compilation time and have D compile it, or do you use a load of recursion like C++ TMP? And I forgot to mention - boost.spirit uses my lexertl library. Regards, Ben