On 06/10/13 18:07, Andrei Alexandrescu wrote:
I'm working on related code, and got all the way there in one day (Friday) with
a C++ tokenizer for linting purposes (it doesn't open #includes or expand
#defines etc.; it wasn't meant to).

The core generated fragment that does the matching is at https://dpaste.de/GZY3.
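For concreteness, here is a minimal sketch (in D; the token names and the
slice-advancing convention are illustrative assumptions, not the actual
generated code) of the shape such a generated matcher takes: nested switches
that commit to the longest operator by peeking one character ahead.

    // Sketch of a trie-style generated matcher: one branch per leading
    // character, peeking ahead to prefer the longest lexeme.
    enum TokenType { lshift, lshiftAssign, less, lessEq, unknown }

    TokenType matchOperator(ref const(char)[] s)
    {
        // Assumes s is non-empty; advances s past the matched lexeme.
        switch (s[0])
        {
        case '<':
            if (s.length > 1 && s[1] == '<')
            {
                if (s.length > 2 && s[2] == '=')
                {
                    s = s[3 .. $];
                    return TokenType.lshiftAssign; // "<<="
                }
                s = s[2 .. $];
                return TokenType.lshift; // "<<"
            }
            if (s.length > 1 && s[1] == '=')
            {
                s = s[2 .. $];
                return TokenType.lessEq; // "<="
            }
            s = s[1 .. $];
            return TokenType.less; // "<"
        default:
            return TokenType.unknown; // caller handles everything else
        }
    }

The real fragment presumably emits one such branch per leading character of
every C++ operator; the point of generating it is that the whole decision sits
in a flat switch rather than a table lookup.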

The surrounding switch statement (also in library code) handles whitespace and
line counting. The client code needs to handle things like parsing numbers
(note how the matcher stops upon the first digit), identifiers, and comments
(the matcher stops upon detecting "//" or "/*") by hand. Such things can be
achieved with hand-written code (as I do), other similar tokenizers, DFAs, etc.
The point is that the core loop that examines every character in search of a
lexeme is fast.
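To illustrate that division of labor, here is a minimal sketch (assumed helper
names, not the actual linter code) of the hand-written continuations the
client supplies once the matcher stops at a digit or at "//":

    import std.ascii : isDigit;

    // Scan a decimal integer starting at the digit the matcher stopped on;
    // returns the length of the number lexeme.
    size_t scanNumber(const(char)[] s)
    {
        size_t i = 0;
        while (i < s.length && s[i].isDigit) ++i;
        return i;
    }

    // Skip a "//" comment up to (not including) the newline, so the
    // surrounding loop can keep its line count accurate.
    size_t skipLineComment(const(char)[] s)
    {
        size_t i = 2; // past the "//"
        while (i < s.length && s[i] != '\n') ++i;
        return i;
    }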

What I'm getting at is that I'd be prepared to give a vote "no to std, yes to etc" for Brian's d.lexer, _if_ I were reasonably certain that we'd see an alternative lexer module submitted to Phobos within the next month :-)
