On 2010-10-22 at 21:48:49, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> wrote:
> On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
>> Interesting idea. Here's another: D will soon need bindings for CORBA,
>> Thrift, etc., so lexers will have to be written over and over to grok
>> interface files. Perhaps a generic tokenizer that can be parametrized
>> with a lexical grammar would bring more ROI; I have a hunch D's
>> templates are strong enough to pull this off without any source code
>> generation à la JavaCC. The books I've read on compilers say
>> tokenization is a solved problem, so the theory part on what a good
>> abstraction should be is done. What do you think?
> Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
> generator.
>
> I have in mind the entire implementation of a simple design, but never
> had the time to execute on it. The tokenizer would work like this:
>     alias Lexer!(
>         "+", "PLUS",
>         "-", "MINUS",
>         "+=", "PLUS_EQ",
>         ...
>         "if", "IF",
>         "else", "ELSE",
>         ...
>     ) DLexer;
Yes. One remark: native language constructs scale better for a grammar:
    enum TokenDef : string {
        Digit = "[0-9]",
        Letter = "[a-zA-Z_]",
        // a letter followed by any number of letters or digits
        Identifier = Letter ~ "(" ~ Letter ~ "|" ~ Digit ~ ")*",
        ...
        Plus = "+",
        Minus = "-",
        PlusEq = "+=",
        ...
        If = "if",
        Else = "else",
        ...
    }

    alias Lexer!TokenDef DLexer;
BTW, there's a related bug:
http://d.puremagic.com/issues/show_bug.cgi?id=2950
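To make the "scales better" point concrete, here's a minimal sketch of
how a Lexer template could walk such an enum at compile time. The Lexer
template and dumpGrammar are hypothetical names for illustration, not an
actual implementation:

    import std.stdio;

    enum TokenDef : string {
        Plus = "+",
        Minus = "-",
        If = "if",
    }

    // Hypothetical: enumerate the grammar at compile time via
    // introspection; a real Lexer would build its match tables from it.
    template Lexer(alias Defs) {
        void dumpGrammar() {
            foreach (name; __traits(allMembers, Defs))
                writefln("%s -> %s", name,
                         cast(string) __traits(getMember, Defs, name));
        }
    }

    void main() {
        Lexer!TokenDef.dumpGrammar();
    }

Adding a token kind is then just adding an enum member; there are no
parallel string pairs to keep in sync.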
> Such a declaration generates numeric values DLexer.PLUS etc. and
> efficient code that extracts a stream of tokens from a stream of text.
> Each token in the token stream has the ID and the text.
All good ideas.
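For illustration, a toy of the interface that description implies: the
lexer is an input range of tokens, each carrying an ID and the matched
slice. Everything here is an assumption about the shape of the API; the
real code would be generated from the grammar, and this hand-rolled toy
only knows '+' and '-':

    import std.stdio;

    enum TokId { PLUS, MINUS }

    struct Token {
        TokId id;     // e.g. TokId.PLUS
        string text;  // the matched slice of the source
    }

    struct ToyLexer {
        string src;
        bool empty() { return src.length == 0; }
        Token front() {
            return src[0] == '+' ? Token(TokId.PLUS, src[0 .. 1])
                                 : Token(TokId.MINUS, src[0 .. 1]);
        }
        void popFront() { src = src[1 .. $]; }
    }

    void main() {
        foreach (tok; ToyLexer("+-+"))
            writefln("%s: %s", tok.id, tok.text);  // prints PLUS: + etc.
    }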
> Comments, strings, etc. can be handled in one of several ways, but
> that's a longer discussion.
The discussion's started anyhow. So what're the options?
--
Tomek