On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
On 22-10-2010 at 00:01:21, Walter Bright <newshou...@digitalmars.com>
wrote:

As we all know, tool support is important for D's success. Making
tools easier to build will help with that.

To that end, I think we need a lexer for the standard library -
std.lang.d.lex. It would be helpful in writing color syntax
highlighting filters, pretty printers, REPLs, doc generators, static
analyzers, and even D compilers.

It should:

1. support a range interface for its input, and a range interface for
its output (a toy sketch follows this list)
2. optionally not generate lexical errors, but just try to recover and
continue
3. optionally return comments and ddoc comments as tokens
4. the tokens should be a value type, not a reference type
5. generally follow along with the C++ one so that they can be
maintained in tandem
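
A toy sketch of what points 1, 2, and 4 could look like in D (everything
below is illustrative only, not the proposed std.lang.d.lex API):

import std.ascii : isAlphaNum, isWhite;

// Toy token kinds; a real lexer would enumerate all of D's tokens.
enum TokKind { identifier, plus, minus }

// Point 4: a token is a plain value type, cheap to copy.
struct Token
{
    TokKind kind;
    string text;   // slice of the original source
}

// Point 1: the lexer is itself an input range of Tokens.
struct TokenRange
{
    string src;
    Token current;
    bool done;

    this(string s) { src = s; popFront(); }

    @property bool empty() { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        while (src.length && isWhite(src[0]))
            src = src[1 .. $];
        if (!src.length) { done = true; return; }
        if (src[0] == '+' || src[0] == '-')
        {
            current = Token(src[0] == '+' ? TokKind.plus : TokKind.minus,
                            src[0 .. 1]);
            src = src[1 .. $];
            return;
        }
        size_t n;
        while (n < src.length && isAlphaNum(src[n]))
            ++n;
        if (n == 0)   // point 2: skip an unexpected character and keep going
        {
            src = src[1 .. $];
            popFront();
            return;
        }
        current = Token(TokKind.identifier, src[0 .. n]);
        src = src[n .. $];
    }
}

unittest
{
    import std.algorithm : equal, map;
    assert(TokenRange("a + b").map!(t => t.text).equal(["a", "+", "b"]));
}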

It can also serve as the basis for creating a JavaScript
implementation that can be embedded into web pages for syntax
highlighting, and eventually an std.lang.d.parse.

Anyone want to own this?

Interesting idea. Here's another: D will soon need bindings for CORBA,
Thrift, etc., so lexers will have to be written over and over to grok
interface files. Perhaps a generic tokenizer that can be parametrized
with a lexical grammar would bring more ROI. I have a hunch D's templates
are strong enough to pull this off without any source code generation
à la JavaCC. The books I've read on compilers say tokenization is a
solved problem, so the theory on what a good abstraction should be is
done. What do you think?

Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer generator.

I have in mind the entire implementation of a simple design, but never had the time to execute on it. The tokenizer would work like this:

alias Lexer!(
    "+", "PLUS",
    "-", "MINUS",
    "+=", "PLUS_EQ",
    ...
    "if", "IF",
    "else", "ELSE"
    ...
) DLexer;

Such a declaration generates numeric values DLexer.PLUS etc. and efficient code that extracts a stream of tokens from a stream of text. Each token in the token stream carries its ID and its text.
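
The ID-generation half of such a template is already expressible with a compile-time string mixin. A hypothetical sketch (the real work, generating an efficient matching engine from the patterns, is elided):

import std.conv : to;

// Hypothetical sketch, not the committed design: generate the numeric
// token IDs from the (pattern, name) pairs via a string mixin.
template Lexer(spec...)
{
    static assert(spec.length % 2 == 0, "expected (pattern, name) pairs");

    private string idCode()
    {
        string code;
        foreach (i, member; spec)
        {
            static if (i % 2 == 1)
                code ~= "enum uint " ~ member ~ " = " ~ to!string(i / 2) ~ ";\n";
        }
        return code;
    }
    mixin(idCode());
}

alias Lexer!("+", "PLUS", "-", "MINUS", "if", "IF") DLexer;
static assert(DLexer.PLUS == 0 && DLexer.IF == 2);

The matching code could be generated along the same lines, e.g. by building a trie over the patterns at compile time.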

Comments, strings, etc. can be handled in one of several ways, but that's a longer discussion.

The undertaking is doable but nontrivial.


Andrei
