Re: std.d.lexer : voting thread

Andrei Alexandrescu Mon, 07 Oct 2013 17:21:28 -0700

On 10/4/13 5:24 PM, Andrei Alexandrescu wrote:

On 10/2/13 7:41 AM, Dicebot wrote:

After brief discussion with Brian and gathering data from the review
thread, I have decided to start voting for `std.d.lexer` inclusion into
Phobos.


Thanks all involved for the work, first of all Brian.

I have the proverbial good news and bad news. The only bad news is that
I'm voting "no" on this proposal.

But there's plenty of good news.

1. I am not attempting to veto this, so just consider it a normal vote
when tallying.

2. I do vote for inclusion in the /etc/ package for the time being.

3. The work is good and the code valuable, so even if my suggestions
(below) are followed, virtually all of the code pulp that gets work
done can be reused.

[snip]

To put my money where my mouth is, I have a proof-of-concept tokenizer for C++ in working state.


http://dpaste.dzfl.pl/d07dd46d

It contains some rather unsavory bits (I'm sure a ctRegex would be nicer for parsing numbers etc.), but it works on a lot of code just swell.

Most importantly, there's a clear distinction between the generic core and the C++-specific part. It should be obvious how to use the generic matcher for defining a D tokenizer.

Token representation is minimalistic and expressive. Just write tk!"<<" for left shift, tk!"int" for int etc. Typos will be detected during compilation. One does NOT need to define and use TK_LEFTSHIFT or TK_INT; all that is needed by the generic tokenizer is the list of tokens. In return, it offers an efficient trie-based matcher for all tokens.
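To make the idea concrete, here is a minimal sketch of a trie-based longest-match tokenizer core driven only by a list of tokens. It is written in Python for brevity and is NOT the code from the paste; the names build_trie, match_longest, tk and the token list are all made up for illustration. Note that tk here rejects typos at run time, whereas the D template tk!"..." rejects them during compilation.

```python
# Hypothetical token list; the generic core needs nothing else.
TOKENS = ["<", "<<", "<<=", "<=", "int", "if"]
TOKEN_IDS = {tok: i for i, tok in enumerate(TOKENS)}


def tk(spelling):
    """Return the id of a token given its spelling; unknown spellings fail.

    Run-time stand-in for a compile-time lookup such as tk!"<<" in D.
    """
    if spelling not in TOKEN_IDS:
        raise KeyError("unknown token: " + repr(spelling))
    return TOKEN_IDS[spelling]


def build_trie(tokens):
    """Build a nested-dict trie; the key None marks the end of a token."""
    root = {}
    for tok in tokens:
        node = root
        for ch in tok:
            node = node.setdefault(ch, {})
        node[None] = tok
    return root


def match_longest(trie, text, pos=0):
    """Return the longest token starting at text[pos], or None if none matches."""
    node = trie
    best = None
    for i in range(pos, len(text)):
        node = node.get(text[i])
        if node is None:
            break            # no token continues with this character
        if None in node:
            best = node[None]  # remember the longest complete token so far
    return best


TRIE = build_trie(TOKENS)
```

With this, match_longest(TRIE, "a << b", 2) picks "<<" rather than "<", and a misspelled token such as tk("<<<") is rejected instead of silently becoming a new token kind.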

(Keyword matching is unusual in that keywords are first found by the trie matcher, and then a simple check figures whether more characters follow, e.g. "if" vs. "iffy". Given that many tokenizers use a hashtable anyway to look up all symbols, there's no net loss of speed with this approach.)
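A rough sketch of that keyword check, again in Python and not taken from the paste. The helper names, the keyword set, and the rule for what counts as an identifier character are assumptions; raw_match is a plain longest-prefix scan standing in for the trie matcher, and identifiers that do not begin with a keyword are left out to keep it short.

```python
KEYWORDS = {"if", "int"}
# Longest spellings first, so the scan below returns the longest match.
TOKENS = sorted(KEYWORDS | {"<", "<<"}, key=len, reverse=True)


def is_ident_char(ch):
    """Assumed identifier alphabet: letters, digits and underscore."""
    return ch.isalnum() or ch == "_"


def raw_match(text, pos):
    """Stand-in for the trie matcher: longest token that starts at pos."""
    for tok in TOKENS:
        if text.startswith(tok, pos):
            return tok
    return None


def next_token(text, pos=0):
    """Classify the token at pos as keyword, identifier or operator."""
    tok = raw_match(text, pos)
    if tok in KEYWORDS:
        end = pos + len(tok)
        if end < len(text) and is_ident_char(text[end]):
            # More identifier characters follow: this is "iffy", not "if".
            while end < len(text) and is_ident_char(text[end]):
                end += 1
            return ("identifier", text[pos:end])
        return ("keyword", tok)
    if tok is not None:
        return ("operator", tok)
    return None  # not handled in this sketch
```

So next_token("if (x)") is classified as the keyword "if", while next_token("iffy = 1") falls through to the identifier "iffy".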

The lexer generator compiles fast and should run fast. If not, it should be easy to improve at the matcher level.

Now, what I'm asking for is that std.d.lexer builds on this design instead of the traditional one. At a slight delay, we get the proverbial fishing rod IN ADDITION TO the equally proverbial fish, FOR FREE. It is quite evident there's a bunch of code sharing going on already between std.d.lexer and the proposed design, so it shouldn't be hard to effect the adaptation.

So with this I'm leaving it all in the hands of the submitter and the review manager. I didn't count the votes, but we may have a "yes" majority built up. Since additional evidence has been introduced, I suggest at least a revote. Ideally, there would be enough motivation for Brian to suspend the review and integrate the proposed design within std.d.lexer.



Andrei
