On 10/4/13 5:24 PM, Andrei Alexandrescu wrote:
On 10/2/13 7:41 AM, Dicebot wrote:
After brief discussion with Brian and gathering data from the review
thread, I have decided to start voting for `std.d.lexer` inclusion into
Phobos.

Thanks all involved for the work, first of all Brian.

I have the proverbial good news and bad news. The only bad news is that
I'm voting "no" on this proposal.

But there's plenty of good news.

1. I am not attempting to veto this, so just consider it a normal vote
when tallying.

2. I do vote for inclusion in the /etc/ package for the time being.

3. The work is good and the code valuable, so even in the case my
suggestions (below) will be followed, a virtually all code pulp that
gets work done can be reused.
[snip]

To put my money where my mouth is, I have a proof-of-concept tokenizer for C++ in working state.

http://dpaste.dzfl.pl/d07dd46d

It contains some rather unsavory bits (I'm sure a ctRegex would be nicer for parsing numbers etc), but it works on a lot of code just swell.

Most importantly, there's a clear distinction between the generic core and the C++-specific part. It should be obvious how to use the generic matcher for defining a D tokenizer.

Token representation is minimalistic and expressive. Just write tk!"<<" for left shift, tk!"int" for int etc. Typos will be detected during compilation. One does NOT need to define and use TK_LEFTSHIFT or TK_INT; all needed by the generic tokenizer is the list of tokens. In return, it offers an efficient trie-based matcher for all tokens.

(Keyword matching is unusual in that keywords are first found by the trie matcher, and then a simple check figures whether more characters follow, e.g. "if" vs. "iffy". Given that many tokenizers use a hashtable anyway to look up all symbols, there's no net loss of speed with this approach.)

The lexer generator compiles fast and should run fast. If not, it should be easy to improve at the matcher level.

Now, what I'm asking for is that std.d.lexer builds on this design instead of the traditional one. At a slight delay, we get the proverbial fishing rod IN ADDITION TO of the equally proverbial fish, FOR FREE. It is quite evident there's a bunch of code sharing going on already between std.d.lexer and the proposed design, so it shouldn't be hard to effect the adaptation.

So with this I'm leaving it all within the hands of the submitter and the review manager. I didn't count the votes, but we may have a "yes" majority built up. Since additional evidence has been introduce, I suggest at least a revote. Ideally, there would be enough motivation for Brian to suspend the review and integrate the proposed design within std.d.lexer.


Andrei

Reply via email to