On Tuesday, 5 April 2016 at 21:37:09 UTC, Walter Bright wrote:
On 4/5/2016 6:47 AM, Basile B. wrote:
Also, lexing numbers doesn't need to be as accurate as in the
front-end of the compiler (especially if the HL doesn't have a token type for the
illegal "lexeme").

That is an interesting design point. If I was doing a highlighter, I'd highlight in red tokens that the compiler would reject, meaning I'd do the accurate number lexing.

Lexing numbers correctly is not trivial, but since the compiler lexer's implementation can be cut/pasted, it is trivial in practice.

Even when the most naive lexer sees a number and simply consumes characters until a blank, a symbol, or an operator, it's clear that this can be done:

http://i.imgur.com/ehjps04.png

Actually, numbers are the only part of the D lexer where errors can be detected.
Otherwise there are no possible syntax errors.
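
To make the naive approach concrete, here is a minimal sketch in Python (not D, and not taken from any real highlighter). It consumes characters until a delimiter and only then checks whether the consumed word is a well-formed number. The names DELIMITERS, VALID_NUMBER and lex_number are hypothetical, and the pattern is a deliberately simplified assumption: it covers only decimal, hex and binary integers with '_' separators and a few suffixes, plus plain decimal floats, while the real D grammar accepts more forms.

```python
import re

# Characters that end a "word" for this naive scanner. Assumption: a small
# illustrative subset, not D's full set of symbols and operators. '.' is left
# out on purpose so that "1.5" stays in one piece.
DELIMITERS = set(" \t\r\n+-*/%=<>!&|^~()[]{},;:")

# Simplified pattern for D-like numeric literals (assumption, see lead-in).
VALID_NUMBER = re.compile(
    r"""
    (?:
        0[xX][0-9a-fA-F_]*[0-9a-fA-F][0-9a-fA-F_]*   # hex integer
      | 0[bB][01_]*[01][01_]*                        # binary integer
      | [0-9][0-9_]*                                 # decimal integer
    )
    (?:L|[uU]|L[uU]|[uU]L)?                          # optional integer suffix
    |
    [0-9][0-9_]*\.[0-9][0-9_]*[fFL]?                 # plain decimal float
    """,
    re.VERBOSE,
)


def lex_number(source, start):
    """Consume from `start` up to the next delimiter.

    Returns (text, is_valid, end), where `end` is the index of the first
    character that was not consumed.
    """
    end = start
    while end < len(source) and source[end] not in DELIMITERS:
        end += 1
    text = source[start:end]
    return text, VALID_NUMBER.fullmatch(text) is not None, end
```

A highlighter built this way could paint the whole consumed word in red whenever is_valid is False, for instance for "0b102" or "12abc", without needing a dedicated token type for illegal lexemes.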

But one thing I forgot to say in my previous post is that lexing can be "multi-pass". The D front-end does everything in a single pass, for example it directly detects tokPlusPlus or tokXorEqu, but a multi-pass lexer can actually work in 3 sub-phases:
1/ split words
2/ detect token families in the words: identifier, keyword, operator, etc.
3/ specialize tokens: tokOp.data == "++" -> tokPlusPlus
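
The three sub-phases above can be sketched as follows. This is only an illustration in Python under stated assumptions, not how the D front-end works: the keyword set is a tiny subset, the word-splitting regex is simplistic (a run of operator characters is kept as one word), and apart from tokPlusPlus and tokXorEqu, which come from the text above, all token names and function names are hypothetical.

```python
import re

# Assumption: a tiny illustrative subset of keywords.
KEYWORDS = {"if", "else", "while", "return", "int"}

# Phase 3 lookup table. tokPlusPlus and tokXorEqu are the names used in the
# post; the other entries are hypothetical.
SPECIALIZED_OPS = {
    "++": "tokPlusPlus",
    "^=": "tokXorEqu",
    "+": "tokPlus",
    "=": "tokAssign",
}

OPERATOR_CHARS = set("-+*/%=<>!&|^~")

# Phase 1 pattern: identifier-like word, number-like word, run of operator
# characters, or any other single non-blank character (a "symbol").
WORD = re.compile(r"[A-Za-z_]\w*|\d\w*|[-+*/%=<>!&|^~]+|\S")


def split_words(source):
    """Phase 1: split the source into raw words, dropping blanks."""
    return WORD.findall(source)


def classify(word):
    """Phase 2: attach a token family to a raw word."""
    if word in KEYWORDS:
        return ("keyword", word)
    if word[0].isalpha() or word[0] == "_":
        return ("identifier", word)
    if word[0].isdigit():
        return ("number", word)
    if word[0] in OPERATOR_CHARS:
        return ("operator", word)
    return ("symbol", word)


def specialize(token):
    """Phase 3: turn a (family, data) pair into a specific token."""
    family, data = token
    if family == "operator":
        # e.g. data == "++" -> tokPlusPlus
        return (SPECIALIZED_OPS.get(data, "tokUnknownOp"), data)
    return ("tok" + family.capitalize(), data)


def lex(source):
    """Run the three phases one after the other."""
    return [specialize(classify(word)) for word in split_words(source)]
```

For example, lex("a ^= b++;") first splits the text into "a", "^=", "b", "++" and ";", then tags "^=" and "++" as operators, and only in the last phase maps them to tokXorEqu and tokPlusPlus.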
