On Tuesday, 5 April 2016 at 21:37:09 UTC, Walter Bright wrote:
On 4/5/2016 6:47 AM, Basile B. wrote:
Also, lexing numbers doesn't need to be as accurate as in the
front-end of the compiler (especially if the HL doesn't have a
token type for the illegal "lexeme").
That is an interesting design point. If I was doing a
highlighter, I'd highlight in red tokens that the compiler
would reject, meaning I'd do the accurate number lexing.
Lexing numbers correctly is not trivial, but since the compiler
lexer's implementation can be cut/pasted, it is trivial in
practice.
Even when the most naive lexer sees a number and consumes until
a blank, a symbol or an operator, it's clear that this can be
done:
http://i.imgur.com/ehjps04.png
Actually, numbers are the only part of the D lexer where errors can
be detected.
Otherwise there are no possible syntax errors.
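
To make the naive approach concrete, here is a minimal sketch in Python (not the D front-end's code). It consumes characters until a blank, a symbol or an operator, then checks the consumed text against a pattern. The names `scan_number`, `VALID_INT` and `DELIMITERS` are hypothetical, and the pattern is an assumed, deliberately simplified subset of integer literals (decimal, hex, binary, underscores, a few suffixes). Floats and the full D number grammar are out of scope.

```python
import re

# Assumed, simplified subset of integer literals. This is NOT the full D grammar.
VALID_INT = re.compile(
    r"^(?:0[xX][0-9a-fA-F_]*[0-9a-fA-F][0-9a-fA-F_]*"  # hex, at least one digit
    r"|0[bB][01_]*[01][01_]*"                          # binary, at least one digit
    r"|[0-9][0-9_]*)"                                  # decimal
    r"(?:L[uU]?|[uU]L?)?$"                             # optional suffix
)

# Characters that end a "word": blanks, symbols and operator characters.
DELIMITERS = set(" \t\r\n;,.()[]{}+-*/%=<>!&|^~?:")

def scan_number(src, pos):
    """Consume from pos up to the next delimiter.

    Returns (text, end, ok): the consumed text, the index just past it,
    and whether the text matches the simplified pattern above.
    """
    end = pos
    while end < len(src) and src[end] not in DELIMITERS:
        end += 1
    text = src[pos:end]
    return text, end, VALID_INT.match(text) is not None
```

A highlighter could then color the span `src[pos:end]` as a number when `ok` is true and in red otherwise, for example for `scan_number("0x1G + 2", 0)`.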
But one thing I forgot to say in my previous post is that lexing
can be "multi-pass". The D front-end does everything in a single
pass, for example it directly detects tokPlusPlus or tokXorEqu,
but a multi-pass lexer can work in 3 sub-phases:
1/ split words
2/ detect token families in the words: identifier, keyword,
operator, etc.
3/ specialize tokens: tokOp.data == "++" -> tokPlusPlus
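
The three sub-phases above can be sketched as follows, in Python rather than D and only as an illustration. The function names (`split_words`, `classify`, `specialize`, `lex`), the family names other than tokOp, and the tiny keyword and operator tables are assumptions made for the example. Only tokPlusPlus and tokXorEqu come from the post.

```python
KEYWORDS = {"if", "else", "while", "return", "int"}    # tiny assumed subset
OP_CHARS = set("+-*/%=<>!&|^~")
SPECIAL = {"++": "tokPlusPlus", "^=": "tokXorEqu"}     # phase 3 table

def split_words(src):
    """Phase 1: split the source into raw words, without classifying them."""
    words, i = [], 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
            continue
        j = i + 1
        if c.isalnum() or c == "_":
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
        elif c in OP_CHARS:
            # Naive: a run of operator characters is taken as one word.
            while j < len(src) and src[j] in OP_CHARS:
                j += 1
        # Any other character (e.g. ';') is a one-character word.
        words.append(src[i:j])
        i = j
    return words

def classify(word):
    """Phase 2: assign a token family to a word."""
    if word in KEYWORDS:
        return ("tokKeyword", word)
    if word[0].isdigit():
        return ("tokNumber", word)
    if word[0].isalpha() or word[0] == "_":
        return ("tokIdentifier", word)
    if word[0] in OP_CHARS:
        return ("tokOp", word)
    return ("tokSymbol", word)

def specialize(token):
    """Phase 3: refine a generic operator token into a specific one."""
    kind, data = token
    if kind == "tokOp" and data in SPECIAL:
        return (SPECIAL[data], data)
    return token

def lex(src):
    """Run the three phases in order over the whole source."""
    return [specialize(classify(w)) for w in split_words(src)]
```

For a highlighter, phases 1 and 2 are probably enough, since the color usually depends only on the token family. Phase 3 matters only if something downstream needs the exact operator.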