On Thursday, 2 August 2012 at 04:38:11 UTC, Walter Bright wrote:
> That's just not going to produce a high performance lexer.
> The way to do it is in the Lexer instance, have a value which
> is the current Token instance. That way, in the normal case,
> one NEVER has to allocate a token instance.
> Only when lookahead is done is storage allocation required, and
> that list should be held by Lexer and recycled as tokens get
> consumed. This is how the dmd lexer works.
> Doing one allocation per token is never going to scale to
> trying to shove millions upon millions of lines of code through
> it.

Which is exactly why I'm pointing out the current, poor approach.
Having a single array with contiguous Tokens for lookahead is
completely doable even when Token is a class with some simple
GC.malloc and emplace composition. I think SDC's Token class is
too big to be useful as a struct; you'd pretty much never want to
pass it anywhere by value.
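
To make the GC.malloc plus emplace idea concrete, here is a rough sketch of what I mean. The Token class below is a made-up stand-in (it is not SDC's actual Token), and TokenBuffer, put and the field names are hypothetical names picked for illustration only. It assumes a fixed lookahead capacity and that nothing needs to run a destructor on a recycled slot.

```d
import core.memory : GC;
import std.conv : emplace;
import std.traits : classInstanceAlignment;

// Hypothetical stand-in for a larger token class; not SDC's real Token.
class Token
{
    int type;
    string value;
    size_t line;

    this(int type, string value, size_t line)
    {
        this.type = type;
        this.value = value;
        this.line = line;
    }
}

// Hypothetical lookahead storage: every Token instance lives back to back
// in a single GC-allocated block, and slots are overwritten when reused.
struct TokenBuffer
{
    enum size_t instanceSize = __traits(classInstanceSize, Token);
    enum size_t alignment = classInstanceAlignment!Token;
    // Round the instance size up so that every slot stays aligned.
    enum size_t stride = (instanceSize + alignment - 1) / alignment * alignment;

    private void* block;
    private size_t capacity;

    this(size_t capacity)
    {
        this.capacity = capacity;
        // One allocation for all slots. The block is scanned by the GC
        // (no NO_SCAN attribute), so the string a Token points at stays alive.
        block = GC.malloc(stride * capacity);
    }

    // Construct a Token in slot i, overwriting whatever was stored there.
    Token put(size_t i, int type, string value, size_t line)
    {
        assert(i < capacity);
        void[] chunk = block[i * stride .. i * stride + instanceSize];
        return emplace!Token(chunk, type, value, line);
    }

    // Get the class reference for slot i; the slot must have been filled.
    Token opIndex(size_t i)
    {
        assert(i < capacity);
        return cast(Token) (block + i * stride);
    }
}
```

With something like `auto buf = TokenBuffer(8);`, calling `buf.put(0, ...)` and `buf.put(1, ...)` gives two ordinary class references whose instances sit exactly `TokenBuffer.stride` bytes apart, and putting into slot 0 again reuses the same memory without a new allocation. Note that the GC will not finalize objects built this way, so this only fits a Token without a meaningful destructor.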