On 8/1/2012 4:18 PM, Jakob Ovrum wrote:
  * Currently files are read in their entirety first, then parsed. It is worth
exploring the idea of reading them lazily in chunks.

Using an input range will take care of that nicely.
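
For illustration, something along these lines (a sketch only; RangeLexer and next are made-up names, not a proposed API). It pulls characters one at a time from any input range, so the source can be decoded and fed in lazily, chunk by chunk, instead of being loaded whole:

import std.range.primitives;
import std.uni : isWhite;
import std.array : appender;

struct RangeLexer(R) if (isInputRange!R && is(ElementType!R : dchar))
{
    R input;

    // Produce the next whitespace-delimited lexeme straight from the
    // range; only the current lexeme is ever buffered.
    string next()
    {
        while (!input.empty && isWhite(input.front))
            input.popFront();
        auto buf = appender!string();
        while (!input.empty && !isWhite(input.front))
        {
            buf.put(input.front);
            input.popFront();
        }
        return buf.data;
    }
}

unittest
{
    auto lx = RangeLexer!string("int x = 42;");
    assert(lx.next() == "int");
    assert(lx.next() == "x");
}

The same lexer then works unchanged whether the range is an in-memory string or a lazily decoded file stream.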

  * The current result (TokenStream) is a wrapper over a GC-allocated array of
Token class instances, each instance with its own GC allocation (new Token). It
is worth exploring an alternative allocation strategy for the tokens.

That's just not going to produce a high-performance lexer.

The way to do it is to have, in the Lexer instance, a value that is the current Token. That way, in the normal case, one NEVER has to allocate a token instance.
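
A sketch of the idea (hypothetical names, not dmd's actual code):

struct Token
{
    int kind;       // token type: identifier, number, ...
    string text;    // slice of the source for this token
}

struct Lexer
{
    Token token;    // the current token, held by value

    // Advancing just overwrites the embedded token in place; the
    // common path never calls `new`.
    void nextToken()
    {
        token = scan();
    }

    private Token scan()
    {
        // ... real character scanning would go here
        return Token.init;
    }
}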

Only when lookahead is done is storage allocation required, and that lookahead list should be held by the Lexer and recycled as tokens get consumed. This is how the dmd lexer works.
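
Roughly like this (again a sketch with made-up names, modeled loosely on the scheme just described, not the actual dmd source):

struct Token { int kind; string text; }   // as in the previous sketch

struct TokenNode
{
    Token token;
    TokenNode* next;
}

struct LookaheadLexer
{
    Token token;          // current token, by value as before
    TokenNode* ahead;     // tokens scanned past the current one, in order
    TokenNode* freeList;  // consumed lookahead nodes awaiting reuse

    // Scan one more token of lookahead, reusing a recycled node when
    // one is available; lookahead lists are short, so walking to the
    // tail is cheap.
    TokenNode* pushAhead()
    {
        TokenNode* n = freeList;
        if (n !is null)
            freeList = n.next;
        else
            n = new TokenNode;  // allocates only until the free list fills
        n.token = scan();
        n.next = null;
        if (ahead is null)
            ahead = n;
        else
        {
            auto tail = ahead;
            while (tail.next !is null)
                tail = tail.next;
            tail.next = n;
        }
        return n;
    }

    // Consume the next token, draining lookahead first and putting
    // the spent node back on the free list.
    void nextToken()
    {
        if (ahead !is null)
        {
            TokenNode* n = ahead;
            token = n.token;
            ahead = n.next;
            n.next = freeList;
            freeList = n;
        }
        else
            token = scan();
    }

    private Token scan()
    {
        // ... real character scanning would go here
        return Token.init;
    }
}

Once the parser has warmed the free list up, even heavy lookahead runs allocation-free.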

Doing one allocation per token is never going to scale to shoving millions upon millions of lines of code through it.
