On Monday, 28 January 2013 at 21:03:21 UTC, Timon Gehr wrote:
> Better, but still slow.
I implemented the various suggestions from a past thread, made
the lexer work only on ubyte[] (to avoid Phobos converting
everything to dchar all the time), and gave the tokenizer
instance a character buffer that it re-uses.
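For illustration, here's a minimal sketch of that setup (the names are hypothetical, not the actual dscanner code): the lexer walks raw ubyte[] so Phobos never auto-decodes to dchar, and lexeme bytes accumulate in one buffer that survives across tokens.

struct Tokenizer
{
    const(ubyte)[] input; // raw bytes; no dchar decoding anywhere
    ubyte[] buffer;       // re-used for every token's text

    string nextIdentifier()
    {
        size_t len = 0;
        while (input.length && isIdentChar(input[0]))
        {
            // grow the scratch buffer only when it's too small
            if (len >= buffer.length)
                buffer.length = buffer.length ? buffer.length * 2 : 64;
            buffer[len++] = input[0];
            input = input[1 .. $];
        }
        return cast(string) buffer[0 .. len].idup; // the copy discussed below
    }
}

bool isIdentChar(ubyte c) pure nothrow
{
    return c == '_' || (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
}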
Results:
$ avgtime -q -r 200 ./dscanner --tokenCount ../phobos/std/datetime.d
------------------------
Total time (ms): 13861.8
Repetitions : 200
Sample mode    : 69 (90 occurrences)
Median time : 69.0745
Avg time : 69.3088
Std dev. : 0.670203
Minimum : 68.613
Maximum : 72.635
95% conf.int. : [67.9952, 70.6223] e = 1.31357
99% conf.int. : [67.5824, 71.0351] e = 1.72633
EstimatedAvg95%: [69.2159, 69.4016] e = 0.0928836
EstimatedAvg99%: [69.1867, 69.4308] e = 0.12207
If my math is right, that means it's getting 4.9 million
tokens/second now. According to Valgrind, the only way to really
improve things now is to require that the input to the lexer
support slicing. (Remember the secret of Tango's XML parser...)
The bottleneck is now the calls to .idup that construct the
token strings from slices of the buffer.
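To make the slicing idea concrete, here's a hedged sketch (the helper names are invented): if the lexer may require that the whole source be an immutable, sliceable array, a token's text can simply alias the source, and the per-token .idup allocation goes away, the same trick Tango's XML parser used.

// Today: the lexeme must be copied out of the re-used scratch buffer.
string copiedToken(const(ubyte)[] buffer, size_t len)
{
    return cast(string) buffer[0 .. len].idup; // one GC allocation per token
}

// With sliceable input: the token text is a zero-copy slice of the source.
string slicedToken(immutable(ubyte)[] source, size_t start, size_t end)
{
    return cast(string) source[start .. end]; // no allocation at all
}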
I guess that at some point
pure nothrow TokenType lookupTokenType(const string input)
might become a bottleneck. (DMD does not generate near-optimal
string switches, I think.)
Right now that's a fairly small box in KCachegrind.
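For anyone curious, here's a rough sketch of that function's shape (the TokenType members are invented for the example): a plain string switch over the keywords, which DMD lowers to a runtime string lookup rather than anything like a perfect hash, hence the worry that it could eventually dominate the profile.

enum TokenType { identifier, if_, for_, return_ /* ... */ }

pure nothrow TokenType lookupTokenType(const string input)
{
    switch (input)
    {
        case "if":     return TokenType.if_;
        case "for":    return TokenType.for_;
        case "return": return TokenType.return_;
        default:       return TokenType.identifier; // not a keyword
    }
}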