On 8/1/2012 9:54 PM, Jonathan M Davis wrote:
> Then just pass the same identifier table to the function which creates the
> token range. That doesn't require another type.

You're still going to require another type. Otherwise you'll have to duplicate the state in every token allocation, with the resulting heavy memory and initialization costs.
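A minimal sketch of that separation, written in C purely for illustration (the thread is about a D lexer, and every name here, IdentTable, Token, Lexer, lexer_init, is hypothetical). The state lives once in the lexer type, and each token stays a small value with no copy of it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifier table: one instance shared by the whole run. */
typedef struct IdentTable {
    const char *names[1024];  /* interned identifier strings */
    size_t count;
} IdentTable;

/* A token is a small value: a kind plus a slice of the source buffer.
   It carries no lexer state. */
typedef struct Token {
    uint32_t kind;
    uint32_t start;   /* offset into the source buffer */
    uint32_t length;
} Token;

/* The separate type: holds the state once and hands out Tokens. */
typedef struct Lexer {
    const char *src;
    size_t len;
    size_t pos;
    IdentTable *idents;   /* shared by pointer, never copied per token */
} Lexer;

static void lexer_init(Lexer *lx, const char *src, size_t len,
                       IdentTable *idents) {
    assert(lx != NULL && src != NULL && idents != NULL);
    lx->src = src;
    lx->len = len;
    lx->pos = 0;
    lx->idents = idents;
}
```

With this layout, creating a token initializes three integers, and the table pointer is set up once per lexer rather than once per token.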

Please keep in mind that a lexer is not something you just pass a few short strings to. It's very, very, very performance critical, as all those extra instructions add up to long delays when you're shoving millions of lines of code into its maw.

For the same reason you're also not going to want the lexer putting pressure on the GC. It could bring your whole system down.
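One way to keep the lexer off the allocator, sketched in C for illustration (C has no GC, so this only shows the allocation-free shape; Tok, TokKind, and next_tok are hypothetical names). Each token is a view into the caller's source buffer and is returned by value, so nothing is allocated per token:

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOK_EOF, TOK_WORD, TOK_OTHER } TokKind;

/* A token is a view into the caller's buffer; no text is copied. */
typedef struct {
    TokKind kind;
    const char *text;
    size_t length;
} Tok;

static int is_word_char(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

/* Scan one token starting at *pos and advance *pos past it.
   No heap allocation happens anywhere in this function. */
static Tok next_tok(const char *src, size_t len, size_t *pos) {
    size_t i = *pos;
    Tok t;

    /* Skip whitespace. */
    while (i < len && (src[i] == ' ' || src[i] == '\t' ||
                       src[i] == '\n' || src[i] == '\r'))
        i++;

    if (i >= len) {
        t.kind = TOK_EOF;
        t.text = src + len;
        t.length = 0;
        *pos = len;
        return t;
    }

    size_t start = i;
    if (is_word_char(src[i])) {
        while (i < len && is_word_char(src[i]))
            i++;
        t.kind = TOK_WORD;
    } else {
        i++;                 /* any other character is a one-char token */
        t.kind = TOK_OTHER;
    }
    t.text = src + start;
    t.length = i - start;
    *pos = i;
    return t;
}
```

The caller owns the source buffer and must keep it alive for as long as it uses the tokens, which is the trade-off for not copying.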

To get a high performance lexer, you're going to be counting the average number of instructions executed per input character. Each one counts. Shaving one off is a victory. You should also be thinking about memory cache access patterns.
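As an illustration of trimming per-character work, here is a C sketch of a common technique: classifying characters with a 256-entry lookup table, so the inner loop does one load and one mask test per character instead of a chain of range comparisons. The names (char_class, init_char_class, scan_identifier) are hypothetical, and whether this actually wins depends on the compiler and the cache behavior of the target machine, so it needs measuring:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Character class bits; a table entry holds exactly one of them. */
enum { C_OTHER = 0, C_SPACE = 1, C_ALPHA = 2, C_DIGIT = 4 };

/* 256 bytes: small enough to stay resident in the L1 data cache. */
static uint8_t char_class[256];

static void init_char_class(void) {
    for (int c = 0; c < 256; c++)
        char_class[c] = C_OTHER;
    char_class[' '] = char_class['\t'] = C_SPACE;
    char_class['\n'] = char_class['\r'] = C_SPACE;
    for (int c = 'a'; c <= 'z'; c++)
        char_class[c] = C_ALPHA;
    for (int c = 'A'; c <= 'Z'; c++)
        char_class[c] = C_ALPHA;
    char_class['_'] = C_ALPHA;
    for (int c = '0'; c <= '9'; c++)
        char_class[c] = C_DIGIT;
}

/* Return the length of the identifier starting at p, or 0 if the
   first character cannot start an identifier. */
static size_t scan_identifier(const unsigned char *p,
                              const unsigned char *end) {
    const unsigned char *q = p;
    if (q == end || !(char_class[*q] & C_ALPHA))
        return 0;
    q++;
    /* Hot loop: one table load and one mask test per character. */
    while (q < end && (char_class[*q] & (C_ALPHA | C_DIGIT)))
        q++;
    return (size_t)(q - p);
}
```

init_char_class must run once before any scanning. A production lexer would more likely make the table a compile-time constant so that step disappears as well.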
