On Thursday, August 02, 2012 23:41:39 Andrei Alexandrescu wrote:
> On 8/2/12 11:08 PM, Jonathan M Davis wrote:
> > You're not going to get as fast a lexer if it's not written specifically
> > for D. Writing a generic lexer is a different problem. It's also one that
> > needs to be solved, but I think that it's a mistake to think that a
> > generic lexer is going to be able to be as fast as one specifically
> > optimized for D.
>
> Do you have any evidence to back that up? I mean you're just saying it.
Because all of the rules are built directly into the code. You don't have to use regexes or anything like that. Pieces of the lexer could certainly be generic or copied over to other lexers just fine, but when you write the lexer by hand specifically for D, you can guarantee that it checks exactly what it needs to for D, without any extra cruft or efficiency lost to decoding where it doesn't need to, or to checking an additional character at any point, or anything along those lines. And tuning it is much easier, because you have control over the whole thing.

Also, given features such as token strings, I would think that using a generic lexer on D would be rather difficult anyway.

If someone wants to try to write a generic lexer for D and see if they can beat the hand-written ones, then more power to them, but I don't see how you could possibly expect a generic lexer to shave the operations down to the bare minimum necessary to get the job done, whereas a hand-written lexer can do that given enough effort.

- Jonathan M Davis
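[Editor's note: to illustrate the "no decoding where it isn't needed" point, here is a minimal hand-written scanner sketch in C, not taken from any real D lexer; the token names and helpers are invented for the example. The idea is that the dispatch happens on a single raw byte, since the common token-starting characters are ASCII; a real D lexer would fall back to full UTF-8 decoding only for the rare non-ASCII identifier characters, strings, and comments.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch (not dmd's actual set). */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PLUS, TOK_EOF } TokKind;

typedef struct {
    TokKind kind;
    const char *start;
    size_t len;
} Token;

static int is_ident_start(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}
static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Hand-written scanner: every branch tests exactly one raw byte.
 * No regex engine, no table lookup, and no UTF-8 decoding on the
 * fast path. (Real D also allows Unicode identifier characters;
 * a full lexer would decode only when it sees a byte >= 0x80.) */
static Token next_token(const char **p) {
    while (**p == ' ' || **p == '\t' || **p == '\n')
        (*p)++;
    const char *s = *p;
    Token t = { TOK_EOF, s, 0 };
    if (*s == '\0')
        return t;
    if (*s == '+') {
        t.kind = TOK_PLUS; t.len = 1; *p = s + 1; return t;
    }
    if (is_digit(*s)) {
        const char *q = s;
        while (is_digit(*q)) q++;
        t.kind = TOK_NUMBER; t.len = (size_t)(q - s); *p = q; return t;
    }
    if (is_ident_start(*s)) {
        const char *q = s;
        while (is_ident_start(*q) || is_digit(*q)) q++;
        t.kind = TOK_IDENT; t.len = (size_t)(q - s); *p = q; return t;
    }
    /* Unrecognized byte: skip it (a real lexer would report an error). */
    (*p)++;
    return next_token(p);
}
```

Scanning `"foo + 42"` with this yields an identifier of length 3, a plus sign, a number of length 2, then end-of-input, and at no point does any branch look at more than the one byte it needs.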