On 10/06/13 10:57, Jacob Carlborg wrote:
> On 2013-10-05 19:52, Artur Skawina wrote:
>
>> The assumption, that a hand-written lexer will be much faster than a generated
>> one, is wrong.
>
> I never said that the generated one would be slow. I only said that the hand
> written would be fast :)

I know, but you said that having both is an option -- that would not make sense unless there's a significant advantage.

A lexer is really a rather trivial piece of software; there's not much room for improvement over the obvious "fetch-a-character, use-it-to-determine-a-new-state, repeat-until-done, return the found state (== matched token)" approach. So the core of an efficient hand-written lexer will not be very different from this:

http://repo.or.cz/w/girtod.git/blob/refs/heads/lexer:/mainloop.d

That is already ~2kLOC and it's *just* the top-level loop; it does not include handling of nontrivial tokens (it matches just keywords, punctuators and identifiers).

Could a handwritten lexer be faster? Not by much, and any trick that would help the manually-written one could also be used by the generator. In fact, working on the generator is much easier than dealing with this kind of fragile hand-tuned mess.

Imagine changing the lexical grammar a bit, or introducing a new kind of literal. With a more declarative solution this only involves a local change spanning a few lines and is relatively risk-free. Updating a handwritten lexer would involve many more changes, often in several different areas, with lots of opportunities for making mistakes.

> Would it be able to lex Scala and Ruby? Method names in Scala can contain
> many symbols that is not usually allowed in other languages. You can have a
> method named "==". In Ruby method names are allowed to end with "=", "?" or
> "!".

Yes, D makes it easy: you can, for example, simply define a function that determines what is and what isn't an identifier and pass that as an alias or mixin parameter. "Lexing" binary formats would be possible too :^).

A complete D lexer can look as simple as this:

http://repo.or.cz/w/girtod.git/blob/refs/heads/lexer:/dlanglexer.d

which should also give you a good idea of how easy supporting other languages would be.
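
To make the alias-parameter point concrete, here is a minimal sketch. It is not taken from the girtod code: `Kind`, `Token`, `lex`, `cIdent` and `rubyIdent` are names made up for illustration, it is ASCII-only, and it knows only identifiers and one-character punctuators. The Ruby-like rule handles a trailing "?" or "!" and leaves out the "=" suffix to keep things short.

```d
import std.ascii : isAlpha, isAlphaNum, isWhite;

enum Kind { identifier, punctuator }

struct Token
{
    Kind kind;
    string text;
}

// Hypothetical lexer core. `identLength` is an alias parameter: any function
// that returns the length of the identifier at the front of its argument
// (0 if there is none). The language-specific rule lives entirely in it.
Token[] lex(alias identLength)(string src)
{
    Token[] tokens;
    while (src.length)
    {
        if (isWhite(src[0])) { src = src[1 .. $]; continue; }
        size_t n = identLength(src);
        immutable kind = n ? Kind.identifier : Kind.punctuator;
        if (n == 0) n = 1;              // anything else: one-character punctuator
        tokens ~= Token(kind, src[0 .. n]);
        src = src[n .. $];
    }
    return tokens;
}

// C/D-style identifiers: [A-Za-z_][A-Za-z0-9_]*
size_t cIdent(string s)
{
    if (s.length == 0 || !(isAlpha(s[0]) || s[0] == '_')) return 0;
    size_t n = 1;
    while (n < s.length && (isAlphaNum(s[n]) || s[n] == '_')) ++n;
    return n;
}

// Simplified Ruby-style rule: as above, plus one optional trailing '?' or '!'.
size_t rubyIdent(string s)
{
    auto n = cIdent(s);
    if (n && n < s.length && (s[n] == '?' || s[n] == '!')) ++n;
    return n;
}

unittest
{
    auto c = lex!cIdent("empty? x");
    assert(c.length == 3 && c[0].text == "empty" && c[1].text == "?");

    auto r = lex!rubyIdent("empty? x");
    assert(r.length == 2 && r[0].text == "empty?");
    assert(r[1].kind == Kind.identifier && r[1].text == "x");
}
```

Switching from C-style to Ruby-style identifiers is then a change at the instantiation site only (`lex!cIdent` vs `lex!rubyIdent`); the loop itself is untouched.
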
(In dlanglexer.d the "actions" are defined in separate modules, so that the grammars can be reused everywhere.) There's a D PEG lexical grammar in there too, btw.

I forgot to change the subject previously, sorry; I was not trying to influence the voting. I'm just saying that Andrei's approach goes in the right direction (even if I disagree with the details). And IMHO the time before a useful std-lib-worthy lexer infrastructure materializes is measured in months, if not years. So if I were voting I'd probably say "yes" - because waiting for a better, but non-existent, alternative is not going to help anybody. The hard part of the required work isn't coding - it's the design.

If a better solution appears later, it should be able to /replace/ the hand-written one. And in the meantime, the experience from using the less-generic lexer can only help any "new" design.

artur