std.d.lexer requirements

Walter Bright Wed, 01 Aug 2012 17:15:25 -0700

Given the various proposals for a lexer module for Phobos, I thought I'd share some characteristics it ought to have.


First of all, it should be suitable for, at a minimum:


1. compilers

2. syntax highlighting editors

3. source code formatters

4. html creation

To that end:

1. It should accept as input an input range of UTF8. I feel it is a mistake to templatize it for UTF16 and UTF32. Anyone desiring to feed it UTF16 should use an 'adapter' range to convert the input to UTF8. (This is what component programming is all about.)


2. It should output an input range of tokens

3. tokens should be values, not classes

4. It should avoid memory allocation as much as possible

5. It should not read or write any mutable global state outside of its "Lexer" instance

6. A single "Lexer" instance should be able to serially accept input ranges, sharing and updating one identifier table

7. It should accept a callback delegate for errors. That delegate should decide whether to:

   1. ignore the error (and "Lexer" will try to recover and continue)
   2. print an error message (and "Lexer" will try to recover and continue)
   3. throw an exception (and "Lexer" is done with that input range)

8. Lexer should be configurable as to whether it should collect information about comments and ddoc comments or not

9. Comments and ddoc comments should be attached to the next following token; they should not themselves be tokens


10. High speed matters a lot

11. Tokens should have begin/end line/column markers, though most of the time this can be implicitly determined


12. It should come with unittests that, using -cov, show 100% coverage
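
As a rough illustration of how points 1 through 7 could fit together, here is a minimal sketch in D. Every name in it (Token, TokenKind, Lexer, TokenRange, onError, lex) is made up for the example and is not an existing or proposed Phobos API. It assumes a toy language of identifiers and decimal integers only, records just the start position of each token, and leaves out comment handling (points 8 and 9). The error delegate is assumed to signal "recover and continue" by returning normally and "done with this input" by throwing.

```d
// Illustrative sketch only; every name below is hypothetical, not a Phobos API.
import std.array : array;
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;
import std.range.primitives : ElementType, isInputRange;
import std.traits : Unqual;
import std.utf : byChar;

enum TokenKind { identifier, number }

/// Tokens are plain values (a struct, not a class).
struct Token
{
    TokenKind kind;
    uint line, column; // 1-based start position; columns count UTF-8 code units
    size_t id;         // index into the identifier table (identifiers only)
}

/// All mutable state lives in the instance; there are no globals.
struct Lexer
{
    size_t[string] identifiers; // shared by every input fed to this instance
    /// Returning normally means "recover and continue"; throwing abandons the input.
    void delegate(string msg, uint line, uint column) onError;

    /// Takes an input range of UTF-8 code units, returns an input range of tokens.
    /// The result keeps a pointer to this Lexer, which must outlive it.
    auto lex(R)(R input)
        if (isInputRange!R && is(Unqual!(ElementType!R) == char))
    {
        return TokenRange!R(&this, input);
    }
}

struct TokenRange(R)
{
    private Lexer* lexer;
    private R input;
    private uint line = 1, column = 1;
    private Token current;
    private bool done;

    this(Lexer* lexer, R input)
    {
        this.lexer = lexer;
        this.input = input;
        popFront(); // prime the first token
    }

    @property bool empty() const { return done; }
    @property Token front() const { return current; }

    void popFront()
    {
        for (;;)
        {
            while (!input.empty && isWhite(input.front)) advance();
            if (input.empty) { done = true; return; }
            current = Token(TokenKind.number, line, column);
            if (isDigit(input.front))
            {
                while (!input.empty && isDigit(input.front)) advance();
                return;
            }
            if (isAlpha(input.front) || input.front == '_')
            {
                string text; // allocates; a real lexer would work to avoid this
                while (!input.empty && (isAlphaNum(input.front) || input.front == '_'))
                {
                    text ~= input.front;
                    advance();
                }
                current.kind = TokenKind.identifier;
                if (auto p = text in lexer.identifiers) current.id = *p;
                else
                {
                    current.id = lexer.identifiers.length;
                    lexer.identifiers[text] = current.id;
                }
                return;
            }
            if (lexer.onError !is null)
                lexer.onError("unexpected character", line, column);
            advance(); // recover by skipping the offending code unit
        }
    }

    private void advance()
    {
        if (input.front == '\n') { ++line; column = 1; }
        else ++column;
        input.popFront();
    }
}

unittest
{
    Lexer lexer;
    auto a = lexer.lex("foo 42\nbar".byChar).array;
    assert(a.length == 3 && a[2].line == 2 && a[2].id == 1);
    // UTF-16 goes through an adapter range; the identifier table carries over.
    auto b = lexer.lex("bar baz"w.byChar).array;
    assert(b[0].id == 1 && b[1].id == 2);
}
```

Note that the lexer itself is only ever instantiated for UTF-8 code units; the UTF-16 input in the unittest is converted by std.utf.byChar before the lexer sees it. The identifier text and table entries are the only allocations in this sketch, and a serious implementation would presumably want to reduce even those.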

Basically, I don't want anyone to be motivated to do a separate one after seeing this one.
