On 7/31/2011 5:57 AM, Jacob Carlborg wrote:
* Lexing and parsing:

Standard facilities for these tasks could be very useful. Perhaps D
could get its own dlex and dyacc or some such tools. Personally, I
prefer sticking to LL(1), but LALR is generally more convenient and
flexible, and thus I'd suggest something YACC/ANTLR-like.

(I know this doesn't have much to do with Phobos per se, but I figured
I'd mention it.)

I think someone is working on this.

I've started on a port of DMD's lexer (not really a port ;) ):

https://github.com/jmacdonagh/phobos/compare/master...std.lang.d.lexer

Basically, you give it some string (string, wstring, or dstring), and it gives you a range of tokens back. The token has the type, a slice of the input that corresponds to the token, line / column, and a value (e.g. an integer constant).
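
To make the shape of that interface concrete, here is a minimal sketch of a token range. It is not the code in the branch linked above: the names `Token`, `TokenType`, and `Lexer` are hypothetical, it only handles `string` input (not wstring or dstring), and it only recognizes identifiers and decimal integers.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;
import std.stdio : writeln;

// Hypothetical token kinds; a real D lexer would have many more.
enum TokenType { identifier, intLiteral, unknown }

struct Token
{
    TokenType type;
    string text;    // slice of the input, not a copy
    size_t line;    // 1-based position of the first character
    size_t column;
    long value;     // only meaningful for intLiteral
}

// An input range of tokens: empty / front / popFront.
struct Lexer
{
    private string input;
    private size_t pos;
    private size_t line = 1;
    private size_t column = 1;
    private Token current;
    private bool done;

    this(string source)
    {
        input = source;
        popFront(); // prime the first token
    }

    @property bool empty() { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        while (pos < input.length && isWhite(input[pos])) advance();
        if (pos >= input.length) { done = true; return; }

        immutable start = pos;
        immutable startLine = line;
        immutable startColumn = column;
        TokenType type = TokenType.unknown;
        long value = 0;

        if (isDigit(input[pos]))
        {
            type = TokenType.intLiteral;
            while (pos < input.length && isDigit(input[pos]))
            {
                value = value * 10 + (input[pos] - '0'); // overflow is ignored in this sketch
                advance();
            }
        }
        else if (isAlpha(input[pos]) || input[pos] == '_')
        {
            type = TokenType.identifier;
            while (pos < input.length && (isAlphaNum(input[pos]) || input[pos] == '_'))
                advance();
        }
        else
        {
            advance(); // consume one unrecognized character so the range always makes progress
        }
        current = Token(type, input[start .. pos], startLine, startColumn, value);
    }

    // Moves one character forward while keeping line / column up to date.
    private void advance()
    {
        if (input[pos] == '\n') { ++line; column = 1; }
        else ++column;
        ++pos;
    }
}

void main()
{
    foreach (tok; Lexer("foo 42\nbar"))
        writeln(tok.type, " '", tok.text, "' at ", tok.line, ":", tok.column,
                " value=", tok.value);
}
```

Because `Lexer` is an ordinary input range, it works with `foreach` and with range algorithms, and each token's `text` is a slice of the original string, so no copying is needed.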

Some features I'm planning:

1. Support D1 and D2.
2. Warnings and errors returned in the tokens. For example, if you use an octal constant in D2 code, it will still return an integer constant token, but with a warning flag set and a message attached. As for errors, if the lexer hits "0xz012", it will return an error token for the slice "0xz" and then start lexing the integer constant "012". No exceptions, easy peasy.
3. CTFEable. Although I'll probably have to wait till the next DMD release.
4. Support any kind of character range. Not sure if people want to lex something that's not a string/wstring/dstring.
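
The error handling described in item 2 could be sketched roughly as follows. This is an illustration under stated assumptions, not the planned implementation: `Token`, `TokenType`, `lexNumber`, and `lexAll` are hypothetical names, the message strings are made up, only integer literals are handled, and the sketch always treats the input as D2 code.

```d
import std.ascii : isDigit, isHexDigit, isOctalDigit, isWhite;
import std.stdio : writeln;

enum TokenType { intLiteral, error }

struct Token
{
    TokenType type;   // defaults to intLiteral
    string text;      // slice of the input
    long value;
    bool warning;     // set when the token is valid but suspicious
    string message;   // filled in for warnings and errors
}

// Numeric value of a single hex digit (caller guarantees isHexDigit(c)).
int hexValue(char c)
{
    if (isDigit(c)) return c - '0';
    return (c | 0x20) - 'a' + 10; // fold to lower case first
}

// Lexes one token from the front of a non-empty `src` and advances `src`
// past it. Problems are reported in the token, never thrown.
Token lexNumber(ref string src)
{
    Token tok;
    size_t i;
    if (src.length >= 2 && src[0] == '0' && (src[1] == 'x' || src[1] == 'X'))
    {
        for (i = 2; i < src.length && isHexDigit(src[i]); ++i)
            tok.value = tok.value * 16 + hexValue(src[i]);
        if (i == 2) // "0x" with no hex digit after it
        {
            // Include the offending character in the error slice, then resume after it.
            if (i < src.length && !isWhite(src[i])) ++i;
            tok.type = TokenType.error;
            tok.message = "hex digit expected after 0x";
        }
    }
    else if (src.length >= 2 && src[0] == '0' && isOctalDigit(src[1]))
    {
        for (i = 1; i < src.length && isOctalDigit(src[i]); ++i)
            tok.value = tok.value * 8 + (src[i] - '0');
        tok.warning = true; // still an integer token, just flagged
        tok.message = "octal constant in D2 code";
    }
    else if (isDigit(src[0]))
    {
        for (i = 0; i < src.length && isDigit(src[i]); ++i)
            tok.value = tok.value * 10 + (src[i] - '0');
    }
    else
    {
        i = 1; // consume one character so lexing always makes progress
        tok.type = TokenType.error;
        tok.message = "unexpected character";
    }
    tok.text = src[0 .. i];
    src = src[i .. $];
    return tok;
}

// Lexes the whole input, skipping whitespace between tokens.
Token[] lexAll(string src)
{
    Token[] tokens;
    for (;;)
    {
        while (src.length && isWhite(src[0])) src = src[1 .. $];
        if (src.length == 0) break;
        tokens ~= lexNumber(src);
    }
    return tokens;
}

void main()
{
    foreach (tok; lexAll("0xz012"))
        writeln(tok.type, " '", tok.text, "' value=", tok.value,
                " warning=", tok.warning, " ", tok.message);
}
```

For the input "0xz012" this yields an error token for the slice "0xz", followed by an integer token for "012" that carries the warning flag. The caller decides what to do with flagged tokens, so the lexer itself never needs to throw.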

I'm glad this was brought up. I remember Walter's post last year asking for this module, but the conversation seemed to kill the idea. I started on this just for the fun of it, but then doubted whether Phobos wanted it. I feel that a hand-written lexer / parser is going to be faster than something generated, but maybe I'm old-fashioned.

Anyway, Jim, if you want to do this, I can move on to something else. Or, if you'd prefer, I can continue on. I didn't see a branch in your repo, so I'm not sure what you've done.
