On Thursday, 8 March 2012 at 07:49:57 UTC, Jonathan M Davis wrote:
The lexer is going to need to take a range of dchar (which may or may not be an array). And while the lexer would need to operate on generic ranges of dchar, it would probably have to be special-cased for strings in a number of places.

I know what you mean. I actually cut out ddmd's conversion code because, when I glanced over Phobos, I saw plenty of functions designed for this. I must have intuited what you are saying. dmd does all conversion to char* before sending the buffer to the lexer. I doubt there's a reason to change this procedure; the only change would be to put that conversion code directly into module dmd.lexer.
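To make the "generic range of dchar, special-cased for strings" idea concrete, here is a minimal sketch. The function countNewlines is a hypothetical stand-in for a lexer's inner loop, not anything taken from dmd or Phobos. It accepts any input range of characters and takes a faster path when the input is a UTF-8 array.

```d
import std.algorithm : filter;
import std.range;

// Hypothetical helper (not from dmd or Phobos): counts newline characters
// in any input range whose elements convert to dchar.
size_t countNewlines(R)(R source)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    size_t count = 0;
    static if (is(R : const(char)[]))
    {
        // Special case for UTF-8 arrays: '\n' is a single code unit and
        // never occurs inside a multi-byte sequence, so the code units
        // can be scanned directly without decoding to dchar.
        foreach (char c; source)
            if (c == '\n')
                ++count;
    }
    else
    {
        // Generic path: works for any input range of characters.
        for (; !source.empty; source.popFront())
            if (source.front == '\n')
                ++count;
    }
    return count;
}

void main()
{
    string text = "int a;\nint b;\n";
    assert(countNewlines(text) == 2);
    // A lazily filtered range is not an array, so it takes the generic path.
    assert(countNewlines(text.filter!(c => c != ' ')) == 2);
}
```

Both calls go through the same function; the static if picks the path at compile time, so the string case pays nothing for the generality.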

The parser would then take a range of tokens and then output the AST in some form or other - it probably couldn't be a range, but I'm not sure.

Dmd's AST is pretty idiosyncratic.
Example: class FuncDeclaration (function declaration) has a bunch of named members:
{
    Identifier ident;        // the function's name
    Parameter[] parameters;  // its parameters
    Statement frequire;      // the in{} contract, if present
    Statement fbody;         // function body
    // etc.
}

Each one has its own name. I was actually working on how to turn it into a more iterable format, since if you want to edit the AST directly you're going to need to cursor down or up to the element you want. It's doable, but it's not a natural range-ish format. That's where I'm confused about the licensing issues: I'm not sure whether the particular object structure that the parser produces is also going to be in Phobos, or whether it must remain GPL, a license I'm not sure I want to continue using.
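One possible shape for a "more iterable format" is sketched below. Every type here is a simplified, hypothetical stand-in and not dmd's real class: each node keeps its named members but also exposes them through a uniform children() method, so generic code can walk the tree without knowing the field names.

```d
// Hypothetical, simplified node types; they are not dmd's real classes.
class Node
{
    // Child nodes in source order; leaves return an empty array.
    Node[] children() { return null; }
}

class Statement : Node
{
    string text;
    this(string text) { this.text = text; }
}

class FuncDeclaration : Node
{
    string ident;        // the function's name
    Statement frequire;  // the in{} contract, null if absent
    Statement fbody;     // function body, null if absent

    this(string ident, Statement frequire, Statement fbody)
    {
        this.ident = ident;
        this.frequire = frequire;
        this.fbody = fbody;
    }

    // Expose the named members as a uniform list, skipping absent ones.
    override Node[] children()
    {
        Node[] result;
        if (frequire !is null) result ~= frequire;
        if (fbody !is null) result ~= fbody;
        return result;
    }
}

// Depth-first count of all nodes reachable from root, root included.
size_t countNodes(Node root)
{
    size_t n = 1;
    foreach (child; root.children())
        n += countNodes(child);
    return n;
}

void main()
{
    auto fn = new FuncDeclaration("foo", null, new Statement("return 1;"));
    assert(fn.children().length == 1);
    assert(countNodes(fn) == 2);
}
```

The named fields stay available for code that knows it has a FuncDeclaration, while a cursor or visitor only needs children().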


So, if you're not familiar with ranges, you probably have a fair bit of learning ahead of you, and you're probably going to have to make a number of changes to your lexer and parser (though the majority of it will probably be able to stay intact). Unfortunately, a proper article and tutorial on them is currently lacking, in spite of the fact that Phobos uses them heavily. Fortunately, however, the book that Ali Çehreli is writing on D has a chapter on ranges that should help get you started:

http://ddili.org/ders/d.en/ranges.html

But I'd suggest that you play around with ranges a fair bit (especially with strings) before trying to change what you have to use them. std.algorithm in particular makes heavy use of ranges. And it wouldn't surprise me at all if some portions of your lexer and parser really should be using some of Phobos' functions but aren't currently, because the code is originally a port from C++. You should also make sure that you understand the basics of Unicode fairly well - especially how they pertain to char, wchar, and dchar - since that will affect your ability to correctly translate code to use ranges as well as properly optimize it.
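As a small illustration of why the char, wchar, and dchar distinction matters for ranges, the sketch below compares the array view of a string (code units) with its range view (code points). It assumes Phobos' usual behaviour of decoding narrow strings to dchar when they are used as ranges.

```d
import std.range;

void main()
{
    // U+00E9 (e with acute accent) needs two UTF-8 code units,
    // but only one UTF-16 or UTF-32 code unit.
    string  s8  = "h\u00E9llo";
    wstring s16 = "h\u00E9llo"w;
    dstring s32 = "h\u00E9llo"d;

    // .length counts code units, so it depends on the encoding.
    assert(s8.length == 6);
    assert(s16.length == 5);
    assert(s32.length == 5);

    // walkLength uses the range interface, which yields code points.
    assert(s8.walkLength == 5);
    assert(s16.walkLength == 5);

    // Indexing a string gives a code unit; its range element is a dchar.
    static assert(is(typeof(s8[0]) == immutable(char)));
    static assert(is(ElementType!string == dchar));
}
```

This mismatch is the reason generic range code over strings can be slower than array code, and why a lexer may want string-specific fast paths.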

It would probably help if other D developers who are more familiar with ranges took a look at what you have and maybe even helped you start adjusting your code, but I don't know how many will both have the time and be interested. If I have time, I'll probably start poking at it, but I don't know that I'll have time any time soon, much as I'd like to.

Regardless, you need to familiarize yourself with ranges if you want to get the lexer and parser ready for inclusion in Phobos. And you really should familiarize yourself with them anyway, since they're heavily used in D code in general. Not being able to use ranges in D would be like not being able to use iterators in C++. You can program in it, but you'd be fairly crippled, particularly when dealing with the standard library.
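For anyone who has not written a range yet: an input range is any type that provides empty, front, and popFront. The sketch below is a toy range of tokens of the kind a parser could consume. Token and TokenRange are hypothetical names invented for this example, and splitting on spaces stands in for real lexing.

```d
import std.algorithm : equal, map;
import std.range;

// Hypothetical token type, for illustration only.
struct Token
{
    string text;
}

// Hypothetical input range that yields space-separated tokens lazily.
struct TokenRange
{
    private string rest;    // input not yet consumed
    private Token current;  // token returned by front
    private bool done;      // true once the input is exhausted

    this(string source)
    {
        rest = source;
        popFront();  // prime the first token
    }

    @property bool empty() { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        // Skip leading spaces. A space is a single UTF-8 code unit,
        // so indexing by code unit is safe here.
        size_t i = 0;
        while (i < rest.length && rest[i] == ' ')
            ++i;
        if (i == rest.length)
        {
            done = true;
            rest = null;
            return;
        }
        size_t j = i;
        while (j < rest.length && rest[j] != ' ')
            ++j;
        current = Token(rest[i .. j]);
        rest = rest[j .. $];
    }
}

// The three primitives are all that generic range code requires.
static assert(isInputRange!TokenRange);

void main()
{
    auto tokens = TokenRange("int x = 1 ;");
    // std.algorithm works on the custom range like on any other.
    assert(equal(tokens.map!(t => t.text), ["int", "x", "=", "1", ";"]));
}
```

Because TokenRange satisfies isInputRange, it can be passed to map, filter, and the rest of std.algorithm without any extra glue.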

- Jonathan M Davis
