On Thursday, 8 March 2012 at 07:49:57 UTC, Jonathan M Davis wrote:
The lexer is going to need to take a range of dchar (which may
or may not be an array),
And while the lexer would need to operate on generic ranges of
dchar, it would probably have to be special-cased for strings
in a number of places
I know what you mean. I actually cut out ddmd's conversion code
because, glancing over Phobos, I saw plenty of functions designed
for this! I must have intuited what you are saying. dmd converts
everything to char* before sending the buffer to the lexer. I
doubt there's a reason to change this procedure, only to put that
conversion code directly into module dmd.lexer instead.
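To make the "generic range of dchar, special-cased for strings" idea concrete, here is a minimal sketch. countLeadingBlanks is a hypothetical helper made up for illustration; it is not taken from dmd, ddmd, or Phobos, and it assumes a current Phobos where std.range.primitives exists.

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.traits : isSomeString;

// Hypothetical helper (not part of dmd or Phobos): counts the leading
// spaces and tabs of any input range whose elements convert to dchar.
size_t countLeadingBlanks(R)(R input)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    static if (isSomeString!R)
    {
        // String special case: space and tab are ASCII, so the code
        // units can be scanned directly without decoding to dchar.
        size_t i = 0;
        while (i < input.length && (input[i] == ' ' || input[i] == '\t'))
            ++i;
        return i;
    }
    else
    {
        // Generic path: use only the input range primitives.
        size_t n = 0;
        for (; !input.empty; input.popFront())
        {
            immutable dchar c = input.front;
            if (c != ' ' && c != '\t')
                break;
            ++n;
        }
        return n;
    }
}
```

The same call works for a string and for a lazily decoded range such as the result of std.utf.byDchar; only the string overload skips decoding.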
The parser would then take a range of tokens and then output
the AST in some form or other - it probably couldn't be a
range, but I'm not sure.
Dmd's AST is pretty idiosyncratic.
Example: class FuncDeclaration (function declaration) has a
bunch of named members:
{
    Identifier ident;       // the function's name
    Parameter[] parameters; // its parameters
    Statement frequire;     // the in{} contract, if present
    Statement fbody;        // function body
    // etc.
}
Each one has its own name. I actually was working on how to turn
it into a more iterable format, since if you want to edit the AST
directly you're going to need to cursor down or up to the element
you want. It's actually doable, but it's not a natural range-ish
format. That's where I'm confused about the licensing issues,
since I'm not sure if the particular object structure which gets
parsed is also going to be in phobos or if it must remain GPL,
which I'm not sure I want to continue using.
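One way the "more iterable format" could look is sketched below. Everything here is a simplified stand-in written for illustration: only the member names ident, parameters, frequire and fbody come from the description above, while Node and children() are assumptions and not how dmd or ddmd actually structure the AST.

```d
// Hypothetical base class: every AST node can list its child nodes.
class Node
{
    // Child nodes in source order; leaf nodes have none.
    Node[] children() { return null; }
}

// Simplified stand-ins for the real ddmd classes.
class Identifier : Node
{
    string name;
    this(string name) { this.name = name; }
}
class Parameter : Node {}
class Statement : Node {}

class FuncDeclaration : Node
{
    Identifier ident;       // the function's name
    Parameter[] parameters; // its parameters
    Statement frequire;     // the in{} contract, null if absent
    Statement fbody;        // function body

    // Gathers the named members into one array, skipping absent ones,
    // so that a caller can walk them without knowing each name.
    override Node[] children()
    {
        Node[] result;
        if (ident !is null) result ~= ident;
        foreach (p; parameters) result ~= p;
        if (frequire !is null) result ~= frequire;
        if (fbody !is null) result ~= fbody;
        return result;
    }
}
```

Since a D array is itself a random-access range, the result of children() can be passed straight to the functions in std.algorithm, even though the tree as a whole is not a range.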
So, if you're not familiar with ranges, you probably have a fair
bit of learning ahead of you, and you're probably going to have
to make a number of changes to your lexer and parser (though the
majority of it will probably be able to stay intact).
Unfortunately, a proper article and tutorial on them is currently
lacking in spite of the fact that Phobos uses them heavily.
Fortunately, however, in a book that Ali Çehreli is writing on D,
he has a chapter on ranges that should help get you started:
http://ddili.org/ders/d.en/ranges.html
But I'd suggest that you play around with ranges a fair bit
(especially with strings) before trying to change what you have
to use them. std.algorithm in particular makes heavy use of
ranges. And it wouldn't surprise me at all if some portions of
your lexer and parser really should be using some of Phobos'
functions but aren't currently, because it's originally a port
from C++. You should also make sure that you understand the
basics of Unicode fairly well - especially how they pertain to
char, wchar, and dchar - since that will affect your ability to
correctly translate code to use ranges as well as properly
optimize it.
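A small sketch of why the char/wchar/dchar distinction matters when strings are treated as ranges. TextSize and measure are hypothetical names used only for this illustration; the Phobos calls are walkLength and ElementType from std.range.primitives.

```d
import std.range.primitives : ElementType, walkLength;

// As a range, a string yields decoded code points (dchar), even though
// it is stored as an array of UTF-8 code units (char).
static assert(is(ElementType!string == dchar));

// Hypothetical result type, for illustration only.
struct TextSize
{
    size_t codeUnits;  // what .length reports
    size_t codePoints; // what iterating as a range visits
}

TextSize measure(string s)
{
    // .length counts char code units; walkLength pops the range one
    // decoded code point at a time, so it is O(n) for strings.
    return TextSize(s.length, s.walkLength);
}
```

For example, measure("\u00C7ehreli") gives 8 code units but 7 code points, because 'Ç' takes two bytes in UTF-8. A lexer that indexes a string by code unit and one that iterates it as a range of dchar will therefore disagree on positions as soon as non-ASCII text appears.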
It would probably help if other D developers who are more
familiar with ranges took a look at what you have and maybe even
helped you start adjusting your code, but I don't know how many
will both have the time and be interested. If I have time, I'll
probably start poking at it, but I don't know that I'll have time
any time soon, much as I'd like to.
Regardless, you need to familiarize yourself with ranges if you
want to get the lexer and parser ready for inclusion in Phobos.
And you really should familiarize yourself with them anyway,
since they're heavily used in D code in general. Not being able
to use ranges in D would be like not being able to use iterators
in C++. You can program in it, but you'd be fairly crippled -
particularly when dealing with the standard library.
- Jonathan M Davis