On 06/21/2010 02:21 PM, Alix Pexton wrote:
On 20/06/2010 22:46, Alix Pexton wrote:
On 20/06/2010 21:37, Ellery Newcomer wrote:
On 06/20/2010 03:01 PM, Alix Pexton wrote:
On 19/06/2010 21:12, Alix Pexton wrote:
I've been sketching some grammar diagrams for D2.0, a little like those
on JSON.org, and of course I didn't get far before I ran into something
odd.


I think I will take the plunge and base my diagrams on the source of
DMD. After looking at the code in lexer.c, it does not seem as far
beyond my rusty old C++ parsing skills as I had expected! Massive credit
to Walter for having a codebase that is as mature as DMD without it
turning into a labyrinth of preprocessor macros and cryptic
"comefrom"s.

This will mean however that my little project may take a little longer,
sigh...

A...

Do share. I've always been too lazy to read lexer.c, and from this
discussion, it sounds like there are a few spots where my own lexer
grammar is incorrect (or at least differs from dmd).


of course ^^

A...

Well, I think I have got my head around lexer.c now, and its various
peculiarities, like "000377." being a valid float (although not
according to my shiny new, limited edition copy of tDPL (fig2.2 p35)^^).

Oh wow. That's a sweet little diagram. Those dots are hard to see though.


The weirdness occurs because some of the corner cases are handled not
by the neat little state machine that validates reals, but in the
scanner at the point where it recognises a number beginning with a zero.
The productions in lex.html represent the range of inputs that are
accepted by the state machine without taking into account that the
scanner rejects the sequence "._" (which makes sense as that is the
identifier "_" in the outer scope).
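
To make the "._" point concrete, here is a toy sketch in Python. It is not a transcription of lexer.c: the function name scan_number and its return shape are made up for illustration, and it ignores octal validation, exponents, suffixes and "..". It only shows the one rule described above, that a dot followed by "_" is left for the next token rather than being pulled into the number.

```python
def scan_number(text):
    """Toy scanner for a decimal literal at the start of text.

    Hypothetical helper, not DMD's code. Returns (kind, lexeme).
    """
    i = 0
    # Integer part: digits with optional underscores.
    while i < len(text) and (text[i].isdigit() or text[i] == "_"):
        i += 1
    if i < len(text) and text[i] == ".":
        # Assumption taken from the discussion: "._" is rejected as part of
        # a float, so the dot is not consumed and "_" stays an identifier.
        if text[i + 1:i + 2] == "_":
            return ("integer", text[:i])
        # Otherwise consume the dot and any fraction digits (possibly none).
        j = i + 1
        while j < len(text) and (text[j].isdigit() or text[j] == "_"):
            j += 1
        return ("float", text[:j])
    return ("integer", text[:i])
```

With this sketch, scan_number("000377.") comes back as a float with the trailing dot included, while scan_number("1._") stops after the "1" and leaves "._" for whatever scans next.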

to hell with lexer.c. I'm not changing anything.


Andrei's analysis in tDPL also points out that 0xp0 is a valid hexfloat,
but a strict reading of lex.html would not allow it.
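
The difference between the two readings can be sketched with a pair of regular expressions. Both patterns are assumptions written for illustration; neither is copied from lex.html or from lexer.c. The "strict" one demands at least one hex digit after the prefix, the "lenient" one lets the mantissa digits be absent.

```python
import re

# Hypothetical "strict" reading: at least one hex digit must follow 0x.
STRICT = re.compile(r"0[xX][0-9a-fA-F]+(\.[0-9a-fA-F]*)?[pP][+-]?[0-9]+")

# Hypothetical "lenient" reading: the mantissa digits may be missing.
LENIENT = re.compile(r"0[xX][0-9a-fA-F]*(\.[0-9a-fA-F]*)?[pP][+-]?[0-9]+")


def accepts(pattern, text):
    """Return True if the whole of text matches the given pattern."""
    return pattern.fullmatch(text) is not None
```

Under these assumed patterns, accepts(LENIENT, "0xp0") is True and accepts(STRICT, "0xp0") is False, while an ordinary literal such as "0x1.8p1" passes both.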

Overall the diagram for hexfloat is much simpler than the one for
decimalfloat, which I think will have to be split into 3 ><

A...

PS, octal must die!

I'll settle for modified syntax 0c123. But yeah.

Are your diagrams solely concerned with the lexer? Because I have a (messy) parser grammar which I'm a bit more confident about if you're interested.
