Official D Grammar

2013-04-01 Thread Brian Schott
I've pretty much finished up my work on the std.d.lexer module. I 
am waiting for the review queue to make some progress on the 
other (three?) modules being reviewed before starting a thread on 
it.


In the meantime I've started some work on an AST module for 
Phobos that contains the data types necessary to build up a 
parser module so that we can have a standard set of code to build D 
dev tools off of. I decided to work directly from the standard on 
dlang.org for this to make sure that my module is correct and 
that the standard is actually correct.


I've seen several threads on this newsgroup complaining about the 
state of the standard and unfortunately this will be another one.


1) Grammar defined in terms of things that aren't tokens. Take, 
for example, PropertyDeclaration. It's defined as an "@" token 
followed by... what? "safe"? It's not a real token. It's an 
identifier. You can't parse this based on checking the token 
type. You have to check the type and the value.


2) Grammar references rules that don't exist. 
UserDefinedAttribute is defined in terms of CallExpression, but 
CallExpression doesn't exist elsewhere in the grammar. 
BaseInterfaceList is defined in terms of InterfaceClasses, but 
that rule is never defined.


3) Unnecessary rules. KeyExpression, ValueExpression, 
ScopeBlockStatement, DeclarationStatement, ThenStatement, 
ElseStatement, Test, Increment, Aggregate, LwrExpression, 
UprExpression, FirstExp, LastExp, StructAllocator, 
StructDeallocator, EnumTag, EnumBaseType, EmptyEnumBody, 
ConstraintExpression, MixinIdentifier, etc... are all defined in 
terms of only one other rule.
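
Points 2 and 3 can be checked mechanically once the grammar is available as data. The following is a minimal sketch in Python, not part of any existing tool. The grammar fragments are simplified stand-ins rather than the actual dlang.org text, and the names `GRAMMAR`, `undefined_rules` and `alias_rules` are made up for illustration.

```python
# Toy grammar: rule name -> list of alternatives, each a list of symbols.
# Symbols written in double quotes are terminals; anything else is a rule
# reference. These fragments are simplified assumptions, not the real spec.
GRAMMAR = {
    "UserDefinedAttribute": [['"@"', "CallExpression"]],
    "BaseInterfaceList": [['":"', "InterfaceClasses"]],
    "ThenStatement": [["ScopeStatement"]],
    "ScopeStatement": [['"{"', '"}"']],
}


def is_terminal(symbol):
    return symbol.startswith('"')


def undefined_rules(grammar):
    """Return rule names that are referenced somewhere but never defined."""
    referenced = {
        symbol
        for alternatives in grammar.values()
        for alternative in alternatives
        for symbol in alternative
        if not is_terminal(symbol)
    }
    return sorted(referenced - set(grammar))


def alias_rules(grammar):
    """Return rules whose whole body is a single reference to another rule."""
    return sorted(
        name
        for name, alternatives in grammar.items()
        if len(alternatives) == 1
        and len(alternatives[0]) == 1
        and not is_terminal(alternatives[0][0])
    )


print(undefined_rules(GRAMMAR))  # ['CallExpression', 'InterfaceClasses']
print(alias_rules(GRAMMAR))      # ['ThenStatement']
```

A check like this only finds structural problems. It says nothing about whether the grammar matches what the compiler accepts.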


I think that we need to be able to create a grammar description 
that:
* Fits in to a single file, so that a tool implementer does not 
need to collect bits of the grammar from the various pages on 
dlang.org.
* Can be verified to be correct by an existing tool such as 
Bison, Goldie, JavaCC,  with a small number 
of changes.
* Is part of the dmd/dlang repositories on github and gets 
updated every time the language changes.


I'm willing to work on this if there's a good chance it will 
actually be implemented. Thoughts?


Re: Official D Grammar

2013-04-01 Thread Walter Bright

On 4/1/2013 4:18 PM, Brian Schott wrote:

I've pretty much finished up my work on the std.d.lexer module. I am waiting for
the review queue to make some progress on the other (three?) modules being
reviewed before starting a thread on it.

In the meantime I've started some work on an AST module for Phobos that contains
the data types necessary to build up a parser module so that we can have a
standard set of code to build D dev tools off of. I decided to work directly from
the standard on dlang.org for this to make sure that my module is correct and
that the standard is actually correct.

I've seen several threads on this newsgroup complaining about the state of the
standard and unfortunately this will be another one.

1) Grammar defined in terms of things that aren't tokens. Take, for example,
PropertyDeclaration. It's defined as an "@" token followed by... what? "safe"?
It's not a real token. It's an identifier. You can't parse this based on
checking the token type. You have to check the type and the value.


True, do you have a suggestion?



2) Grammar references rules that don't exist. UserDefinedAttribute is defined in
terms of CallExpression, but CallExpression doesn't exist elsewhere in the
grammar. BaseInterfaceList is defined in terms of InterfaceClasses, but that
rule is never defined.


Yes, this needs to be fixed.



3) Unnecessary rules. KeyExpression, ValueExpression, ScopeBlockStatement,
DeclarationStatement, ThenStatement, ElseStatement, Test, Increment, Aggregate,
LwrExpression, UprExpression, FirstExp, LastExp, StructAllocator,
StructDeallocator, EnumTag, EnumBaseType, EmptyEnumBody, ConstraintExpression,
MixinIdentifier, etc... are all defined in terms of only one other rule.


Using these makes documentation easier, and I don't think it harms anything.



I think that we need to be able to create a grammar description that:
* Fits in to a single file, so that a tool implementer does not need to collect
bits of the grammar from the various pages on dlang.org.
* Can be verified to be correct by an existing tool such as Bison, Goldie,
JavaCC,  with a small number of changes.
* Is part of the dmd/dlang repositories on github and gets updated every time
the language changes.

I'm willing to work on this if there's a good chance it will actually be
implemented. Thoughts?


I suggest doing this as a sequence of pull requests, not doing just one big one.


Re: Official D Grammar

2013-04-02 Thread Stewart Gordon

On 02/04/2013 00:18, Brian Schott wrote:


I think that we need to be able to create a grammar description that:
* Fits in to a single file, so that a tool implementer does not need to
collect bits of the grammar from the various pages on dlang.org.
* Can be verified to be correct by an existing tool such as Bison,
Goldie, JavaCC,  with a small number of changes.
* Is part of the dmd/dlang repositories on github and gets updated every
time the language changes.



Indeed, the published grammar needs to be thoroughly checked against 
what DMD is actually doing, and any discrepancies fixed (or filed in 
Bugzilla to be fixed in due course).  And then they need to be kept in sync.


Has the idea of using a parser generator to build D's parsing code been 
rejected in the past, or is hand-coding just the way Walter decided to 
do it?  Is the code any more efficient than what a typical parser 
generator would generate?


And all disambiguation rules (such as "if it's parseable as a 
DeclarationStatement, it's a DeclarationStatement") need to be made 
explicit as part of the grammar.  I suppose this is where using Bison or 
similar would help, as it would point out any ambiguities in the grammar 
that need rules to resolve them.


Stewart.


Re: Official D Grammar

2013-04-02 Thread Jacob Carlborg

On 2013-04-02 15:21, Stewart Gordon wrote:


Indeed, the published grammar needs to be thoroughly checked against
what DMD is actually doing, and any discrepancies fixed (or filed in
Bugzilla to be fixed in due course).  And then they need to be kept in
sync.

Has the idea of using a parser generator to build D's parsing code been
rejected in the past, or is hand-coding just the way Walter decided to
do it?  Is the code any more efficient than what a typical parser
generator would generate?

And all disambiguation rules (such as "if it's parseable as a
DeclarationStatement, it's a DeclarationStatement") need to be made
explicit as part of the grammar.  I suppose this is where using Bison or
similar would help, as it would point out any ambiguities in the grammar
that need rules to resolve them.


I'm wondering if it's possible to mechanically check that what's in the 
grammar is how DMD behaves.


--
/Jacob Carlborg


Re: Official D Grammar

2013-04-02 Thread Tobias Pankrath


I'm wondering if it's possible to mechanically check that 
what's in the grammar is how DMD behaves.


Take the grammar and (randomly) generate strings with it and 
check whether DMD complains. You'd need a parse-only, 
don't-check-semantics flag, though.


This will not check if the strings are parsed correctly by DMD 
nor if invalid strings are rejected. But it would be a start.
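
The generation half of this idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the toy grammar is not D's, `generate` is a hypothetical helper, and the step of feeding the output to the compiler is left out because it depends on a parse-only mode.

```python
import random

# Toy grammar in the same spirit as the spec: rule -> list of alternatives.
# Double-quoted symbols are terminals. This is NOT the D grammar.
TOY_GRAMMAR = {
    "Declaration": [["Type", "Identifier", '";"']],
    "Type": [['"int"'], ['"bool"'], ["Type", '"[]"']],
    "Identifier": [['"a"'], ['"b"']],
}


def generate(grammar, symbol, rng, depth=0, max_depth=8):
    """Expand `symbol` into a list of terminal tokens, choosing randomly."""
    if symbol.startswith('"'):
        return [symbol[1:-1]]  # strip the quotes around a terminal
    alternatives = grammar[symbol]
    if depth >= max_depth:
        # Past the depth limit, take the shortest alternative. For this toy
        # grammar that is always a terminal, so the recursion stops.
        alternatives = [min(alternatives, key=len)]
    tokens = []
    for part in rng.choice(alternatives):
        tokens.extend(generate(grammar, part, rng, depth + 1, max_depth))
    return tokens


rng = random.Random(0)  # fixed seed so runs are repeatable
for _ in range(3):
    print(" ".join(generate(TOY_GRAMMAR, "Declaration", rng)))
```

Every string produced this way is valid according to the grammar, so any one the compiler rejects points at a disagreement between spec and implementation.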




Re: Official D Grammar

2013-04-06 Thread Bruno Medeiros

On 02/04/2013 00:18, Brian Schott wrote:

I've pretty much finished up my work on the std.d.lexer module. I am
waiting for the review queue to make some progress on the other (three?)
modules being reviewed before starting a thread on it.

In the meantime I've started some work on an AST module for Phobos that
contains the data types necessary to build up a parser module so that we
can have a standard set of code to build D dev tools off of. I decided to
work directly from the standard on dlang.org for this to make sure that
my module is correct and that the standard is actually correct.

I've seen several threads on this newsgroup complaining about the state
of the standard and unfortunately this will be another one.

1) Grammar defined in terms of things that aren't tokens. Take, for
example, PropertyDeclaration. It's defined as an "@" token followed
by... what? "safe"? It's not a real token. It's an identifier. You can't
parse this based on checking the token type. You have to check the type
and the value.

2) Grammar references rules that don't exist. UserDefinedAttribute is
defined in terms of CallExpression, but CallExpression doesn't exist
elsewhere in the grammar. BaseInterfaceList is defined in terms of
InterfaceClasses, but that rule is never defined.

3) Unnecessary rules. KeyExpression, ValueExpression,
ScopeBlockStatement, DeclarationStatement, ThenStatement, ElseStatement,
Test, Increment, Aggregate, LwrExpression, UprExpression, FirstExp,
LastExp, StructAllocator, StructDeallocator, EnumTag, EnumBaseType,
EmptyEnumBody, ConstraintExpression, MixinIdentifier, etc... are all
defined in terms of only one other rule.

I think that we need to be able to create a grammar description that:
* Fits in to a single file, so that a tool implementer does not need to
collect bits of the grammar from the various pages on dlang.org.
* Can be verified to be correct by an existing tool such as Bison,
Goldie, JavaCC,  with a small number of changes.
* Is part of the dmd/dlang repositories on github and gets updated every
time the language changes.

I'm willing to work on this if there's a good chance it will actually be
implemented. Thoughts?


Interesting thread. I've been working on a hand-written D parser (in 
Java, for the DDT IDE) and I too have found a slew of grammar spec 
issues. Some of them more serious than the ones you mentioned above. In 
some cases it's actually not clear, or downright wrong, what the grammar 
spec says. For example, here's one off of my notes:


  void func(int foo() { } );

The spec says that is parsable (basically a function declaration in the 
parameter list), which makes no sense, and DMD doesn't accept. Some 
cases are a bit trickier, since it's not clear if the syntax should be 
accepted or not (sometimes they might make sense but not be allowed).


These issues make things a bit harder for tools development that require 
D language parsers. But the whole grammar spec is so messy, I've been 
unsure whether it's worth filing bug reports or not (would they be 
addressed?). There is also the problem that even if those issues are 
fixed now, the spec could very easily fall out of date in the future, 
unless we have some system to test the spec. Like you mentioned, ideally 
we would have a grammar spec for a grammar/PG tool so that correctness 
could more easily be verified.
(it doesn't guarantee no spec bugs, but it makes it much harder for them 
to be there)



--
Bruno Medeiros - Software Engineer


Re: Official D Grammar

2013-04-06 Thread Bruno Medeiros

On 02/04/2013 00:18, Brian Schott wrote:

I've pretty much finished up my work on the std.d.lexer module. I am
waiting for the review queue to make some progress on the other (three?)
modules being reviewed before starting a thread on it.



BTW, even in the lexer spec I've found an issue. How does this parse:
  5.blah
According to the spec (maximal munch technique), it should be FLOAT then 
IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming 
the latter is the correct behavior, so you can write stuff like 
123.init, but that should be clarified.



--
Bruno Medeiros - Software Engineer


Re: Official D Grammar

2013-04-06 Thread Jonathan M Davis
On Saturday, April 06, 2013 16:21:12 Bruno Medeiros wrote:
> On 02/04/2013 00:18, Brian Schott wrote:
> > I've pretty much finished up my work on the std.d.lexer module. I am
> > waiting for the review queue to make some progress on the other (three?)
> > modules being reviewed before starting a thread on it.
> 
> BTW, even in the lexer spec I've found an issue. How does this parse:
> 5.blah
> According to the spec (maximal munch technique), it should be FLOAT then
> IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming
> the latter is the correct behavior, so you can write stuff like
> 123.init, but that should be clarified.

It would definitely have to be INTEGER DOT IDENTIFIER due to UFCS, so it sounds 
like the spec wasn't updated like it should have been.

- Jonathan M Davis


Re: Official D Grammar

2013-04-06 Thread Artur Skawina
On 04/06/13 17:21, Bruno Medeiros wrote:
> On 02/04/2013 00:18, Brian Schott wrote:
>> I've pretty much finished up my work on the std.d.lexer module. I am
>> waiting for the review queue to make some progress on the other (three?)
>> modules being reviewed before starting a thread on it.
>>
> 
> BTW, even in the lexer spec I've found an issue. How does this parse:
>   5.blah
> According to the spec (maximal munch technique), it should be FLOAT then 
> IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming the 
> latter is the correct behavior, so you can write stuff like 123.init, but 
> that should be clarified.

"1..2", "1.ident" and a float literal with '_' after the '.' are the
DecimalFloat cases that I immediately ran into when doing a lexer based on
the dlang grammar. It's obvious to a human how these should be handled, but
code generators aren't that smart... But they are good at catching mistakes
like these. 
Actually, that last case is even more "interesting"; http://dlang.org/lex.html
has "1_2_3_4_5_6_._5_6_7_8" as a valid example, which of course it's not
("_5_6_7_8" is a valid identifier), but there is no reason to disallow
"1_2_3_4_5_6_.5_6_7_8".

> that should be clarified.

These are just grammar bugs, that could easily be fixed.  Then there are
some things that can be less obvious, but shouldn't really be controversial
like allowing empty HexString literals.

Then there's the enhancement category.
Looking through my comments, I think the only deliberate change from dlang.org
that I have is in DelimitedString -- there is no reason to forbid q"/abc/def/";
there are no back-compat issues, as it couldn't have existed in legacy D code.

artur


Re: Official D Grammar

2013-04-06 Thread Bruno Medeiros

On 06/04/2013 20:52, Artur Skawina wrote:

On 04/06/13 17:21, Bruno Medeiros wrote:

On 02/04/2013 00:18, Brian Schott wrote:

I've pretty much finished up my work on the std.d.lexer module. I am
waiting for the review queue to make some progress on the other (three?)
modules being reviewed before starting a thread on it.



BTW, even in the lexer spec I've found an issue. How does this parse:
   5.blah
According to the spec (maximal munch technique), it should be FLOAT then 
IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming the 
latter is the correct behavior, so you can write stuff like 123.init, but that 
should be clarified.


"1..2", "1.ident" and a float literal with '_' after the '.' are the
DecimalFloat cases that I immediately ran into when doing a lexer based on
the dlang grammar. It's obvious to a human how these should be handled, but
code generators aren't that smart... But they are good at catching mistakes
like these.


The "1..2" is actually mentioned in the spec:
"An exception to this rule is that a .. embedded inside what looks like 
two floating point literals, as in 1..2, is interpreted as if the .. was 
separated by a space from the first integer."

so it's there, even if it can be missed.

But unless I missed it, the spec is incorrect for the "1.ident" or 
"1_2_3_4_5_6_._5_6_7_8" cases as there is no exception mentioned 
there... and it's not always 100% obvious to a human how these should be 
handled. Or maybe that's just me :)


--
Bruno Medeiros - Software Engineer


Re: Official D Grammar

2013-04-07 Thread Artur Skawina
On 04/07/13 00:35, Bruno Medeiros wrote:
> On 06/04/2013 20:52, Artur Skawina wrote:
>> On 04/06/13 17:21, Bruno Medeiros wrote:
>>> On 02/04/2013 00:18, Brian Schott wrote:
>>>> I've pretty much finished up my work on the std.d.lexer module. I am
>>>> waiting for the review queue to make some progress on the other (three?)
>>>> modules being reviewed before starting a thread on it.
>>>>
>>>
>>> BTW, even in the lexer spec I've found an issue. How does this parse:
>>>   5.blah
>>> According to the spec (maximal munch technique), it should be FLOAT then 
>>> IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming the 
>>> latter is the correct behavior, so you can write stuff like 123.init, but 
>>> that should be clarified.
>>
>> "1..2", "1.ident" and a float literal with '_' after the '.' are the
>> DecimalFloat cases that I immediately ran into when doing a lexer based on
>> the dlang grammar. It's obvious to a human how these should be handled, but
>> code generators aren't that smart... But they are good at catching mistakes
>> like these.
> 
> The "1..2" is actually mentioned in the spec:
> "An exception to this rule is that a .. embedded inside what looks like two 
> floating point literals, as in 1..2, is interpreted as if the .. was 
> separated by a space from the first integer."
> so it's there, even if it can be missed.

I know, but documenting a (grammar) bug does not make it go away.

> But unless I missed it, the spec is incorrect for the "1.ident" or 
> "1_2_3_4_5_6_._5_6_7_8" cases as there is no exception mentioned there... and 
> it's not always 100% obvious to a human how these should be handled. Or maybe 
> that's just me :)

What does the "spec" currently say about ".001"?..

It's been a while since I did a D lexer, based on the dlang grammar - it 
(the lexer) was supposed to be dmd compatible. Took a closer look at the
actual dlang.org rules today while writing this message...


Will try to find some time to clean up and convert a working D lexical
grammar to PEG (what I have should be 1:1 translatable, except one rule,
DelimitedString) and put it on the wiki. Maybe it will help someone
avoid these issues.

artur


Re: Official D Grammar

2013-04-08 Thread Christopher Bergqvist

On Tuesday, 2 April 2013 at 19:00:21 UTC, Tobias Pankrath wrote:


I'm wondering if it's possible to mechanically check that 
what's in the grammar is how DMD behaves.


Take the grammar and (randomly) generate strings with it and 
check whether DMD complains. You'd need a parse-only, 
don't-check-semantics flag, though.


This will not check if the strings are parsed correctly by DMD 
nor if invalid strings are rejected. But it would be a start.


An alternative idea for ensuring that documentation and 
implementation are in sync might be to list the full grammar 
definition as a data structure that can both be used as input for 
the parser and as input for a tool that generates the 
documentation.  Theoretically possible, :) just look at Philippe 
Sigaud's Pegged.
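
As a rough illustration of the single-source idea (and not of how Pegged itself works), the sketch below keeps one grammar table and derives both a documentation listing and a naive recognizer from it. The two-rule grammar and the names `render_docs` and `match` are hypothetical.

```python
# One grammar table drives both the documentation and the recognizer.
# Double-quoted symbols are terminals. The rules are illustrative only.
GRAMMAR = {
    "Slice": [["Number", '".."', "Number"]],
    "Number": [['"1"'], ['"2"']],
}


def render_docs(grammar):
    """Render the table in the 'Rule:' plus indented-alternatives style."""
    lines = []
    for name, alternatives in grammar.items():
        lines.append(f"{name}:")
        for alternative in alternatives:
            lines.append("    " + " ".join(alternative))
    return "\n".join(lines)


def match(grammar, symbol, tokens, pos=0):
    """Return the position after `symbol` matched at tokens[pos], else None.

    Alternatives are tried in order and the first success wins. A
    left-recursive rule would recurse forever in this naive version.
    """
    if symbol.startswith('"'):
        ok = pos < len(tokens) and tokens[pos] == symbol[1:-1]
        return pos + 1 if ok else None
    for alternative in grammar[symbol]:
        cur = pos
        for part in alternative:
            cur = match(grammar, part, tokens, cur)
            if cur is None:
                break
        else:
            return cur  # every part of this alternative matched
    return None


print(render_docs(GRAMMAR))
print(match(GRAMMAR, "Slice", ["1", "..", "2"]))  # 3
```

Because both outputs come from the same table, the published listing cannot drift away from what the recognizer accepts.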


Re: Official D Grammar

2013-04-08 Thread Tobias Pankrath
On Monday, 8 April 2013 at 21:50:12 UTC, Christopher Bergqvist 
wrote:

On Tuesday, 2 April 2013 at 19:00:21 UTC, Tobias Pankrath wrote:


I'm wondering if it's possible to mechanically check that 
what's in the grammar is how DMD behaves.


Take the grammar and (randomly) generate strings with it and 
check whether DMD complains. You'd need a parse-only, 
don't-check-semantics flag, though.


This will not check if the strings are parsed correctly by DMD 
nor if invalid strings are rejected. But it would be a start.


An alternative idea for ensuring that documentation and 
implementation are in sync might be to list the full grammar 
definition as a data structure that can both be used as input 
for the parser and as input for a tool that generates the 
documentation.  Theoretically possible, :) just look at 
Philippe Sigaud's Pegged.


I know but the parser is currently hand written and I think 
Walter will only accept an auto generated parser if it is as fast 
as the current solution.


However in an old discussion someone said that the D grammar 
isn't LALR(1) or LR(1), so I don't think that is possible with 
current D parser generators. Do we have a pegged grammar for D?


Another thing is that for documentation purposes you might want 
to have a more readable but ambiguous grammar than your 
generator of choice will accept.





Re: Official D Grammar

2013-04-08 Thread Philippe Sigaud
> However in an old discussion someone said that the D grammar isn't LALR(1)
> or LR(1), so I don't think that is possible with current D parser
> generators. Do we have a pegged grammar for D?
>

Yes, it comes with the project. But, it's still buggy (sometimes due to my
own mistakes, sometimes due to plain errors in the online D grammar). And
the generated parser is quite slow, alas.


Re: Official D Grammar

2013-04-09 Thread Bruno Medeiros

On 07/04/2013 16:14, Artur Skawina wrote:

The "1..2" is actually mentioned in the spec:
>"An exception to this rule is that a .. embedded inside what looks like two 
floating point literals, as in 1..2, is interpreted as if the .. was separated by a space 
from the first integer."
>so it's there, even if it can be missed.

I know, but documenting a (grammar) bug does not make it go away.



Who says it's a bug? From my understanding, this exception is there on 
purpose, to make it easier to use the DOT_DOT operator in slice 
expressions:


  foo[1..2] // It would be silly to have to put a space after the 1

At most you could make a case that "1." shouldn't ever parse as float, 
that the decimal part should be required if the dot is present.


--
Bruno Medeiros - Software Engineer


Re: Official D Grammar

2013-04-09 Thread Bruno Medeiros

On 02/04/2013 03:13, Walter Bright wrote:


1) Grammar defined in terms of things that aren't tokens. Take, for
example,
PropertyDeclaration. It's defined as an "@" token followed by... what?
"safe"?
It's not a real token. It's an identifier. You can't parse this based on
checking the token type. You have to check the type and the value.


True, do you have a suggestion?


I don't think that kind of grammar issue is too annoying, since it's 
easy to understand what the intended behavior is (in this case at least).
But to fix it, well, we can have just that: have the grammar say it 
should parse an identifier after the @, and then issue a semantic error 
of sorts if the value is not one of the expected special values (safe, 
etc.).
Parsing an identifier here is in any case the best error recovery 
strategy anyways.
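
A minimal sketch of that approach in Python: the parser accepts any identifier after the @ and the name is checked afterwards. The token representation, the function name `parse_at_attribute` and the exact contents of `BUILTIN_ATTRIBUTES` are assumptions made for illustration.

```python
# Illustrative set of special names; the post only says "safe, etc.".
BUILTIN_ATTRIBUTES = {"safe", "trusted", "system", "property", "disable"}


def parse_at_attribute(tokens, pos):
    """Parse '@' Identifier starting at tokens[pos].

    Tokens are (kind, value) pairs. Returns (name, errors, next_pos).
    The grammar step only looks at token kinds; the check against the
    list of known names happens afterwards as a semantic step.
    """
    kind, _ = tokens[pos]
    if kind != "at":
        raise SyntaxError("expected '@'")
    kind, value = tokens[pos + 1]
    if kind != "identifier":
        raise SyntaxError("expected identifier after '@'")
    errors = []
    if value not in BUILTIN_ATTRIBUTES:
        errors.append(f"unknown attribute @{value}")
    return value, errors, pos + 2


print(parse_at_attribute([("at", "@"), ("identifier", "safe")], 0))
# ('safe', [], 2)
print(parse_at_attribute([("at", "@"), ("identifier", "saef")], 0))
# ('saef', ['unknown attribute @saef'], 2)
```

The misspelled attribute still parses, so the parser can carry on and report a single targeted error instead of losing its place.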


A bit more annoying is the case with the extern declaration, with the 
C++ parameter:

  extern(C++)
here you have to look at a special identifier (the C, D, PASCAL part) 
and see if there is a ++ token ahead, it's a bit more of special-casing 
in the parser. Here I think it would have been better to change the 
language itself and use "CPP" instead of "C++". A minor simplification.
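
The special-casing described here amounts to one token of look-ahead. A hedged Python sketch follows, with a made-up token representation and a hypothetical `parse_linkage` helper that handles the part between the parentheses.

```python
def parse_linkage(tokens, pos):
    """Parse the linkage name inside extern( ... ), starting at tokens[pos].

    Tokens are (kind, value) pairs. Returns (linkage, next_pos).
    """
    kind, value = tokens[pos]
    if kind != "identifier":
        raise SyntaxError("expected linkage identifier")
    pos += 1
    # Special case: the identifier C followed by a ++ token means C++.
    if value == "C" and pos < len(tokens) and tokens[pos] == ("operator", "++"):
        return "C++", pos + 1
    return value, pos


print(parse_linkage([("identifier", "C"), ("operator", "++"), ("punct", ")")], 0))
# ('C++', 2)
print(parse_linkage([("identifier", "C"), ("punct", ")")], 0))
# ('C', 1)
```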


--
Bruno Medeiros - Software Engineer


Re: Official D Grammar

2013-04-09 Thread Artur Skawina
On 04/09/13 12:24, Bruno Medeiros wrote:
> On 07/04/2013 16:14, Artur Skawina wrote:
>>> The "1..2" is actually mentioned in the spec:
>>> >"An exception to this rule is that a .. embedded inside what looks like 
>>> >two floating point literals, as in 1..2, is interpreted as if the .. was 
>>> >separated by a space from the first integer."
>>> >so it's there, even if it can be missed.
>> I know, but documenting a (grammar) bug does not make it go away.
>>
> 
> Who says it's a bug? From my understanding, this exception is there on 
> purpose, to make it easier to use the DOT_DOT operator in slice 
> expressions:
> 
>   foo[1..2] // It would be silly to have to put a space after the 1
> 
> At most you could make a case that "1." shouldn't ever parse as float, that 
> the decimal part should be required if the dot is present.

It's a bug, because the grammar does not correctly describe the rules. A
parser/lexer based on the grammar alone will not work. Documenting the
exception(s) helps the human, but doesn't make the grammar correct.

I've started the PEG conversion of my lexer rules and the relevant one
looks like this:

   DecimalFloat:
       (LeadingDecimal "." !"." !IdentifierStart DecimalDigitsNoSingleUS DecimalExponent)
     / (LeadingDecimal "." !"." !IdentifierStart DigitUS*)
     / ("." LeadingDecimal DecimalExponent?)

This works as-is, w/o any extra info - the working lexer is mechanically
generated from aot this rule.
(It differs from the dlang definition in at least four ways - "1..2",
"1.ident", "1.000_1" and ".001")
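
To see how the negative look-aheads in this rule behave, here is a rough Python regex translation. The sub-rules are not given in the post, so the definitions of `LEADING` and `EXPONENT`, and the use of plain digit runs for DecimalDigitsNoSingleUS and DigitUS*, are assumptions made for the sketch.

```python
import re

# Assumed sub-rules (not taken from the post):
LEADING = r"[0-9][0-9_]*"                   # LeadingDecimal
EXPONENT = r"[eE][+-]?[0-9_]*[0-9][0-9_]*"  # DecimalExponent
# !"." !IdentifierStart  ->  two negative look-aheads
NOT_DOT_OR_IDENT = r"(?!\.)(?![A-Za-z_])"

DECIMAL_FLOAT = re.compile(
    rf"{LEADING}\.{NOT_DOT_OR_IDENT}[0-9][0-9_]*{EXPONENT}"
    rf"|{LEADING}\.{NOT_DOT_OR_IDENT}[0-9_]*"
    rf"|\.{LEADING}(?:{EXPONENT})?"
)


def float_prefix(text):
    """Return the float literal at the start of `text`, or None."""
    m = DECIMAL_FLOAT.match(text)
    return m.group(0) if m else None


for sample in ["1..2", "1.ident", "1.000_1", ".001"]:
    print(sample, "->", float_prefix(sample))
# 1..2 -> None
# 1.ident -> None
# 1.000_1 -> 1.000_1
# .001 -> .001
```

When no float matches, a lexer would fall back to its integer rule, which gives INTEGER followed by ".." for "1..2" and INTEGER DOT IDENTIFIER for "1.ident".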

I'll try to finish the conversion and post the whole lexical grammar in
a couple days (have almost no D-time right now). 

artur


Re: Official D Grammar

2013-04-09 Thread Jacob Carlborg

On 2013-04-02 04:13, Walter Bright wrote:


1) Grammar defined in terms of things that aren't tokens. Take, for
example,
PropertyDeclaration. It's defined as an "@" token followed by... what?
"safe"?
It's not a real token. It's an identifier. You can't parse this based on
checking the token type. You have to check the type and the value.


True, do you have a suggestion?


Just define that @safe should be a token?

--
/Jacob Carlborg


Re: Official D Grammar

2013-04-20 Thread Brian Schott
I've moved my work on the grammar to the following location on 
Github:


https://github.com/Hackerpilot/DGrammar

This uses ANTLR, as the other parser generators can't handle D's 
grammar. Several rules from the official grammar were removed, 
and several others were added (Such as an actual rule for a 
function declaration...) I also tried to fix any inaccuracies or 
omissions I came across in the online documentation.


Comments, issues, and pull requests welcome.


Re: Official D Grammar

2013-04-20 Thread Dmitry Olshansky

20-Apr-2013 12:31, Brian Schott wrote:

I've moved my work on the grammar to the following location on Github:

https://github.com/Hackerpilot/DGrammar

This uses ANTLR, as the other parser generators can't handle D's
grammar.


Great. IMHO ANTLR is one of the sanest.

Several rules from the official grammar were removed, and
several others were added (Such as an actual rule for a function
declaration...) I also tried to fix any inaccuracies or omissions I came
across in the online documentation.

Comments, issues, and pull requests welcome.


Bookmarked for now ;)

--
Dmitry Olshansky


Re: Official D Grammar

2013-04-20 Thread Walter Bright

On 4/9/2013 3:20 AM, Bruno Medeiros wrote:

A bit more annoying is the case with the extern declaration, with the C++
parameter:
   extern(C++)
here you have to look at a special identifier (the C, D, PASCAL part) and see if
there is a ++ token ahead, it's a bit more of special-casing in the parser. Here
I think it would have been better to change the language itself and use
"CPP" instead of "C++". A minor simplification.


I think the special case is worth it.



Re: Official D Grammar

2013-04-20 Thread Timon Gehr

On 04/20/2013 08:36 PM, Walter Bright wrote:

On 4/9/2013 3:20 AM, Bruno Medeiros wrote:

A bit more annoying is the case with the extern declaration, with the C++
parameter:
   extern(C++)
here you have to look at a special identifier (the C, D, PASCAL part)
and see if
there is a ++ token ahead, it's a bit more of special-casing in the
parser. Here
I think it would have been better to change the language itself
and use
"CPP" instead of "C++". A minor simplification.


I think the special case is worth it.



Without doubt.


Re: Official D Grammar

2013-04-23 Thread Stewart Gordon

On 02/04/2013 03:13, Walter Bright wrote:

On 4/1/2013 4:18 PM, Brian Schott wrote:



1) Grammar defined in terms of things that aren't tokens. Take,
for example, PropertyDeclaration. It's defined as an "@" token
followed by... what? "safe"? It's not a real token. It's an
identifier. You can't parse this based on checking the token type.
You have to check the type and the value.


True, do you have a suggestion?



Er, do away with the meaningless @?  Since nobody seems to have 
succeeded in explaining how the @-attributes differ from the rest, it 
seems the right way to go.


Stewart.

--
My email address is valid but not my primary mailbox and not checked 
regularly.  Please keep replies on the 'group where everybody may benefit.


Re: Official D Grammar

2013-04-23 Thread Diggory
If at some point custom attributes will be allowed, @ would be a 
nice syntax for that.


Things like @safe could be attributes implemented by the standard 
library - would cut down on the number of keywords.


Re: Official D Grammar

2013-04-23 Thread Jonathan M Davis
On Tuesday, April 23, 2013 21:51:54 Stewart Gordon wrote:
> Er, do away with the meaningless @? Since nobody seems to have
> succeeded in explaining how the @-attributes differ from the rest, it
> seems the right way to go.

That would mean creating more keywords, which would break code. By using @, we 
avoid having to create new keywords, which I believe was the whole point in 
the first place. Which attributes got @ on them was fairly arbitrary, but they 
do definitely serve a purpose. And any new attributes in the future will 
probably have @ on them for the same reason.

- Jonathan M Davis


Re: Official D Grammar

2013-04-23 Thread Jonathan M Davis
On Wednesday, April 24, 2013 00:34:54 Diggory wrote:
> If at some point custom attributes will be allowed, @ would be a
> nice syntax for that.
> 
> Things like @safe could be attributes implemented by the standard
> library - would cut down on the number of keywords.

We already have custom attributes using @. They're called User Defined 
Attributes and were added a couple releases ago IIRC.

- Jonathan M Davis


Re: Official D Grammar

2013-04-23 Thread Jacob Carlborg

On 2013-04-24 03:40, Jonathan M Davis wrote:


That would mean creating more keywords, which would break code. By using @, we
avoid having to create new keywords, which I believe was the whole point in
the first place. Which attributes got @ on them was fairly arbitrary, but they
do definitely serve a purpose. And any new attributes in the future will
probably have @ on them for the same reason.


You can't use the same reasoning anymore since we have UDA. Those will 
conflict with new keywords starting with an @.


--
/Jacob Carlborg


Re: Official D Grammar

2013-04-24 Thread Paulo Pinto
On Wednesday, 24 April 2013 at 01:40:52 UTC, Jonathan M Davis 
wrote:

On Tuesday, April 23, 2013 21:51:54 Stewart Gordon wrote:

Er, do away with the meaningless @? Since nobody seems to have
succeeded in explaining how the @-attributes differ from the 
rest, it

seems the right way to go.


That would mean creating more keywords, which would break code. 
By using @, we
avoid having to create new keywords, which I believe was the 
whole point in
the first place. Which attributes got @ on them was fairly 
arbitrary, but they
do definitely serve a purpose. And any new attributes in the 
future will

probably have @ on them for the same reason.

- Jonathan M Davis


Please don't.

This was the approach taken by Java with @Override and 
friends, which leads to ugly code in my opinion.


Once I worked in a project which took a similar approach, we 
would have annotations as a way to extend the core grammar. Each 
team eventually grew its own set of annotations, to the point the 
amount of annotations became bigger than the core grammar itself, 
with overlapping meaning.


--
Paulo


Re: Official D Grammar

2013-04-25 Thread Jonathan M Davis
On Wednesday, April 24, 2013 08:39:21 Jacob Carlborg wrote:
> On 2013-04-24 03:40, Jonathan M Davis wrote:
> > That would mean creating more keywords, which would break code. By using
> > @, we avoid having to create new keywords, which I believe was the whole
> > point in the first place. Which attributes got @ on them was fairly
> > arbitrary, but they do definitely serve a purpose. And any new attributes
> > in the future will probably have @ on them for the same reason.
> 
> You can't use the same reasoning anymore since we have UDA. Those will
> conflict with new keywords starting with an @.

True. Adding @blah now risks conflicts where it didn't before, but it still 
risks fewer conflicts than adding a keyword, since a keyword could be used 
anywhere, so adding blah as a keyword would conflict with all uses of blah - 
including @blah - whereas adding @blah would just conflict with @blah. There 
was also some discussion on how to make it so that @blah might not cause 
conflicts (via namespacing or somesuch), but I don't recall if anything came of 
that or how much of it applies to the actual UDA implementation.

So, it makes more sense to go with adding an @ attribute than a keyword if we 
have to add a new attribute, because it'll conflict with less, but it's 
definitely true that we no longer have a way to avoid creating conflicts when 
adding attributes without altering the grammar in a manner similar to when we 
made it possible to have attributes starting with @. So, it's that much more 
important that we only add new attributes when we really need them.

- Jonathan M Davis
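
The keyword-versus-@ trade-off described above can be sketched in D. This is only an illustration: `blah`, `tagged`, and `twice` are hypothetical names taken from or invented for the argument, not anything proposed for the language.

```d
import std.traits : hasUDA;

// Hypothetical user-defined attribute named `blah` (any struct can serve as one).
struct blah {}

// The only place a future built-in @blah could clash: a UDA of the same name.
@blah void tagged() {}

// An ordinary use of the identifier `blah`. It keeps compiling if the language
// adds @blah, but it (and everything above) would break if `blah` became a keyword.
int twice(int blah)
{
    return blah * 2;
}

void main()
{
    static assert(hasUDA!(tagged, blah)); // the UDA really is attached
    assert(twice(21) == 42);
}
```

Reserving `blah` as a keyword would invalidate the struct, the attribute use, and the parameter name; reserving only `@blah` would touch just the attribute use.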


Re: Official D Grammar

2013-04-26 Thread Brian Schott

On Saturday, 20 April 2013 at 08:31:34 UTC, Brian Schott wrote:
This uses ANTLR, as the other parser generators can't handle 
D's grammar.


I'm beginning to think that ANTLR is not up to the task either. 
I've somehow managed to get the grammar to the point where it 
correctly parses several phobos modules but takes a half hour to 
do so.


Re: Official D Grammar

2013-04-28 Thread Brian Schott

On Friday, 26 April 2013 at 20:44:12 UTC, Brian Schott wrote:
I'm beginning to think that ANTLR is not up to the task either. 
I've somehow managed to get the grammar to the point where it 
correctly parses several phobos modules but takes a half hour 
to do so.


I think I'm done with this grammar for now. If there's any 
interest, I can use it to create syntax diagrams for inclusion on 
dlang.org.


I'll also be filing a series of bug reports over the next few 
days. (Two recent examples: 0.0L is not valid according to the 
lexical standard, and deprecated("string") is not documented 
anywhere)
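
For reference, a minimal sketch of the two constructs mentioned above, as the compiler accepts them. The function names `oldApi` and `newApi` are hypothetical and exist only to show the syntax; nothing here says what the written specification allows.

```d
// Hypothetical functions, used only to show the attribute syntax.
deprecated("use newApi instead")
int oldApi() { return 1; }

int newApi() { return 2; }

void main()
{
    real r = 0.0L;      // floating-point literal with the L (real) suffix
    static assert(is(typeof(0.0L) == real));
    assert(r == 0);
    assert(newApi() == 2);
}
```

`oldApi` is deliberately never called, so compiling the sketch does not trigger the deprecation message itself.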


Re: Official D Grammar

2013-04-28 Thread Andrei Alexandrescu

On 4/28/13 8:16 PM, Brian Schott wrote:

On Friday, 26 April 2013 at 20:44:12 UTC, Brian Schott wrote:

I'm beginning to think that ANTLR is not up to the task either. I've
somehow managed to get the grammar to the point where it correctly
parses several phobos modules but takes a half hour to do so.


I think I'm done with this grammar for now. If there's any interest, I
can use it to create syntax diagrams for inclusion on dlang.org.


Yes! This is awesome. Thanks for this work.


I'll also be filing a series of bug reports over the next few days. (Two
recent examples: 0.0L is not valid according to the lexical standard,
and deprecated("string") is not documented anywhere)


Great.


Andrei


Re: Official D Grammar

2013-05-01 Thread Tobias Pankrath

On Friday, 26 April 2013 at 20:44:12 UTC, Brian Schott wrote:

On Saturday, 20 April 2013 at 08:31:34 UTC, Brian Schott wrote:
This uses ANTLR, as the other parser generators can't handle 
D's grammar.


I'm beginning to think that ANTLR is not up to the task either. 
I've somehow managed to get the grammar to the point where it 
correctly parses several phobos modules but takes a half hour 
to do so.


Do you know which parts of the grammar make the parser so slow?
Do you use any suspicious features of ANTLR?


Re: Official D Grammar

2013-05-01 Thread Russel Winder
On Wed, 2013-05-01 at 17:34 +0200, Tobias Pankrath wrote:
> On Friday, 26 April 2013 at 20:44:12 UTC, Brian Schott wrote:
> > On Saturday, 20 April 2013 at 08:31:34 UTC, Brian Schott wrote:
> >> This uses ANTLR, as the other parser generators can't handle 
> >> D's grammar.

Which other parser generators? cf. for example,
http://en.wikipedia.org/wiki/Comparison_of_parser_generators

> > I'm beginning to think that ANTLR is not up to the task either. 
> > I've somehow managed to get the grammar to the point where it 
> > correctly parses several phobos modules but takes a half hour 
> > to do so.

LL(k) can sometimes do this. As can LALR(1) of course.

> Do you know which parts of the grammar make the parser so slow?
> Do you use any suspicious features of ANTLR?

There isn't just one ANTLR: ANTLR 2, ANTLR 3, and ANTLR 4 are very
different beasties.

Not many of the parser generators generate D code :-(

-- 
Russel.
=
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder




Re: Official D Grammar

2013-05-02 Thread Bruno Medeiros

On 26/04/2013 21:44, Brian Schott wrote:

On Saturday, 20 April 2013 at 08:31:34 UTC, Brian Schott wrote:

This uses ANTLR, as the other parser generators can't handle D's grammar.


I'm beginning to think that ANTLR is not up to the task either. I've
somehow managed to get the grammar to the point where it correctly
parses several phobos modules but takes a half hour to do so.


Half hour??? That seems like a massive amount of time, even if the 
generated parser is very suboptimal and is doing a lot of unnecessary 
backtracking. I mean, even if the parser has quadratic performance over 
typical usage source files, it seems like a lot.


To be honest, that's one of the reasons that put me off working with 
ANTLR. It seems easy to create a parser with ANTLR, but to create an 
efficient, well-behaved parser it looks quite complicated, in the sense 
that you can't abstract yourself from what is happening under the 
hood... you have to read a lot of theory and documentation to learn the 
innards of ANTLR, and understand what kind of code it's actually 
generating, and how it processes input. (at moments it feels like you 
have to take a degree to learn how to use it effectively...)


--
Bruno Medeiros - Software Engineer


Re: Official D Grammar

2013-05-02 Thread Russel Winder
On Thu, 2013-05-02 at 17:44 +0100, Bruno Medeiros wrote:
[…]
> To be honest, that's one of the reasons that put me off working with 
> ANTLR. It seems easy to create a parser with ANTLR, but to create an 
> efficient, well-behaved parser it looks quite complicated, in the sense 
> that you can't abstract yourself from what is happening under the 
> hood... you have to read a lot of theory and documentation to learn the 
> innards of ANTLR, and understand what kind of code it's actually 
> generating, and how it processes input. (at moments it feels like you 
> have to take a degree to learn how to use it effectively...)

The Groovy parser is an ANTLR 2.7.7 grammar, it works well and quickly.
I think the trick is to work with the LL(k) idioms and avoid letting
LALR(1) thoughts creep in.

-- 
Russel.


Re: Official D Grammar

2013-05-02 Thread Brian Schott

On Thursday, 2 May 2013 at 16:44:44 UTC, Bruno Medeiros wrote:
(at moments it feels like you have to take a degree to learn 
how to use it effectively...)


That or buy the documentation. Or both...


Re: Official D Grammar

2013-05-03 Thread Tobias Pankrath

On Thursday, 2 May 2013 at 17:13:46 UTC, Russel Winder wrote:

On Thu, 2013-05-02 at 17:44 +0100, Bruno Medeiros wrote:
[…]
To be honest, that's one of the reasons that put me off 
working with ANTLR. It seems easy to create a parser with 
ANTLR, but to create an efficient, well-behaved parser it 
looks quite complicated, in the sense that you can't abstract 
yourself from what is happening under the hood... you have to 
read a lot of theory and documentation to learn the innards of 
ANTLR, and understand what kind of code it's actually 
generating, and how it processes input. (at moments it feels 
like you have to take a degree to learn how to use it 
effectively...)


The Groovy parser is an ANTLR 2.7.7 grammar, it works well and 
quickly. I think the trick is to work with the LL(k) idioms and 
avoid letting LALR(1) thoughts creep in.


LALR(1) is O(n) last time I looked. Both LL(k) and LALR(k) should 
never backtrack. Dunno what ANTLR is doing.


Re: Official D Grammar

2013-05-12 Thread Brian Schott
I wrote a small program to take the grammar and generate input to 
the dot program from Graphviz. You can generate the diagrams with 
some scripts available on Github or view a prebuilt version here:


http://hackerpilot.org/d/dgrammar.zip

Please take a look at this and let me know if you find any 
mistakes.


Re: Official D Grammar

2013-06-03 Thread Brian Schott
As threatened at DConf, I've started filing bugs against the 
grammar specification. Anyone interested can track bug 10233[1], 
which I've marked as blocked by the various issues I've been 
finding.


As usual, my best guess at D's actual grammar[2] is located 
here[3].


Top secret project is hinted at here[4].

[1] http://d.puremagic.com/issues/show_bug.cgi?id=10233
[2] I disclaim all responsibility for injuries sustained while 
looking at the functionDefinition rule.

[3] https://github.com/Hackerpilot/DGrammar/blob/master/D.g4
[4] http://hackerpilot.github.io/experimental/std_lexer/phobos/parser.html




Re: Official D Grammar

2013-07-19 Thread Brian Schott

Status update:

My parser is at a point where it can parse all of Phobos (well, 
excluding C-style array declarations, which I'm convinced are 
broken and should be removed from the language).
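
For readers unfamiliar with the two declaration styles, here is a small sketch of the native form. The post does not say why the C-style form is considered broken; the reversed dimension order shown in the comment is only one commonly cited source of confusion, offered as an assumption. `Grid` and `makeGrid` are hypothetical names.

```d
// Native (prefix) style: read right to left, "2 arrays of 3 ints".
alias Grid = int[3][2];

// The C-style (postfix) spelling of the same variable would have been
//     int grid[2][3];
// with the dimensions in the opposite order. It is left as a comment
// because, as far as I know, recent compilers reject C-style declarations.

Grid makeGrid()
{
    Grid g;             // all elements start at int.init, i.e. 0
    g[1][2] = 7;        // last valid element: outer index < 2, inner index < 3
    return g;
}

void main()
{
    auto g = makeGrid();
    static assert(is(typeof(g) == int[3][2]));
    assert(g.length == 2);
    assert(g[0].length == 3);
    assert(g[1][2] == 7);
}
```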


I've been updating the grammar definition as I worked on the 
parser. An HTML version is available for download here: 
https://raw.github.com/Hackerpilot/DGrammar/master/grammar.html


Lexer, parser, and AST code is available here: 
https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer/std/d. 
Bug reports and pull requests are welcome.


Re: Official D Grammar

2013-07-19 Thread Brian Schott

On Friday, 19 July 2013 at 20:57:14 UTC, H. S. Teoh wrote:
Really? Those still exist in Phobos? Shouldn't they be 
rewritten to the "native" style?


Yes.

https://github.com/D-Programming-Language/phobos/pull/1412/files


Re: Official D Grammar

2013-07-19 Thread Brian Schott

On Friday, 19 July 2013 at 20:52:33 UTC, Mr. Anonymous wrote:

https://rawgithub.com/Hackerpilot/DGrammar/master/grammar.html
;)


I learn something new every day.



Re: Official D Grammar

2013-07-19 Thread Mr. Anonymous

On Friday, 19 July 2013 at 20:49:27 UTC, Brian Schott wrote:

Status update:

My parser is at a point where it can parse all of phobos (well, 
excluding c-style array declarations, which I'm convinced are 
broken and should be removed from the language).


I've been updating the grammar definition as I worked on the 
parser. An HTML version is available for download here: 
https://raw.github.com/Hackerpilot/DGrammar/master/grammar.html


Lexer, parser, and AST code is available here: 
https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer/std/d. 
Bug reports and pull requests are welcome.


https://rawgithub.com/Hackerpilot/DGrammar/master/grammar.html
;)


Re: Official D Grammar

2013-07-19 Thread H. S. Teoh
On Fri, Jul 19, 2013 at 10:49:25PM +0200, Brian Schott wrote:
> Status update:
> 
> My parser is at a point where it can parse all of phobos (well,
> excluding c-style array declarations, which I'm convinced are broken
> and should be removed from the language).

Really? Those still exist in Phobos? Shouldn't they be rewritten to the
"native" style?


> I've been updating the grammar definition as I worked on the parser.
> An HTML version is available for download here:
> https://raw.github.com/Hackerpilot/DGrammar/master/grammar.html
> 
> Lexer, parser, and AST code is available here:
> https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer/std/d.
> Bug reports and pull requests are welcome.

Wonderful! I'll have to look into this sometime. I've been dreaming
about writing a D pretty-printer, and this may be just the thing I need
to get it going.  :)

(Well actually, I *wrote* a D pretty-printer, but I only got as far as
lexing, with some token-matching rules for the pretty-printing. Needless
to say, the results were rather underwhelming, since I didn't have an
AST to work with.)


T

-- 
Written on the window of a clothing store: No shirt, no shoes, no service.


Re: Official D Grammar

2013-07-19 Thread Johannes Pfau
Am Fri, 19 Jul 2013 22:49:25 +0200
schrieb "Brian Schott" :

> Status update:
> 
> My parser is at a point where it can parse all of phobos (well, 
> excluding c-style array declarations, which I'm convinced are 
> broken and should be removed from the language).
> 
> I've been updating the grammar definition as I worked on the 
> parser. An HTML version is available for download here: 
> https://raw.github.com/Hackerpilot/DGrammar/master/grammar.html
> 
> Lexer, parser, and AST code is available here: 
> https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer/std/d. 
> Bug reports and pull requests are welcome.

Nice work!

I guess you already tried druntime but did you also run your lexer and/or
parser against the dmd test suite? The test suite probably has quite
some old legacy stuff like C-style array declarations so it might be a
pita to run those tests but it might show some real problems as well.

If you wanted to do more testing you could also run it on some
well-maintained big D projects (GTKd, derelict3, ...)
http://wiki.dlang.org/Libraries_and_Frameworks
http://wiki.dlang.org/Open_Source_Projects

It's also nice that we may finally get a correct and complete grammar.

BTW: I think it might be a good idea to start a new thread for this
update. This stuff is quite important for D/phobos and usually old
threads tend to be ignored or get less attention.


Re: Official D Grammar

2013-07-20 Thread Jacob Carlborg

On 2013-07-19 22:49, Brian Schott wrote:

Status update:

My parser is at a point where it can parse all of phobos (well,
excluding c-style array declarations, which I'm convinced are broken and
should be removed from the language).

I've been updating the grammar definition as I worked on the parser. An
HTML version is available for download here:
https://raw.github.com/Hackerpilot/DGrammar/master/grammar.html

Lexer, parser, and AST code is available here:
https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer/std/d.
Bug reports and pull requests are welcome.


Now for the million dollar question: does it work during compile time?

--
/Jacob Carlborg


Re: Official D Grammar

2013-07-20 Thread Michael

On Friday, 19 July 2013 at 20:49:27 UTC, Brian Schott wrote:

Status update:
excluding c-style array declarations, which I'm convinced are 
broken and should be removed from the language

+1

I find that "native" style arrays are more natural.