Official D Grammar
I've pretty much finished up my work on the std.d.lexer module. I am waiting for the review queue to make some progress on the other (three?) modules being reviewed before starting a thread on it. In the meantime I've started some work on an AST module for Phobos that contains the data types necessary to build up a parser module so that we can have a standard set of code build D dev tools off of. I decided to work directly from the standard on dlang.org for this to make sure that my module is correct and that the standard is actually correct. I've seen several threads on this newsgroup complaining about the state of the standard and unfortunately this will be another one. 1) Grammar defined in terms of things that aren't tokens. Take, for example, PropertyDeclaration. It's defined as an "@" token followed by... what? "safe"? It's not a real token. It's an identifier. You can't parse this based on checking the token type. You have to check the type and the value. 2) Grammar references rules that don't exist. UserDefinedAttribute is defined in terms of CallExpression, but CallExpression doesn't exist elsewhere in the grammar. BaseInterfaceList is defined in terms of InterfaceClasses, but that rule is never defined. 3) Unnecessary rules. KeyExpression, ValueExpression, ScopeBlockStatement, DeclarationStatement, ThenStatement, ElseStatement, Test, Increment, Aggregate, LwrExpression, UprExpression, FirstExp, LastExp, StructAllocator, StructDeallocator, EnumTag, EnumBaseType, EmptyEnumBody, ConstraintExpression, MixinIdentifier, etc... are all defined in terms of only one other rule. I think that we need to be able to create a grammar description that: * Fits in to a single file, so that a tool implementer does not need to collect bits of the grammar from the various pages on dlang.org. * Can be verified to be correct by an existing tool such as Bison, Goldie, JavaCC, with a small number of changes. * Is part of the dmd/dlang repositories on github and gets updated every time the language changes. I'm willing to work on this if there's a good chance it will actually be implemented. Thoughts?
Re: Official D Grammar
On 4/1/2013 4:18 PM, Brian Schott wrote: I've pretty much finished up my work on the std.d.lexer module. I am waiting for the review queue to make some progress on the other (three?) modules being reviewed before starting a thread on it. In the meantime I've started some work on an AST module for Phobos that contains the data types necessary to build up a parser module so that we can have a standard set of code build D dev tools off of. I decided to work directly from the standard on dlang.org for this to make sure that my module is correct and that the standard is actually correct. I've seen several threads on this newsgroup complaining about the state of the standard and unfortunately this will be another one. 1) Grammar defined in terms of things that aren't tokens. Take, for example, PropertyDeclaration. It's defined as an "@" token followed by... what? "safe"? It's not a real token. It's an identifier. You can't parse this based on checking the token type. You have to check the type and the value. True, do you have a suggestion? 2) Grammar references rules that don't exist. UserDefinedAttribute is defined in terms of CallExpression, but CallExpression doesn't exist elsewhere in the grammar. BaseInterfaceList is defined in terms of InterfaceClasses, but that rule is never defined. Yes, this needs to be fixed. 3) Unnecessary rules. KeyExpression, ValueExpression, ScopeBlockStatement, DeclarationStatement, ThenStatement, ElseStatement, Test, Increment, Aggregate, LwrExpression, UprExpression, FirstExp, LastExp, StructAllocator, StructDeallocator, EnumTag, EnumBaseType, EmptyEnumBody, ConstraintExpression, MixinIdentifier, etc... are all defined in terms of only one other rule. Using these makes documentation easier, and I don't think it harms anything. I think that we need to be able to create a grammar description that: * Fits in to a single file, so that a tool implementer does not need to collect bits of the grammar from the various pages on dlang.org. * Can be verified to be correct by an existing tool such as Bison, Goldie, JavaCC, with a small number of changes. * Is part of the dmd/dlang repositories on github and gets updated every time the language changes. I'm willing to work on this if there's a good chance it will actually be implemented. Thoughts? I suggest doing this as a sequence of pull requests, not doing just one big one.
Re: Official D Grammar
On 02/04/2013 00:18, Brian Schott wrote: I think that we need to be able to create a grammar description that: * Fits in to a single file, so that a tool implementer does not need to collect bits of the grammar from the various pages on dlang.org. * Can be verified to be correct by an existing tool such as Bison, Goldie, JavaCC, with a small number of changes. * Is part of the dmd/dlang repositories on github and gets updated every time the language changes. Indeed, the published grammar needs to be thoroughly checked against what DMD is actually doing, and any discrepancies fixed (or filed in Bugzilla to be fixed in due course). And then they need to be kept in sync. Has the idea of using a parser generator to build D's parsing code been rejected in the past, or is hand-coding just the way Walter decided to do it? Is the code any more efficient than what a typical parser generator would generate? And all disambiguation rules (such as "if it's parseable as a DeclarationStatement, it's a DeclarationStatement") need to be made explicit as part of the grammar. I suppose this is where using Bison or similar would help, as it would point out any ambiguities in the grammar that need rules to resolve them. Stewart.
Re: Official D Grammar
On 2013-04-02 15:21, Stewart Gordon wrote: Indeed, the published grammar needs to be thoroughly checked against what DMD is actually doing, and any discrepancies fixed (or filed in Bugzilla to be fixed in due course). And then they need to be kept in sync. Has the idea of using a parser generator to build D's parsing code been rejected in the past, or is hand-coding just the way Walter decided to do it? Is the code any more efficient than what a typical parser generator would generate? And all disambiguation rules (such as "if it's parseable as a DeclarationStatement, it's a DeclarationStatement") need to be made explicit as part of the grammar. I suppose this is where using Bison or similar would help, as it would point out any ambiguities in the grammar that need rules to resolve them. I'm wondering if it's possibly to mechanically check that what's in the grammar is how DMD behaves. -- /Jacob Carlborg
Re: Official D Grammar
I'm wondering if it's possibly to mechanically check that what's in the grammar is how DMD behaves. Take the grammar and (randomly) generate strings with it and check if DMD does complain. You'd need a parse only don't check semantics flag, though. This will not check if the strings are parsed correctly by DMD nor if invalid strings are rejected. But it would be a start.
Re: Official D Grammar
On 02/04/2013 00:18, Brian Schott wrote: I've pretty much finished up my work on the std.d.lexer module. I am waiting for the review queue to make some progress on the other (three?) modules being reviewed before starting a thread on it. In the meantime I've started some work on an AST module for Phobos that contains the data types necessary to build up a parser module so that we can have a standard set of code build D dev tools off of. I decided to work directly from the standard on dlang.org for this to make sure that my module is correct and that the standard is actually correct. I've seen several threads on this newsgroup complaining about the state of the standard and unfortunately this will be another one. 1) Grammar defined in terms of things that aren't tokens. Take, for example, PropertyDeclaration. It's defined as an "@" token followed by... what? "safe"? It's not a real token. It's an identifier. You can't parse this based on checking the token type. You have to check the type and the value. 2) Grammar references rules that don't exist. UserDefinedAttribute is defined in terms of CallExpression, but CallExpression doesn't exist elsewhere in the grammar. BaseInterfaceList is defined in terms of InterfaceClasses, but that rule is never defined. 3) Unnecessary rules. KeyExpression, ValueExpression, ScopeBlockStatement, DeclarationStatement, ThenStatement, ElseStatement, Test, Increment, Aggregate, LwrExpression, UprExpression, FirstExp, LastExp, StructAllocator, StructDeallocator, EnumTag, EnumBaseType, EmptyEnumBody, ConstraintExpression, MixinIdentifier, etc... are all defined in terms of only one other rule. I think that we need to be able to create a grammar description that: * Fits in to a single file, so that a tool implementer does not need to collect bits of the grammar from the various pages on dlang.org. * Can be verified to be correct by an existing tool such as Bison, Goldie, JavaCC, with a small number of changes. * Is part of the dmd/dlang repositories on github and gets updated every time the language changes. I'm willing to work on this if there's a good chance it will actually be implemented. Thoughts? Interesting thread. I've been working on a hand-written D parser (in Java, for the DDT IDE) and I too have found a slew of grammar spec issues. Some of them more serious than the ones you mentioned above. In same cases it's actually not clear, or downright wrong what the grammar spec says. For example, here's one off of my notes: void func(int foo() { } ); The spec says that is parsable (basically a function declaration in the parameter list), which makes no sense, and DMD doesn't accept. Some cases are a bit trickier, since it's not clear if the syntax should be accepted or not (sometimes they might make sense but not be allowed). These issues make things a bit harder for tools development that require D language parsers. But the whole grammar spec is so messy, I've been unsure whether it's worth filling bug reports or not (would they be addressed?). There is also the problem that even if those issues are fixed now, the spec could very easily fall out of date in the future, unless we have some system to test the spec. Like you mentioned, ideally we would have a grammar spec for a grammar/PG tool so that correctness could more easily be verified. (it doesn't guarantee no spec bugs, but it makes it much harder for them to be there) -- Bruno Medeiros - Software Engineer
Re: Official D Grammar
On 02/04/2013 00:18, Brian Schott wrote: I've pretty much finished up my work on the std.d.lexer module. I am waiting for the review queue to make some progress on the other (three?) modules being reviewed before starting a thread on it. BTW, even in the lexer spec I've found an issue. How does this parse: 5.blah According to the spec (maximal munch technique), it should be FLOAT then IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming the lastest is the correct behavior, so you can write stuff like 123.init, but that should be clarified. -- Bruno Medeiros - Software Engineer
Re: Official D Grammar
On Saturday, April 06, 2013 16:21:12 Bruno Medeiros wrote: > On 02/04/2013 00:18, Brian Schott wrote: > > I've pretty much finished up my work on the std.d.lexer module. I am > > waiting for the review queue to make some progress on the other (three?) > > modules being reviewed before starting a thread on it. > > BTW, even in the lexer spec I've found an issue. How does this parse: > 5.blah > According to the spec (maximal munch technique), it should be FLOAT then > IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming > the lastest is the correct behavior, so you can write stuff like > 123.init, but that should be clarified. It would definitely have to be INTEGER DOT IDENTIFIER due to UFCS, so it sounds like the spec wasn't updated like it should have been. - Jonathan M Davis
Re: Official D Grammar
On 04/06/13 17:21, Bruno Medeiros wrote: > On 02/04/2013 00:18, Brian Schott wrote: >> I've pretty much finished up my work on the std.d.lexer module. I am >> waiting for the review queue to make some progress on the other (three?) >> modules being reviewed before starting a thread on it. >> > > BTW, even in the lexer spec I've found an issue. How does this parse: > 5.blah > According to the spec (maximal munch technique), it should be FLOAT then > IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming the > lastest is the correct behavior, so you can write stuff like 123.init, but > that should be clarified. "1..2", "1.ident" and a float literal with '_' after the '.' are the DecimalFloat cases that I immediately ran into when doing a lexer based on the dlang grammar. It's obvious to a human how these should be handled, but code generators aren't that smart... But they are good at catching mistakes like these. Actually, that last case is even more "interesting"; http://dlang.org/lex.html has "1_2_3_4_5_6_._5_6_7_8" as a valid example, which of course it's not ("_5_6_7_8" is a valid identifier), but there is no reason do disallow "1_2_3_4_5_6_.5_6_7_8". > that should be clarified. These are just grammar bugs, that could easily be fixed. Then there are some things that can be less obvious, but shouldn't really be controversial like allowing empty HexString literals. Then there's the enhancement category. Looking through my comments, I think the only deliberate change from dlang.org that I have is in DelimitedString -- there is no reason to forbid q"/abc/def/"; there are no back-compat issues, as it couldn't have existed in legacy D code. artur
Re: Official D Grammar
On 06/04/2013 20:52, Artur Skawina wrote: On 04/06/13 17:21, Bruno Medeiros wrote: On 02/04/2013 00:18, Brian Schott wrote: I've pretty much finished up my work on the std.d.lexer module. I am waiting for the review queue to make some progress on the other (three?) modules being reviewed before starting a thread on it. BTW, even in the lexer spec I've found an issue. How does this parse: 5.blah According to the spec (maximal munch technique), it should be FLOAT then IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming the lastest is the correct behavior, so you can write stuff like 123.init, but that should be clarified. "1..2", "1.ident" and a float literal with '_' after the '.' are the DecimalFloat cases that I immediately ran into when doing a lexer based on the dlang grammar. It's obvious to a human how these should be handled, but code generators aren't that smart... But they are good at catching mistakes like these. The "1..2" is actually mentioned in the spec: "An exception to this rule is that a .. embedded inside what looks like two floating point literals, as in 1..2, is interpreted as if the .. was separated by a space from the first integer." so it's there, even if it can be missed. But unless I missed it, the spec is incorrect for the "1.ident" or "1_2_3_4_5_6_._5_6_7_8" cases as there is no exception mentioned there... and it's not always 100% obvious to a human how these should be handled. Or maybe that's just me :) -- Bruno Medeiros - Software Engineer
Re: Official D Grammar
On 04/07/13 00:35, Bruno Medeiros wrote: > On 06/04/2013 20:52, Artur Skawina wrote: >> On 04/06/13 17:21, Bruno Medeiros wrote: >>> On 02/04/2013 00:18, Brian Schott wrote: I've pretty much finished up my work on the std.d.lexer module. I am waiting for the review queue to make some progress on the other (three?) modules being reviewed before starting a thread on it. >>> >>> BTW, even in the lexer spec I've found an issue. How does this parse: >>>5.blah >>> According to the spec (maximal munch technique), it should be FLOAT then >>> IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming the >>> lastest is the correct behavior, so you can write stuff like 123.init, but >>> that should be clarified. >> >> "1..2", "1.ident" and a float literal with '_' after the '.' are the >> DecimalFloat cases that I immediately ran into when doing a lexer based on >> the dlang grammar. It's obvious to a human how these should be handled, but >> code generators aren't that smart... But they are good at catching mistakes >> like these. > > The "1..2" is actually mentioned in the spec: > "An exception to this rule is that a .. embedded inside what looks like two > floating point literals, as in 1..2, is interpreted as if the .. was > separated by a space from the first integer." > so it's there, even if it can be missed. I know, but documenting a (grammar) bug does not make it go away. > But unless I missed it, the spec is incorrect for the "1.ident" or > "1_2_3_4_5_6_._5_6_7_8" cases as there is no exception mentioned there... and > it's not always 100% obvious to a human how these should be handled. Or maybe > that's just me :) What does the "spec" currently say about ".001"?.. It's been a while since I did a d lexer, based on the dlang grammar - it (the lexer) was supposed to be dmd compatible. Took a closer look at the actual dlang.org rules today while writing this message... Will try to find some time to clean up and convert a working D lexical grammar to PEG; what i have should be 1:1 translatable, except one rule (DelimitedString) and put it on the wiki. Maybe it will help someone avoid these issues. artur
Re: Official D Grammar
On Tuesday, 2 April 2013 at 19:00:21 UTC, Tobias Pankrath wrote: I'm wondering if it's possibly to mechanically check that what's in the grammar is how DMD behaves. Take the grammar and (randomly) generate strings with it and check if DMD does complain. You'd need a parse only don't check semantics flag, though. This will not check if the strings are parsed correctly by DMD nor if invalid strings are rejected. But it would be a start. An alternative idea for ensuring that documentation and implementation are in sync might be to list the full grammar definition as a data structure that can both be used as input for the parser and as input for a tool that generates the documentation. Theoretically possible, :) just look at Philippe Sigaud's Pegged.
Re: Official D Grammar
On Monday, 8 April 2013 at 21:50:12 UTC, Christopher Bergqvist wrote: On Tuesday, 2 April 2013 at 19:00:21 UTC, Tobias Pankrath wrote: I'm wondering if it's possibly to mechanically check that what's in the grammar is how DMD behaves. Take the grammar and (randomly) generate strings with it and check if DMD does complain. You'd need a parse only don't check semantics flag, though. This will not check if the strings are parsed correctly by DMD nor if invalid strings are rejected. But it would be a start. An alternative idea for ensuring that documentation and implementation are in sync might be to list the full grammar definition as a data structure that can both be used as input for the parser and as input for a tool that generates the documentation. Theoretically possible, :) just look at Philippe Sigaud's Pegged. I know but the parser is currently hand written and I think Walter will only accept an auto generated parser if it is as fast as the current solution. However in an old discussion someone said that the D grammar isn't LALR(1) or LR(1), so I don't think that is possible with current D parser generators. Do we have a pegged grammar for D? Another think is that for documentation purposes you might want to have an more readable but ambiguous grammar than your generator of choice will accept.
Re: Official D Grammar
> However in an old discussion someone said that the D grammar isn't LALR(1) > or LR(1), so I don't think that is possible with current D parser > generators. Do we have a pegged grammar for D? > Yes, it comes with the project. But, it's still buggy (sometimes due to my own mistakes, sometimes due to plain errors in the online D grammar). And the generated parser is quite slow, halas.
Re: Official D Grammar
On 07/04/2013 16:14, Artur Skawina wrote: The "1..2" is actually mentioned in the spec: >"An exception to this rule is that a .. embedded inside what looks like two floating point literals, as in 1..2, is interpreted as if the .. was separated by a space from the first integer." >so it's there, even if it can be missed. I know, but documenting a (grammar) bug does not make it go away. Who says its a bug? From my understanding, this exception is there on purpose, to make it easier to use the DOT_DOT operator in the slice expresions: foo[1..2] // It would be silly to have to put a space after the 1 At most you could make a case that "1." shouldn't ever parse as float, that the decimal part should be required if the dot is present. -- Bruno Medeiros - Software Engineer
Re: Official D Grammar
On 02/04/2013 03:13, Walter Bright wrote: 1) Grammar defined in terms of things that aren't tokens. Take, for example, PropertyDeclaration. It's defined as an "@" token followed by... what? "safe"? It's not a real token. It's an identifier. You can't parse this based on checking the token type. You have to check the type and the value. True, do you have a suggestion? I don't think that kind of grammar issue is too annoying, since it's easy to understand what the intended behavior is (in this case at least). But to fix it, well, we can have just that: have the grammar say it should parse an identifier after the @, and then issue a semantic error of sorts if the value is not one of the expected special values (safe, etc.). Parsing an identifier here is in any case the best error recovery strategy anyways. A bit more annoying is the case with the extern declaration, with the C++ parameter: extern(C++) here you have to look at a special identifier (the C, D, PASCAL part) and see if there is a ++ token ahead, it's a bit more of special-casing in the parser. Here I think it would have been better to change the the language itself and use "CPP" instead of "C++". A minor simplification. -- Bruno Medeiros - Software Engineer
Re: Official D Grammar
On 04/09/13 12:24, Bruno Medeiros wrote: > On 07/04/2013 16:14, Artur Skawina wrote: >>> The "1..2" is actually mentioned in the spec: >>> >"An exception to this rule is that a .. embedded inside what looks like >>> >two floating point literals, as in 1..2, is interpreted as if the .. was >>> >separated by a space from the first integer." >>> >so it's there, even if it can be missed. >> I know, but documenting a (grammar) bug does not make it go away. >> > > Who says its a bug? From my understanding, this exception is there on > purpose, to make it easier to use the DOT_DOT operator in the slice > expresions: > > foo[1..2] // It would be silly to have to put a space after the 1 > > At most you could make a case that "1." shouldn't ever parse as float, that > the decimal part should be required if the dot is present. It's a bug, because the grammar does not correctly describe the rules. A parser/lexer based on the grammar alone will not work. Documenting the exception(s) helps the human, but doesn't make the grammar correct. I've started the PEG conversion of my lexer rules and the relevant one looks like this: DecimalFloat: (LeadingDecimal "." !"." !IdentifierStart DecimalDigitsNoSingleUS DecimalExponent) / (LeadingDecimal "." !"." !IdentifierStart DigitUS*) / ("." LeadingDecimal DecimalExponent?) This works as-is, w/o any extra info - the working lexer is mechanically generated from aot this rule. (It differs from the dlang definition in at least four ways - "1..2", "1.ident", "1.000_1" and ".001") I'll try to finish the conversion and post the whole lexical grammar in a couple days (have almost no D-time right now). artur
Re: Official D Grammar
On 2013-04-02 04:13, Walter Bright wrote: 1) Grammar defined in terms of things that aren't tokens. Take, for example, PropertyDeclaration. It's defined as an "@" token followed by... what? "safe"? It's not a real token. It's an identifier. You can't parse this based on checking the token type. You have to check the type and the value. True, do you have a suggestion? Just define that @safe should be a token? -- /Jacob Carlborg
Re: Official D Grammar
I've moved my work on the grammar to the following location on Github: https://github.com/Hackerpilot/DGrammar This uses ANTLR, as the other parser generators can't handle D's grammar. Several rules from the official grammar were removed, and several others were added (Such as an actual rule for a function declaration...) I also tried to fix any inacuracies or omissions I came across in the online documentation. Comments, issues, and pull requests welcome.
Re: Official D Grammar
20-Apr-2013 12:31, Brian Schott пишет: I've moved my work on the grammar to the following location on Github: https://github.com/Hackerpilot/DGrammar This uses ANTLR, as the other parser generators can't handle D's grammar. Great. IMHO ANTLR is one of the sanest. > Several rules from the official grammar were removed, and several others were added (Such as an actual rule for a function declaration...) I also tried to fix any inacuracies or omissions I came across in the online documentation. Comments, issues, and pull requests welcome. Bookmarked for now ;) -- Dmitry Olshansky
Re: Official D Grammar
On 4/9/2013 3:20 AM, Bruno Medeiros wrote: A bit more annoying is the case with the extern declaration, with the C++ parameter: extern(C++) here you have to look at a special identifier (the C, D, PASCAL part) and see if there is a ++ token ahead, it's a bit more of special-casing in the parser. Here I think it would have been better to change the the language itself and use "CPP" instead of "C++". A minor simplification. I think the special case is worth it.
Re: Official D Grammar
On 04/20/2013 08:36 PM, Walter Bright wrote: On 4/9/2013 3:20 AM, Bruno Medeiros wrote: A bit more annoying is the case with the extern declaration, with the C++ parameter: extern(C++) here you have to look at a special identifier (the C, D, PASCAL part) and see if there is a ++ token ahead, it's a bit more of special-casing in the parser. Here I think it would have been better to change the the language itself and use "CPP" instead of "C++". A minor simplification. I think the special case is worth it. Without doubt.
Re: Official D Grammar
On 02/04/2013 03:13, Walter Bright wrote: On 4/1/2013 4:18 PM, Brian Schott wrote: 1) Grammar defined in terms of things that aren't tokens. Take, for example, PropertyDeclaration. It's defined as an "@" token followed by... what? "safe"? It's not a real token. It's an identifier. You can't parse this based on checking the token type. You have to check the type and the value. True, do you have a suggestion? Er, do away with the meaningless @? Since nobody seems to have succeeded in explaining how the @-attributes differ from the rest, it seems the right way to go. Stewart. -- My email address is valid but not my primary mailbox and not checked regularly. Please keep replies on the 'group where everybody may benefit.
Re: Official D Grammar
If at some point custom attributes will be allowed, @ would be a nice syntax for that. Things like @safe could be attributes implemented by the standard library - would cut down on the number of keywords.
Re: Official D Grammar
On Tuesday, April 23, 2013 21:51:54 Stewart Gordon wrote: > Er, do away with the meaningless @? Since nobody seems to have > succeeded in explaining how the @-attributes differ from the rest, it > seems the right way to go. That would mean creating more keywords, which would break code. By using @, we avoid having to create new keywords, which I believe was the whole point in the first place. Which attributes got @ on them was fairly arbitrary, but they do definitely serve a purpose. And any new attributes in the future will probably have @ on them for the same reason. - Jonathan M Davis
Re: Official D Grammar
On Wednesday, April 24, 2013 00:34:54 Diggory wrote: > If at some point custom attributes will be allowed, @ would be a > nice syntax for that. > > Things like @safe could be attributes implemented by the standard > library - would cut down on the number of keywords. We already have custom attributes using @. They're called User Defined Attributes and were added a couple releases ago IIRC. - Jonathan M Davis
Re: Official D Grammar
On 2013-04-24 03:40, Jonathan M Davis wrote: That would mean creating more keywords, which would break code. By using @, we avoid having to create new keywords, which I believe was the whole point in the first place. Which attributes got @ on them was fairly arbitrary, but they do definitely serve a purpose. And any new attributes in the future will probably have @ on them for the same reason. You can't use the same reasoning anymore since we have UDA. Those will conflict with new keywords starting with an @. -- /Jacob Carlborg
Re: Official D Grammar
On Wednesday, 24 April 2013 at 01:40:52 UTC, Jonathan M Davis wrote: On Tuesday, April 23, 2013 21:51:54 Stewart Gordon wrote: Er, do away with the meaningless @? Since nobody seems to have succeeded in explaining how the @-attributes differ from the rest, it seems the right way to go. That would mean creating more keywords, which would break code. By using @, we avoid having to create new keywords, which I believe was the whole point in the first place. Which attributes got @ on them was fairly arbitrary, but they do definitely serve a purpose. And any new attributes in the future will probably have @ on them for the same reason. - Jonathan M Davis Please don't. This was the approach taken by Java with the @overload and friends, which leads to ugly code in my opinion. Once I worked in a project which took a similar approach, we would have annotations as a way to extend the core grammar. Each team eventually grew its own set of annotations, to the point the amount of annotations became bigger than the core grammar itself, with overlapping meaning. -- Paulo
Re: Official D Grammar
On Wednesday, April 24, 2013 08:39:21 Jacob Carlborg wrote: > On 2013-04-24 03:40, Jonathan M Davis wrote: > > That would mean creating more keywords, which would break code. By using > > @, we avoid having to create new keywords, which I believe was the whole > > point in the first place. Which attributes got @ on them was fairly > > arbitrary, but they do definitely serve a purpose. And any new attributes > > in the future will probably have @ on them for the same reason. > > You can't use the same reasoning anymore since we have UDA. Those will > conflict with new keywords starting with an @. True. Adding @blah now risks conflicts where it didn't before, but it still risks fewer conflicts that adding a keyword, since a keyword could be used anywhere, so adding blah as a keyword would conflict with all uses of blah - including @blah - whereas adding @blah would just conflict with @blah. There was also some discussion on how to make it so that @blah might not cause conflicts (via namespacing or somesuch), but I don't recall if anything came of that or how much of applies to the actual UDA implementation. So, it makes more sense to go with adding an @ attribute than a keyword if we have to add a new attribute, because it'll conflict with less, but it's definitely true that we no longer have a way to avoid creating conflicts when adding attributes without altering the grammar in a manner similar to when we made it possible to have attributes starting with @. So, it's that much more important that we only add new attributes when we really need them. - Jonathan M Davis
Re: Official D Grammar
On Saturday, 20 April 2013 at 08:31:34 UTC, Brian Schott wrote: This uses ANTLR, as the other parser generators can't handle D's grammar. I'm beginning to think that ANTRL is not up to the task either. I've somehow managed to get the grammar to the point where it correctly parses several phobos modules but takes a half hour to do so.
Re: Official D Grammar
On Friday, 26 April 2013 at 20:44:12 UTC, Brian Schott wrote: I'm beginning to think that ANTRL is not up to the task either. I've somehow managed to get the grammar to the point where it correctly parses several phobos modules but takes a half hour to do so. I think I'm done with this grammar for now. If there's any interest, I can use it to create syntax diagrams for inclusion on dlang.org. I'll also be filing a series of bug reports over the next few days. (Two recent examples: 0.0L is not valid according to the lexical standard, and deprecated("string") is not documented anywhere)
Re: Official D Grammar
On 4/28/13 8:16 PM, Brian Schott wrote: On Friday, 26 April 2013 at 20:44:12 UTC, Brian Schott wrote: I'm beginning to think that ANTRL is not up to the task either. I've somehow managed to get the grammar to the point where it correctly parses several phobos modules but takes a half hour to do so. I think I'm done with this grammar for now. If there's any interest, I can use it to create syntax diagrams for inclusion on dlang.org. Yes! This is awesome. Thanks for this work. I'll also be filing a series of bug reports over the next few days. (Two recent examples: 0.0L is not valid according to the lexical standard, and deprecated("string") is not documented anywhere) Great. Andrei
Re: Official D Grammar
On Friday, 26 April 2013 at 20:44:12 UTC, Brian Schott wrote: On Saturday, 20 April 2013 at 08:31:34 UTC, Brian Schott wrote: This uses ANTLR, as the other parser generators can't handle D's grammar. I'm beginning to think that ANTRL is not up to the task either. I've somehow managed to get the grammar to the point where it correctly parses several phobos modules but takes a half hour to do so. Do you know, which parts of the grammar make the parser so slow? Do you use any suspicious features of ANTLR?
Re: Official D Grammar
On Wed, 2013-05-01 at 17:34 +0200, Tobias Pankrath wrote: > On Friday, 26 April 2013 at 20:44:12 UTC, Brian Schott wrote: > > On Saturday, 20 April 2013 at 08:31:34 UTC, Brian Schott wrote: > >> This uses ANTLR, as the other parser generators can't handle > >> D's grammar. Which other parser generators? cf. for example, http://en.wikipedia.org/wiki/Comparison_of_parser_generators > > I'm beginning to think that ANTRL is not up to the task either. > > I've somehow managed to get the grammar to the point where it > > correctly parses several phobos modules but takes a half hour > > to do so. LL(k) can sometimes do this. As can LALR(1) of course. > Do you know, which parts of the grammar make the parser so slow? > Do you use any suspicious features of ANTLR? There isn't just one ANTLR: ANTLR 2, ANTLR 3, and ANTLR 4 are very different beasties. Not many of the parser generators generate D code :-( -- Russel. = Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.win...@ekiga.net 41 Buckmaster Roadm: +44 7770 465 077 xmpp: rus...@winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder signature.asc Description: This is a digitally signed message part
Re: Official D Grammar
On 26/04/2013 21:44, Brian Schott wrote: On Saturday, 20 April 2013 at 08:31:34 UTC, Brian Schott wrote: This uses ANTLR, as the other parser generators can't handle D's grammar. I'm beginning to think that ANTRL is not up to the task either. I've somehow managed to get the grammar to the point where it correctly parses several phobos modules but takes a half hour to do so. Half hour??? That seems like a massive amount of time, even if the generated parser is very suboptimal and is doing a lot of unnecessary backtracking. I mean, even if the parser has quadratic performance over typical usage source files, it seems like a lot. To be honest, that's one of the reasons that put me off working with ANLTR. It seems easy to create a parser with ANTLR, but to create an efficient, well-behaved parser it looks quite complicated, in the sense that you can't abstract yourself from what is happening under the hood... you have to read a lot of theory and documention to learn the innards of ANTLR, and understand what kind of code it's actually generating, and how it processes input. (at moments it feels like you have to take a degree to learn how to use it effectively...) -- Bruno Medeiros - Software Engineer
Re: Official D Grammar
On Thu, 2013-05-02 at 17:44 +0100, Bruno Medeiros wrote: […] > To be honest, that's one of the reasons that put me off working with > ANLTR. It seems easy to create a parser with ANTLR, but to create an > efficient, well-behaved parser it looks quite complicated, in the sense > that you can't abstract yourself from what is happening under the > hood... you have to read a lot of theory and documention to learn the > innards of ANTLR, and understand what kind of code it's actually > generating, and how it processes input. (at moments it feels like you > have to take a degree to learn how to use it effectively...) The Groovy parser is an ANTLR 2.7.7 grammar, it works well and quickly. I think the trick is to work with the LL(k) idioms and avoid letting LALR(1) thoughts creep in. -- Russel. = Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.win...@ekiga.net 41 Buckmaster Roadm: +44 7770 465 077 xmpp: rus...@winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder signature.asc Description: This is a digitally signed message part
Re: Official D Grammar
On Thursday, 2 May 2013 at 16:44:44 UTC, Bruno Medeiros wrote: (at moments it feels like you have to take a degree to learn how to use it effectively...) That or buy the documentation. Or both...
Re: Official D Grammar
On Thursday, 2 May 2013 at 17:13:46 UTC, Russel Winder wrote: On Thu, 2013-05-02 at 17:44 +0100, Bruno Medeiros wrote: […] To be honest, that's one of the reasons that put me off working with ANLTR. It seems easy to create a parser with ANTLR, but to create an efficient, well-behaved parser it looks quite complicated, in the sense that you can't abstract yourself from what is happening under the hood... you have to read a lot of theory and documention to learn the innards of ANTLR, and understand what kind of code it's actually generating, and how it processes input. (at moments it feels like you have to take a degree to learn how to use it effectively...) The Groovy parser is an ANTLR 2.7.7 grammar, it works well and quickly. I think the trick is to work with the LL(k) idioms and avoid letting LALR(1) thoughts creep in. LALR(1) is O(n) last time I looked. Both Ll(k) and LALR(k) should never backtrack. Dunno what Antlr is doing.
Re: Official D Grammar
I wrote a small program to take the grammar and generate input to the dot program from Graphviz. You can generate the diagrams with some scripts available on Github or view a prebuilt version here: http://hackerpilot.org/d/dgrammar.zip Please take a look at this and let me know if you find any mistakes.
Re: Official D Grammar
As threatened at DConf, I've started filing bugs against the grammar specification. Anyone interested can track bug 10233[1], which I've marked as blocked by the various issues I've been finding. As usual, my best guess at D's actual grammar[2] is located here[3]. Top secret project is hinted at here[4]. [1] http://d.puremagic.com/issues/show_bug.cgi?id=10233 [2] I disclaim all responsibility for injuries sustained while looking at the functionDefinition rule. [3] https://github.com/Hackerpilot/DGrammar/blob/master/D.g4 [4] http://hackerpilot.github.io/experimental/std_lexer/phobos/parser.html
Re: Official D Grammar
Status update: My parser is at a point where it can parse all of phobos (well, excluding c-style array declarations, which I'm convinced are broken and should be removed from the language). I've been updating the grammar definition as I worked on the parser. An HTML version is available for download here: https://raw.github.com/Hackerpilot/DGrammar/master/grammar.html Lexer, parser, and AST code is available here: https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer/std/d. Bug reports and pull requests are welcome.
Re: Official D Grammar
On Friday, 19 July 2013 at 20:57:14 UTC, H. S. Teoh wrote: Really? Those still exist in Phobos? Shouldn't they be rewritten to the "native" style? Yes. https://github.com/D-Programming-Language/phobos/pull/1412/files
Re: Official D Grammar
On Friday, 19 July 2013 at 20:52:33 UTC, Mr. Anonymous wrote: https://rawgithub.com/Hackerpilot/DGrammar/master/grammar.html ;) I learn something new every day.
Re: Official D Grammar
On Friday, 19 July 2013 at 20:49:27 UTC, Brian Schott wrote: Status update: My parser is at a point where it can parse all of phobos (well, excluding c-style array declarations, which I'm convinced are broken and should be removed from the language). I've been updating the grammar definition as I worked on the parser. An HTML version is available for download here: https://raw.github.com/Hackerpilot/DGrammar/master/grammar.html Lexer, parser, and AST code is available here: https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer/std/d. Bug reports and pull requests are welcome. https://rawgithub.com/Hackerpilot/DGrammar/master/grammar.html ;)
Re: Official D Grammar
On Fri, Jul 19, 2013 at 10:49:25PM +0200, Brian Schott wrote: > Status update: > > My parser is at a point where it can parse all of phobos (well, > excluding c-style array declarations, which I'm convinced are broken > and should be removed from the language). Really? Those still exist in Phobos? Shouldn't they be rewritten to the "native" style? > I've been updating the grammar definition as I worked on the parser. > An HTML version is available for download here: > https://raw.github.com/Hackerpilot/DGrammar/master/grammar.html > > Lexer, parser, and AST code is available here: > https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer/std/d. > Bug reports and pull requests are welcome. Wonderful! I'll have to look into this sometime. I've been dreaming about writing a D pretty-printer, and this may be just the thing I need to get it going. :) (Well actually, I *wrote* a D pretty-printer, but I only got as far as lexing, with some token-matching rules for the pretty-printing. Needless to say, the results were rather underwhelming, since I didn't have an AST to work with.) T -- Written on the window of a clothing store: No shirt, no shoes, no service.
Re: Official D Grammar
Am Fri, 19 Jul 2013 22:49:25 +0200 schrieb "Brian Schott" : > Status update: > > My parser is at a point where it can parse all of phobos (well, > excluding c-style array declarations, which I'm convinced are > broken and should be removed from the language). > > I've been updating the grammar definition as I worked on the > parser. An HTML version is available for download here: > https://raw.github.com/Hackerpilot/DGrammar/master/grammar.html > > Lexer, parser, and AST code is available here: > https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer/std/d. > Bug reports and pull requests are welcome. Nice work! I guess you already tried druntime but did you also run your lexer and or parser against the dmd test suite? The test suite probably has quite some old legacy stuff like c-style array declarations so it might be a pita to run those tests but it might show some real problems as well. If you wanted to do more testing you could also run it on some well-maintained big D projects (GTKd, derelict3, ...) http://wiki.dlang.org/Libraries_and_Frameworks http://wiki.dlang.org/Open_Source_Projects It's also nice that we may finally get a correct and complete grammar. BTW: I think it might be a good idea to start a new thread for this update. This stuff is quite important for D/phobos and usually old threads tend to be ignored or get less attention.
Re: Official D Grammar
On 2013-07-19 22:49, Brian Schott wrote: Status update: My parser is at a point where it can parse all of phobos (well, excluding c-style array declarations, which I'm convinced are broken and should be removed from the language). I've been updating the grammar definition as I worked on the parser. An HTML version is available for download here: https://raw.github.com/Hackerpilot/DGrammar/master/grammar.html Lexer, parser, and AST code is available here: https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer/std/d. Bug reports and pull requests are welcome. Now for the million dollar question: does it work during compile time? -- /Jacob Carlborg
Re: Official D Grammar
On Friday, 19 July 2013 at 20:49:27 UTC, Brian Schott wrote: Status update: excluding c-style array declarations, which I'm convinced are broken and should be removed from the language +1 I found for myself that "native" style arrays are more natural.