Re: String Literal Docs
On Mon, 21 Jun 2010 15:20:16 -0500, Ellery Newcomer ellery-newco...@utulsa.edu wrote: Are your diagrams solely concerned with the lexer? Because I have a (messy) parser grammar which I'm a bit more confident about if you're interested. I can't speak for Alix, but I would absolutely be interested. I'm working on an Objective-D preprocessor and my parsing still has lots of holes, even besides the stuff I have marked to-do. A strict reading of the website has already turned up a few inaccuracies.
Re: String Literal Docs
On 20/06/2010 20:14, Nick Sabalausky wrote: div0 d...@users.sourceforge.net wrote in message news:hvlok6$1rf...@digitalmars.com... On 20/06/2010 18:55, Nick Sabalausky wrote: div0 d...@users.sourceforge.net wrote in message news:hvkrsc$2r5...@digitalmars.com... It says multiple of 2, not even number of digits. multiple of 2 == even number Even as in even vs odd I also said 'To me that implies'. Please don't take what I said out of context and be a smart arse about it. There's more than enough of that goes on round here. That wasn't my intent, sorry if it came across that way. It sounded to me like you were implying there was a difference between multiple of 2 and even number. If that wasn't the case, then I guess I'm just not sure what you were really getting at. What I was getting at is that if you use the w suffix, then surely you would expect the number of hex digits to be a multiple of 4, not 2. If there are only 6 digits, what then? Are the missing ones inferred to be 0, is it a compile error, or something else? Because of the use of the 2, I inferred from the spec that the suffixes were not supposed to be allowed. If it had said even number of digits, I'd have been more inclined to think that the suffixes are legal. Either way, it just highlights that the spec isn't sufficiently clear. -- My enormous talent is exceeded only by my outrageous laziness. http://www.ssTk.co.uk
Re: String Literal Docs
On 20/06/2010 22:46, Alix Pexton wrote: On 20/06/2010 21:37, Ellery Newcomer wrote: On 06/20/2010 03:01 PM, Alix Pexton wrote: On 19/06/2010 21:12, Alix Pexton wrote: I've been sketching some grammar diagrams for D2.0, a little like those on JSON.org, and of course I didn't get far before I ran into something odd. I think I will take the plunge and base my diagrams on the source of DMD. After looking at the code in lexer.c, it does not seem as far beyond my rusty old c++ parsing skills as I had expected! Massive credit to Walter for having a codebase that is as mature as DMD without it turning into a labyrinth of preprocessor macros and cryptic comefroms. This will mean however that my little project may take a little longer, sigh... A... Do share. I've always been too lazy to read lexer.c, and from this discussion, it sounds like there are a few spots where my own lexer grammar is incorrect (or at least differs from dmd). of course ^^ A... Well, I think I have got my head around lexer.c now, and its various peculiarities, like 000377. being a valid float (although not according to my shiny new, limited edition copy of tDPL (fig2.2 p35)^^). The weirdness occurs because some of the corner cases are handled not by the neat little state machine that validates reals, but in the scanner at the point where it recognises a number beginning with a zero. The productions in lex.html represent the range of inputs that are accepted by the state machine without taking into account that the scanner rejects the sequence ._ (which makes sense as that is the identifier _ in the outer scope). Andrei's analysis in tDPL also points out that 0xp0 is a valid hexfloat, but a strict reading of lex.html would not allow it. Overall the diagram for hexfloat is much simpler than the one for decimalfloat, which I think will have to be split into 3. A... PS, octal must die!
Re: String Literal Docs
On 06/21/2010 02:21 PM, Alix Pexton wrote: On 20/06/2010 22:46, Alix Pexton wrote: On 20/06/2010 21:37, Ellery Newcomer wrote: On 06/20/2010 03:01 PM, Alix Pexton wrote: On 19/06/2010 21:12, Alix Pexton wrote: I've been sketching some grammar diagrams for D2.0, a little like those on JSON.org, and of course I didn't get far before I ran into something odd. I think I will take the plunge and base my diagrams on the source of DMD. After looking at the code in lexer.c, it does not seem as far beyond my rusty old c++ parsing skills as I had expected! Massive credit to Walter for having a codebase that is as mature as DMD without it turning into a labyrinth of preprocessor macros and cryptic comefroms. This will mean however that my little project may take a little longer, sigh... A... Do share. I've always been too lazy to read lexer.c, and from this discussion, it sounds like there are a few spots where my own lexer grammar is incorrect (or at least differs from dmd). of course ^^ A... Well, I think I have got my head around lexer.c now, and its various peculiarities, like 000377. being a valid float (although not according to my shiny new, limited edition copy of tDPL (fig2.2 p35)^^). Oh wow. That's a sweet little diagram. Those dots are hard to see though. The weirdness occurs because some of the corner cases are handled not by the neat little state machine that validates reals, but in the scanner at the point where it recognises a number beginning with a zero. The productions in lex.html represent the range of inputs that are accepted by the state machine without taking into account that the scanner rejects the sequence ._ (which makes sense as that is the identifier _ in the outer scope). To hell with lexer.c. I'm not changing anything. Andrei's analysis in tDPL also points out that 0xp0 is a valid hexfloat, but a strict reading of lex.html would not allow it. Overall the diagram for hexfloat is much simpler than the one for decimalfloat, which I think will have to be split into 3. A... PS, octal must die! I'll settle for modified syntax 0c123. But yeah. Are your diagrams solely concerned with the lexer? Because I have a (messy) parser grammar which I'm a bit more confident about if you're interested.
Re: String Literal Docs
On 21/06/2010 21:20, Ellery Newcomer wrote: Are your diagrams solely concerned with the lexer? Because I have a (messy) parser grammar which I'm a bit more confident about if you're interested. So far I have only covered the lexer, and most of it needs redoing in light of the errors in the DMD docs, but I am hoping to cover the whole spec eventually... The more I do, the quicker I'm able to make them as my workflow evolves, so it's hard to say how long it will take... A...
Re: String Literal Docs
On 20/06/2010 01:09, div0 wrote: On 19/06/2010 23:17, Ellery Newcomer wrote: All I can say is auto w = x"dead beef"w; results in Error: invalid UTF-8 sequence on dmd 2.047 Then you've found a bug, you know what to do: http://d.puremagic.com/issues/ Hmm, that would seem to indicate to me that the postfix is being allowed when the hex represents a valid UTF sequence, but not otherwise. I didn't do too much testing myself as I know next to zilch about string internals. The text that describes hex strings says that they have to have an even number of digits, but this would seem to imply that they have to have a multiple of 4 or 8 for wstrings and dstrings respectively, which makes sense, but I'm not sure that can be verified in the lexing of a string literal without insane lookahead rules. But then I guess that is why the spec says that hex strings are exempt from the valid UTF rule, and in that case hexstrings should really make byte arrays rather than strings, or failing that, always chars and not anything wider. A...
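The digit-count concern in this message can be sketched in a few lines of D. This is only an illustration of the arithmetic, not what DMD does: `fillsWholeUnits` is a hypothetical helper, and the assumption is that two hex digits make one byte, so a wstring would need a multiple of 4 digits and a dstring a multiple of 8.

```d
import std.algorithm.searching : count;
import std.ascii : isHexDigit;
import std.stdio : writeln;

// Hypothetical helper, not taken from DMD: reports whether the hex digits
// in `digits` fill a whole number of code units that are `unitSize` bytes
// wide. Characters that are not hex digits (such as spaces) are skipped.
bool fillsWholeUnits(string digits, size_t unitSize)
{
    immutable n = digits.count!(c => isHexDigit(c));
    return n % (2 * unitSize) == 0; // two hex digits per byte
}

void main()
{
    writeln(fillsWholeUnits("dead beef", char.sizeof));  // 8 digits, 1-byte units
    writeln(fillsWholeUnits("dead be", wchar.sizeof));   // 6 digits, 2-byte units
    writeln(fillsWholeUnits("dead beef", dchar.sizeof)); // 8 digits, 4-byte units
}
```

A check like this only needs a running digit count and the postfix, so it would not strictly require lookahead; whether the compiler should reject, pad, or ignore a short final unit is the open question raised in the thread.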
Re: String Literal Docs
On 20/06/2010 11:03, Alix Pexton wrote: On 20/06/2010 01:09, div0 wrote: On 19/06/2010 23:17, Ellery Newcomer wrote: All I can say is auto w = x"dead beef"w; results in Error: invalid UTF-8 sequence on dmd 2.047 Then you've found a bug, you know what to do: http://d.puremagic.com/issues/ Hmm, that would seem to indicate to me that the postfix is being allowed when the hex represents a valid UTF sequence, but not otherwise. I didn't do too much testing myself as I know next to zilch about string internals. The text that describes hex strings says that they have to have an even number of digits, but this would seem to imply that they have to have a multiple of 4 or 8 for wstrings and dstrings respectively, which makes sense, but I'm not sure that can be verified in the lexing of a string literal without insane lookahead rules. It says multiple of 2, not even number of digits. To me that implies it's always 2 and the suffix acceptance is just a bug. It could be made more clear though. But then I guess that is why the spec says that hex strings are exempt from the valid UTF rule, and in that case hexstrings should really make byte arrays rather than strings, or failing that, always chars and not anything wider. A... Yeah, hex strings should probably have the type ubyte[]. If you're using them to put arbitrary binary in your program you're almost certainly going to cast the array to something else anyway, so char[], wchar[], dchar[] all seem a bit pointless, and as they allow invalid utf, making them ?char[] seems wrong. -- My enormous talent is exceeded only by my outrageous laziness. http://www.ssTk.co.uk
Re: String Literal Docs
div0 d...@users.sourceforge.net wrote in message news:hvkrsc$2r5...@digitalmars.com... It says multiple of 2, not even number of digits. multiple of 2 == even number Even as in even vs odd Yeah, hex strings should probably have the type ubyte[]. If you're using them to put arbitrary binary in your program you're almost certainly going to cast the array to something else anyway, so char[], wchar[], dchar[] all seem a bit pointless, and as they allow invalid utf, making them ?char[] seems wrong. You have me completely convinced.
Re: String Literal Docs
On 20/06/2010 18:55, Nick Sabalausky wrote: div0 d...@users.sourceforge.net wrote in message news:hvkrsc$2r5...@digitalmars.com... It says multiple of 2, not even number of digits. multiple of 2 == even number Even as in even vs odd I also said 'To me that implies'. Please don't take what I said out of context and be a smart arse about it. There's more than enough of that goes on round here. I read the spec as specifying that the hex characters should be in groups of 2; I also take it as implying that the suffixes are not applicable. You're more than welcome to your own take on it. -- My enormous talent is exceeded only by my outrageous laziness. http://www.ssTk.co.uk
Re: String Literal Docs
On 19/06/2010 21:12, Alix Pexton wrote: I've been sketching some grammar diagrams for D2.0, a little like those on JSON.org, and of course I didn't get far before I ran into something odd. I think I will take the plunge and base my diagrams on the source of DMD. After looking at the code in lexer.c, it does not seem as far beyond my rusty old c++ parsing skills as I had expected! Massive credit to Walter for having a codebase that is as mature as DMD without it turning into a labyrinth of preprocessor macros and cryptic comefroms. This will mean however that my little project may take a little longer, sigh... A...
Re: String Literal Docs
On 20/06/2010 20:14, Nick Sabalausky wrote: div0 d...@users.sourceforge.net wrote in message news:hvlok6$1rf...@digitalmars.com... On 20/06/2010 18:55, Nick Sabalausky wrote: div0 d...@users.sourceforge.net wrote in message news:hvkrsc$2r5...@digitalmars.com... It says multiple of 2, not even number of digits. multiple of 2 == even number Even as in even vs odd I also said 'To me that implies'. Please don't take what I said out of context and be a smart arse about it. There's more than enough of that goes on round here. That wasn't my intent, sorry if it came across that way. It sounded to me like you were implying there was a difference between multiple of 2 and even number. If that wasn't the case, then I guess I'm just not sure what you were really getting at. From looking at the source, I now know that all string literals can have a postfix, and that as far as lexing goes, all strings are in UTF8. I've not yet tracked down where the value of the postfix is applied, but I'm fairly certain that it would be easy enough to turn off the UTF verification for the hexstrings at that end. As far as making my diagrams goes, I don't think it matters, for now... A...
Re: String Literal Docs
On 06/20/2010 03:01 PM, Alix Pexton wrote: On 19/06/2010 21:12, Alix Pexton wrote: I've been sketching some grammar diagrams for D2.0, a little like those on JSON.org, and of course I didn't get far before I ran into something odd. I think I will take the plunge and base my diagrams on the source of DMD. After looking at the code in lexer.c, it does not seem as far beyond my rusty old c++ parsing skills as I had expected! Massive credit to Walter for having a codebase that is as mature as DMD without it turning into a labyrinth of preprocessor macros and cryptic comefroms. This will mean however that my little project may take a little longer, sigh... A... Do share. I've always been too lazy to read lexer.c, and from this discussion, it sounds like there are a few spots where my own lexer grammar is incorrect (or at least differs from dmd).
Re: String Literal Docs
On 20/06/2010 21:37, Ellery Newcomer wrote: On 06/20/2010 03:01 PM, Alix Pexton wrote: On 19/06/2010 21:12, Alix Pexton wrote: I've been sketching some grammar diagrams for D2.0, a little like those on JSON.org, and of course I didn't get far before I ran into something odd. I think I will take the plunge and base my diagrams on the source of DMD. After looking at the code in lexer.c, it does not seem as far beyond my rusty old c++ parsing skills as I had expected! Massive credit to Walter for having a codebase that is as mature as DMD without it turning into a labyrinth of preprocessor macros and cryptic comefroms. This will mean however that my little project may take a little longer, sigh... A... Do share. I've always been too lazy to read lexer.c, and from this discussion, it sounds like there are a few spots where my own lexer grammar is incorrect (or at least differs from dmd). of course ^^ A...
String Literal Docs
I've been sketching some grammar diagrams for D2.0, a little like those on JSON.org, and of course I didn't get far before I ran into something odd. In the section of www.digitalmars.com/d/2.0/lex.html on string literals, the productions imply that the [c|w|d] postfix is allowed on Wysiwyg, DoubleQuote and Hex strings and not on either Delimited or Token strings, which didn't make a lot of sense to me, so I tested it with DMD (v2.046, win)...
---
import std.stdio;

void main()
{
    auto t1 = "double quoted"; // OK
    auto t2 = `back tick`d;    // OK
    auto t3 = x"dead beef";    // postfix not allowed on hexstrings!
    auto t4 = q"/delimited/"d; // OK
    auto t5 = q{if}d;          // OK
    writefln("all literals A-OK!");
}
---
This makes sense to me, HexStrings with wide chars would have made my brain scream. So, to correct the documentation, the postfix needs to be removed from HexString and added to DelimitedString and TokenString. I tried to see if this was already reported in the bug tracker but couldn't see anything close. On a slightly quieter note, there is also a spare underscore in the definition of HexidecimalDigit as it extends DecimalDigit which already has an underscore. I also noticed a bug in the tracker related to initial underscores in float literals, if the diagrams start getting too puzzling I might look into that ^^ A... PS, my copy of tDPL is in the post, yay!
Re: String Literal Docs
On 06/19/2010 03:12 PM, Alix Pexton wrote: I've been sketching some grammar diagrams for D2.0, a little like those on JSON.org, and of course I didn't get far before I ran into something odd. In the section of www.digitalmars.com/d/2.0/lex.html on string literals, the productions imply that the [c|w|d] postfix is allowed on Wysiwyg, DoubleQuote and Hex strings and not on either Delimited or Token strings, which didn't make a lot of sense to me, so I tested it with DMD (v2.046, win)...
---
import std.stdio;

void main()
{
    auto t1 = "double quoted"; // OK
    auto t2 = `back tick`d;    // OK
    auto t3 = x"dead beef";    // postfix not allowed on hexstrings!
    auto t4 = q"/delimited/"d; // OK
    auto t5 = q{if}d;          // OK
    writefln("all literals A-OK!");
}
---
This makes sense to me, HexStrings with wide chars would have made my brain scream. http://d.puremagic.com/issues/show_bug.cgi?id=4351 but I'm not so sure about the hex string one. I think you just gave it invalid unicode. E.g., this compiles fine: auto w = x"1e1d 1e1f"w; on dmd 2.047, but what it results in is pretty screwy. So, to correct the documentation, the postfix needs to be removed from HexString and added to DelimitedString and TokenString. I tried to see if this was already reported in the bug tracker but couldn't see anything close. On a slightly quieter note, there is also a spare underscore in the definition of HexidecimalDigit as it extends DecimalDigit which already has an underscore. I also noticed a bug in the tracker related to initial underscores in float literals, if the diagrams start getting too puzzling I might look into that ^^ What what? A... PS, my copy of tDPL is in the post, yay!
Re: String Literal Docs
On 19/06/2010 22:16, Ellery Newcomer wrote: On 06/19/2010 03:12 PM, Alix Pexton wrote: I've been sketching some grammar diagrams for D2.0, a little like those on JSON.org, and of course I didn't get far before I ran into something odd. In the section of www.digitalmars.com/d/2.0/lex.html on string literals, the productions imply that the [c|w|d] postfix is allowed on Wysiwyg, DoubleQuote and Hex strings and not on either Delimited or Token strings, which didn't make a lot of sense to me, so I tested it with DMD (v2.046, win)...
---
import std.stdio;

void main()
{
    auto t1 = "double quoted"; // OK
    auto t2 = `back tick`d;    // OK
    auto t3 = x"dead beef";    // postfix not allowed on hexstrings!
    auto t4 = q"/delimited/"d; // OK
    auto t5 = q{if}d;          // OK
    writefln("all literals A-OK!");
}
---
This makes sense to me, HexStrings with wide chars would have made my brain scream. http://d.puremagic.com/issues/show_bug.cgi?id=4351 but I'm not so sure about the hex string one. I think you just gave it invalid unicode. E.g., this compiles fine: Hex strings are specifically exempted from the requirement for valid utf. -- My enormous talent is exceeded only by my outrageous laziness. http://www.ssTk.co.uk
Re: String Literal Docs
On 06/19/2010 04:26 PM, div0 wrote: On 19/06/2010 22:16, Ellery Newcomer wrote: On 06/19/2010 03:12 PM, Alix Pexton wrote: I've been sketching some grammar diagrams for D2.0, a little like those on JSON.org, and of course I didn't get far before I ran into something odd. In the section of www.digitalmars.com/d/2.0/lex.html on string literals, the productions imply that the [c|w|d] postfix is allowed on Wysiwyg, DoubleQuote and Hex strings and not on either Delimited or Token strings, which didn't make a lot of sense to me, so I tested it with DMD (v2.046, win)...
---
import std.stdio;

void main()
{
    auto t1 = "double quoted"; // OK
    auto t2 = `back tick`d;    // OK
    auto t3 = x"dead beef";    // postfix not allowed on hexstrings!
    auto t4 = q"/delimited/"d; // OK
    auto t5 = q{if}d;          // OK
    writefln("all literals A-OK!");
}
---
This makes sense to me, HexStrings with wide chars would have made my brain scream. http://d.puremagic.com/issues/show_bug.cgi?id=4351 but I'm not so sure about the hex string one. I think you just gave it invalid unicode. E.g., this compiles fine: Hex strings are specifically exempted from the requirement for valid utf. All I can say is auto w = x"dead beef"w; results in Error: invalid UTF-8 sequence on dmd 2.047
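One possible reading of that error can be checked outside the compiler. The sketch below assumes (as suggested elsewhere in the thread, not confirmed here) that the literal's bytes are treated as UTF-8 before the postfix is applied; `isValidUtf8` is a hypothetical helper built on `std.utf.validate`. Under that assumption the bytes DE AD BE EF fail validation, while 1E 1D 1E 1F are plain ASCII control characters and pass, which would match the two results reported for dmd 2.047.

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

// Hypothetical helper: true if `bytes` form a well-formed UTF-8 sequence.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        // Reinterpret the raw bytes as UTF-8 code units and let Phobos check them.
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    immutable ubyte[] deadBeef = [0xDE, 0xAD, 0xBE, 0xEF]; // bytes of x"dead beef"
    immutable ubyte[] controls = [0x1E, 0x1D, 0x1E, 0x1F]; // bytes of x"1e1d 1e1f"
    writeln(isValidUtf8(deadBeef)); // 0xBE is a continuation byte with no lead byte
    writeln(isValidUtf8(controls)); // every byte is below 0x80
}
```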
Re: String Literal Docs
On 19/06/2010 23:17, Ellery Newcomer wrote: All I can say is auto w = x"dead beef"w; results in Error: invalid UTF-8 sequence on dmd 2.047 Then you've found a bug, you know what to do: http://d.puremagic.com/issues/ -- My enormous talent is exceeded only by my outrageous laziness. http://www.ssTk.co.uk