Re: Attributes (lexical)

2021-11-25, Ola Fosheim Grøstad via Digitalmars-d-learn

On Thursday, 25 November 2021 at 12:16:50 UTC, rumbu wrote:
I try to base my reasoning on the specification; dmd is not always 
a good source of information. The lexer is polluted by old 
features, or right now by the ImportC feature, which tries to lex D 
and C at the same time.


Alright. I haven't looked at it since work on the `importC` feature 
started.


(The lexer code takes a bit of browsing to get used to, but it 
isn't all that challenging once you are into it.)




Re: Attributes (lexical)

2021-11-25, Dennis via Digitalmars-d-learn

On Thursday, 25 November 2021 at 12:09:55 UTC, Dennis wrote:

This should also be fixed in the spec.


Filed as:

Issue 22543 - [spec] grammar blocks use unspecified notation
https://issues.dlang.org/show_bug.cgi?id=22543

Issue 22544 - [spec] C++ and Objective-C are not single tokens
https://issues.dlang.org/show_bug.cgi?id=22544



Re: Attributes (lexical)

2021-11-25, rumbu via Digitalmars-d-learn
On Thursday, 25 November 2021 at 11:25:49 UTC, Ola Fosheim 
Grøstad wrote:

On Thursday, 25 November 2021 at 10:41:05 UTC, Rumbu wrote:
I am not asking these questions out of thin air; I am trying to 
write a conforming lexer, and this is one of the ambiguities.


I think it is easier to just look at the lexer in the dmd 
source. The D language does not really have a proper spec; it 
is more like an effort to document the implementation.


I try to base my reasoning on the specification; dmd is not always a 
good source of information. The lexer is polluted by old features, 
or right now by the ImportC feature, which tries to lex D and C at the 
same time.


DMD stops at the newline if no file was specified before it; that's 
why the "filename" is unexpected on a new line:

https://github.com/dlang/dmd/blob/d374003a572fe0c64da4aa4dcc55d894c648514b/src/dmd/lexer.d#L2838

libdparse completely ignores the contents after #line, skipping 
everything until EOL, even an EOF/NUL marker that should end 
lexing:

https://github.com/dlang-community/libdparse/blob/7112880dae3f25553d96dae53a445c16261de7f9/src/dparse/lexer.d#L1100
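For contrast, a skip loop that honours those markers only needs one more condition. A minimal D sketch, assuming the spec's EndOfFile rule (physical end of file, `\0`, or `\x1A`); `skipRestOfLine` is a hypothetical helper, not libdparse or DMD code.

```d
import std.stdio : writeln;

/// Hypothetical helper: returns the index of the character that ends the
/// current line, stopping at '\n' or at an EndOfFile marker (NUL or SUB)
/// instead of skipping past it.
size_t skipRestOfLine(string s, size_t i)
{
    while (i < s.length && s[i] != '\n' && s[i] != '\0' && s[i] != '\x1A')
        ++i;
    return i;
}

void main()
{
    // The NUL after "12" ends the source, so " ignored" is never reached.
    string src = "#line 12\0 ignored\n";
    writeln(skipRestOfLine(src, 5)); // prints 8, the index of the NUL
}
```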



Re: Attributes (lexical)

2021-11-25, zjh via Digitalmars-d-learn

On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:


#

//this works

line


I hate `#`.


Re: Attributes (lexical)

2021-11-25, Dennis via Digitalmars-d-learn

On Thursday, 25 November 2021 at 10:41:05 UTC, Rumbu wrote:

Well:

```
#line IntegerLiteral Filespec? EndOfLine
```

To me, having EndOfLine at the end means there are no 
other EOLs in between; otherwise this syntax should pass, but it 
doesn't (latest DMD):


```d
#line 12
"source.d"
```


The lexical grammar section starts with:

The source text is decoded from its source representation into 
Unicode Characters. The Characters are further divided into: 
WhiteSpace, EndOfLine, Comments, SpecialTokenSequences, and 
Tokens, with the source terminated by an EndOfFile.


What it's failing to mention is how in the lexical grammar rules, 
spaces denote 'immediate concatenation' of the characters/rules 
before and after it, e.g.:

```
DecimalDigits:
DecimalDigit
DecimalDigit DecimalDigits
```
`3 1  4` is not a single `IntegerLiteral`, it needs to be `314`.
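The "immediate concatenation" reading can be shown with a recursive matcher that follows the quoted `DecimalDigits` rule literally. A minimal D sketch; `matchDecimalDigits` is a hypothetical name, and only the two alternatives quoted above are modelled.

```d
import std.ascii : isDigit;
import std.stdio : writeln;

/// Hypothetical matcher for the quoted lexical rule
///     DecimalDigits: DecimalDigit | DecimalDigit DecimalDigits
/// Returns how many leading characters of `s` it matches. Juxtaposition in a
/// lexical rule means "immediately followed by", so nothing is skipped.
size_t matchDecimalDigits(string s)
{
    if (s.length == 0 || !isDigit(s[0]))
        return 0;                              // a DecimalDigit is required
    return 1 + matchDecimalDigits(s[1 .. $]);  // optional DecimalDigits tail
}

void main()
{
    writeln(matchDecimalDigits("314"));    // whole input is one literal
    writeln(matchDecimalDigits("3 1  4")); // the first space ends the match
}
```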

Now, the parsing grammar should mention that spaces denote 
immediate concatenation of *Tokens*, with arbitrary *Comments* 
and *WhiteSpace* in between. So the rule:

```
AtAttribute:
@ nogc
```
Means: an @ token, followed by arbitrary comments and whitespace, 
followed by an identifier token that equals "nogc". That explains 
your first example.
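That reading can be illustrated with a toy tokenizer: once whitespace and comments are dropped between tokens, the parser receives the same `@`, `nogc` pair however the source is laid out. A minimal D sketch, not DMD's lexer; `tokenize` is a hypothetical name, and it only recognises identifiers, decimal integers and one-character punctuation.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;
import std.stdio : writeln;

/// Hypothetical toy tokenizer: identifiers, decimal integers and
/// one-character punctuation only. Whitespace (newlines included) and
/// `//` / `/* */` comments are dropped between tokens; nested `/+ +/`
/// comments and multi-character operators are not handled.
string[] tokenize(string src)
{
    string[] tokens;
    size_t i = 0;
    while (i < src.length)
    {
        immutable c = src[i];
        if (isWhite(c))
        {
            ++i;
        }
        else if (c == '/' && i + 1 < src.length && src[i + 1] == '/')
        {
            while (i < src.length && src[i] != '\n')
                ++i; // line comment: skip to end of line
        }
        else if (c == '/' && i + 1 < src.length && src[i + 1] == '*')
        {
            i += 2; // block comment: skip past the closing "*/"
            while (i + 1 < src.length && !(src[i] == '*' && src[i + 1] == '/'))
                ++i;
            i = i + 2 <= src.length ? i + 2 : src.length;
        }
        else
        {
            immutable start = i;
            if (isAlpha(c) || c == '_')
            {
                while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_'))
                    ++i;
            }
            else if (isDigit(c))
            {
                while (i < src.length && isDigit(src[i]))
                    ++i;
            }
            else
            {
                ++i; // any other character is a one-character token
            }
            tokens ~= src[start .. i];
        }
    }
    return tokens;
}

void main()
{
    // Layout from rumbu's example: '@' and 'nogc' split by newlines and comments.
    writeln(tokenize("@\n\n/* comment */\n// another\nnogc:"));
}
```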


Regarding this lexical rule:
```
#line IntegerLiteral Filespec? EndOfLine
```
This is already wrong from a lexical standpoint: it would suggest 
a SpecialTokenSequence looks like this:

```
#line10"file"
```

The implementation actually looks for a # token, skips 
*WhiteSpace* and *Comment*s, looks for an identifier token 
("line"), and then it goes into a custom loop that allows 
separation by *WhiteSpace* but not *Comment*, and also the first 
'\n' will be assumed to be the final *EndOfLine*, which is why 
this fails:

```
#line 12
"source.d"
```
It thinks it's done after "12".
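A minimal D sketch of the observable `#line` behaviour discussed in this thread; it is a reconstruction from the two examples here, not DMD's code, and `LineDirective`, `skipWhiteAndComments` and `parseLineDirective` are hypothetical names. Dennis's summary places the restricted loop right after `line`, but rumbu's example accepts a comment between `line` and `12`, so this sketch assumes the number is still read like an ordinary token and only what follows it goes through the restricted loop (spaces, tabs and CR only, first `'\n'` ends it). `__FILE__` as a filespec is not handled.

```d
import std.ascii : isAlphaNum, isDigit;
import std.stdio : writeln;

/// Hypothetical result of lexing one `#line` SpecialTokenSequence.
struct LineDirective
{
    bool ok;      // false if the sequence was rejected
    uint line;    // value of the IntegerLiteral
    string file;  // contents of the Filespec, or null if absent
    size_t next;  // index where ordinary lexing resumes
}

/// Skips whitespace (newlines included) and `//` / `/* */` comments.
size_t skipWhiteAndComments(string s, size_t i)
{
    while (i < s.length)
    {
        immutable c = s[i];
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
            ++i;
        else if (c == '/' && i + 1 < s.length && s[i + 1] == '/')
        {
            while (i < s.length && s[i] != '\n') ++i;
        }
        else if (c == '/' && i + 1 < s.length && s[i + 1] == '*')
        {
            i += 2;
            while (i + 1 < s.length && !(s[i] == '*' && s[i + 1] == '/')) ++i;
            i = i + 2 <= s.length ? i + 2 : s.length;
        }
        else
            break;
    }
    return i;
}

LineDirective parseLineDirective(string s)
{
    LineDirective r;
    size_t i = skipWhiteAndComments(s, 0);
    if (i >= s.length || s[i] != '#')
        return r;
    // `#` and `line` may be separated by whitespace, newlines and comments.
    i = skipWhiteAndComments(s, i + 1);
    immutable start = i;
    while (i < s.length && isAlphaNum(s[i])) ++i;
    if (s[start .. i] != "line")
        return r;
    // Assumption: the number is read like an ordinary token.
    i = skipWhiteAndComments(s, i);
    if (i >= s.length || !isDigit(s[i]))
        return r;
    while (i < s.length && isDigit(s[i]))
        r.line = r.line * 10 + (s[i++] - '0');
    // Restricted loop: spaces, tabs and CR only, up to the first '\n'.
    while (i < s.length && s[i] != '\n')
    {
        immutable c = s[i];
        if (c == ' ' || c == '\t' || c == '\r')
            ++i;
        else if (c == '"' && r.file is null)
        {
            immutable fstart = ++i;
            while (i < s.length && s[i] != '"' && s[i] != '\n') ++i;
            if (i >= s.length || s[i] != '"')
                return r;          // unterminated Filespec
            r.file = s[fstart .. i];
            ++i;
        }
        else
            return r;              // a comment or anything else: rejected
    }
    r.ok = true;
    r.next = i;
    return r;
}

void main()
{
    // The Filespec on the next line is not part of the sequence; lexing
    // resumes at the '\n', so "source.d" is then an unexpected token.
    auto d = parseLineDirective("#line 12\n\"source.d\"");
    writeln(d.ok, " ", d.line, " ", d.file is null, " ", d.next);
}
```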

In conclusion the specification should:
- define the notation used in lexical / parsing grammar blocks
- clearly distinguish lexical / parsing blocks
- fix up the `SpecialTokenSequence` definition (and maybe change 
dmd as well)


By the way, the parsing grammar defines:
```
LinkageType:
C
C++
D
Windows
System
Objective-C
```
C++ and Objective-C currently cannot be single tokens, so they 
are actually two and three tokens respectively, which is why these 
are allowed:


```D
extern(C
   ++)
void f() {}

extern(Objective
   -
   C)
void g() {}
```
This should also be fixed in the spec.
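Seen from the parser, `LinkageType` then has to be matched as a token sequence, which is why the line breaks in those examples make no difference. A minimal D sketch, assuming the lexer has already produced `++` and `-` as separate tokens and dropped whitespace and comments; `parseLinkageType` is a hypothetical helper, not DMD's parser.

```d
import std.stdio : writeln;

/// Hypothetical parser helper: matches a LinkageType at the start of a
/// token sequence. Assumes the lexer already produced `++` and `-` as
/// separate tokens and discarded whitespace and comments, so however the
/// source was laid out, the parser only sees `C` `++` or `Objective` `-` `C`.
string parseLinkageType(const string[] toks, out size_t consumed)
{
    if (toks.length >= 2 && toks[0] == "C" && toks[1] == "++")
    {
        consumed = 2;
        return "C++";
    }
    if (toks.length >= 3 && toks[0] == "Objective" && toks[1] == "-" && toks[2] == "C")
    {
        consumed = 3;
        return "Objective-C";
    }
    foreach (name; ["C", "D", "Windows", "System"])
    {
        if (toks.length >= 1 && toks[0] == name)
        {
            consumed = 1;
            return name;
        }
    }
    return null; // not a LinkageType; `consumed` stays 0
}

void main()
{
    size_t n;
    // Tokens of `extern(C\n   ++)` once the newline and spaces are dropped.
    auto name = parseLinkageType(["C", "++"], n);
    writeln(name, " uses ", n, " tokens");
}
```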

I am not asking these questions out of thin air; I am trying to 
write a conforming lexer, and this is one of the ambiguities.


That's cool! Are you writing an editor plugin?



Re: Attributes (lexical)

2021-11-25, Ola Fosheim Grøstad via Digitalmars-d-learn

On Thursday, 25 November 2021 at 10:41:05 UTC, Rumbu wrote:
I am not asking these questions out of thin air; I am trying to 
write a conforming lexer, and this is one of the ambiguities.


I think it is easier to just look at the lexer in the dmd source. 
The D language does not really have a proper spec; it is more 
like an effort to document the implementation.




Re: Attributes (lexical)

2021-11-25, Rumbu via Digitalmars-d-learn

On Thursday, 25 November 2021 at 10:10:25 UTC, Dennis wrote:

On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:
Also, this works for #line too, even though the specification 
tells us that all tokens must be on the same line


Where does it say that?


Well:

```
#line IntegerLiteral Filespec? EndOfLine
```

To me, having EndOfLine at the end means there are no other 
EOLs in between; otherwise this syntax should pass, but it doesn't 
(latest DMD):


```d
#line 12
"source.d"
```

I am not asking these questions out of thin air; I am trying to 
write a conforming lexer, and this is one of the ambiguities.




Re: Attributes (lexical)

2021-11-25, Dennis via Digitalmars-d-learn

On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:
Also, this works for #line too, even though the specification 
tells us that all tokens must be on the same line


Where does it say that?



Re: Attributes (lexical)

2021-11-25, Elronnd via Digitalmars-d-learn

On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:

Is that OK, or is it a lexer bug?


`@ (12)` does exactly what I would expect. I always assumed `@nogc` 
was a single token, but the spec says otherwise. I suppose that 
makes sense.


#line is dicier, as it is not part of the grammar proper; however, 
the spec describes it as a 'special token sequence', and comments 
are not tokens, so I think the current behaviour is correct.


Re: Attributes (lexical)

2021-11-25, Ola Fosheim Grøstad via Digitalmars-d-learn

On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:

Is that OK, or is it a lexer bug?


Yes. The lexer just eats whitespace and the parser accepts way 
too much.




Attributes (lexical)

2021-11-25, rumbu via Digitalmars-d-learn

Just playing around with attributes.

This is valid D code:

```d

@


nogc: // yes, this is in fact @nogc, even with some lines in between


@

/* I can put some comments
*/

/** even some documentation
*/

// single line comments also

(12)

// yes, comments and newlines are allowed between the attribute and the declaration



int x; // @(12) is attached to this declaration
```

Is that OK, or is it a lexer bug?


Also, this works for #line too, even though the specification tells 
us that all tokens must be on the same line



```d

#

//this works

line

/* this too */

12

//this is #line 12


```