Lexing / Parsing and final token

2021-01-19 Thread Alan & Kim Zimmerman
I am (still) working on !2418 to bring the API Annotations into the GHC
ParsedSource, and making good progress.

I am currently making a rough port of ghc-exactprint, to ensure I can get
all the tests around modifying the AST to work.

One of the last pieces is being able to capture the spacing from the last
token in the file to the EOF.  I guess technically it is the second-to-last
token, since the EOF token itself comes last.

Empirically (calling getTokenStream), it seems this is always ITsemi.  I am
not sure how this comes about, as the `module` parsing rule in Parser.y
ends with body or body2, and those both finish with an actual or virtual
'}'.

Can I rely on the token before ITeof always being ITsemi?

Alan
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Lexing / Parsing and final token

2021-01-19 Thread Richard Eisenberg
That's bizarre. Does it still happen with explicit braces?

Just to test, I tried

module Bug where {
x = 5;
y = 6;
};

and GHC rejected it because of the trailing ';'.

Richard


Re: Lexing / Parsing and final token

2021-01-19 Thread Alan & Kim Zimmerman
Changing it to remove the final ';' gives a last token of ITccurly.

Changing it to

module Bug where
x = 5
y = 6

gives a last token of ITsemi.

Alan


Re: Lexing / Parsing and final token

2021-01-19 Thread Richard Eisenberg
So, I think there's your answer: the last token might be ITccurly, not ITsemi.
It seems that the "insert invisible curlies and semis" rule is taken more
literally for semis than for curlies.

Richard
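
To make the conclusion above concrete, here is a minimal sketch in a toy
model. It does not call the GHC API: the Tok type, the ITother constructor,
and the two hand-written streams are assumptions made for illustration, and
the streams only mirror the two results reported in this thread rather than
actual getTokenStream output.

```haskell
-- Toy stand-ins for a few GHC lexer tokens. ITother is hypothetical and
-- covers every token this sketch does not need to distinguish.
data Tok = ITsemi | ITccurly | ITeof | ITother String
  deriving (Eq, Show)

-- The token immediately before ITeof, if the stream has one.
tokenBeforeEof :: [Tok] -> Maybe Tok
tokenBeforeEof toks =
  case reverse (takeWhile (/= ITeof) toks) of
    (t : _) -> Just t
    []      -> Nothing

-- The check the original question hoped would always succeed.
endsWithSemi :: [Tok] -> Bool
endsWithSemi toks = tokenBeforeEof toks == Just ITsemi

-- Hand-written stream for the explicit-brace module (assumed shape).
explicitBraces :: [Tok]
explicitBraces =
  [ ITother "module", ITother "Bug", ITother "where", ITother "{"
  , ITother "x", ITother "=", ITother "5", ITsemi
  , ITother "y", ITother "=", ITother "6", ITsemi
  , ITccurly, ITeof
  ]

-- Hand-written stream for the layout-only module (assumed shape).
layoutOnly :: [Tok]
layoutOnly =
  [ ITother "module", ITother "Bug", ITother "where"
  , ITother "x", ITother "=", ITother "5", ITsemi
  , ITother "y", ITother "=", ITother "6", ITsemi
  , ITeof
  ]

main :: IO ()
main = do
  print (tokenBeforeEof explicitBraces)  -- Just ITccurly
  print (tokenBeforeEof layoutOnly)      -- Just ITsemi
  print (endsWithSemi explicitBraces)    -- False
```

In this model the ITsemi check holds for the layout-only stream and fails
for the explicit-brace one, which matches the observations above.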


Re: Lexing / Parsing and final token

2021-01-19 Thread Alan & Kim Zimmerman
And if there is a comment after the '}' and then more blank lines, the last
token is a comment.

If there are no curlies, it is an ITsemi at the last location, after the comment.

So my hacky scheme of using ITsemi as the means to track the last gap is
not viable.

And I don't want to put extra housekeeping on every token to track two
tokens back, not just one. Back to the drawing board.

Thanks
  Alan
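
For readers wondering what "track two tokens back" might look like, here is
a rough sketch in a toy model. It is not the change that went into !2418.
The Tok record, the (line, column) Pos type, the token names stored as
strings, and the sample positions are all assumptions for illustration.

```haskell
import Data.List (foldl')

-- Hypothetical simplified source position: (line, column).
type Pos = (Int, Int)

-- Hypothetical token record: a name plus start and end positions.
data Tok = Tok
  { tokName  :: String
  , tokStart :: Pos
  , tokEnd   :: Pos
  } deriving (Eq, Show)

-- The two most recent tokens seen so far, newest first.
data LastTwo = LastTwo (Maybe Tok) (Maybe Tok)
  deriving (Eq, Show)

-- The extra housekeeping: every token shifts the window by one.
step :: LastTwo -> Tok -> LastTwo
step (LastTwo newest _) t = LastTwo (Just t) newest

lastTwo :: [Tok] -> LastTwo
lastTwo = foldl' step (LastTwo Nothing Nothing)

-- Number of lines between the end of the token before EOF and EOF itself.
-- Returns Nothing if the stream does not end in an "ITeof" token.
gapToEof :: [Tok] -> Maybe Int
gapToEof toks =
  case lastTwo toks of
    LastTwo (Just eof) (Just before)
      | tokName eof == "ITeof" ->
          Just (fst (tokStart eof) - fst (tokEnd before))
    _ -> Nothing

-- A closing brace, then a comment, then blank lines before EOF.
sample :: [Tok]
sample =
  [ Tok "ITccurly"      (4, 1) (4, 2)
  , Tok "ITlineComment" (5, 1) (5, 11)
  , Tok "ITeof"         (8, 1) (8, 1)
  ]

main :: IO ()
main = print (gapToEof sample)  -- Just 3
```

The window does not care whether the token before EOF is a semicolon, a
closing curly, or a comment, but it costs one update per token, which is
the overhead mentioned above.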


Re: Lexing / Parsing and final token

2021-01-19 Thread Alan & Kim Zimmerman
FYI, I did the horrible thing for now; optimisations welcome.

The change is at [1]

Alan

[1]
https://gitlab.haskell.org/ghc/ghc/-/commit/742273a94c187f51e3b143f9c206c42024486ecf?merge_request_iid=2418
