Hey Shane,

It's a weird setup that you have.

Wouldn't it be easier to parse the initial input into an AST once, then
do tree-to-tree transformations instead of going back to text between each
pass? That way you could store source line information inside the AST and
propagate it into each successive tree.
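
To make the idea concrete, here is a minimal sketch in Python. It is not
taken from Shane's code or from any particular PEG tool: the Node class, the
rename_pass transformation and the emit function are all hypothetical names
invented for illustration. The only point is that each pass builds new nodes
that copy the line number of the node they were derived from.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node that remembers where it came from."""
    kind: str
    value: str = ""
    line: int = 0  # 1-based line number in the ORIGINAL source text
    children: List["Node"] = field(default_factory=list)


def rename_pass(node: Node) -> Node:
    """Example tree-to-tree pass: rename identifier 'foo' to 'bar'.

    Every node in the new tree copies the line of the node it replaces,
    so the source position survives the transformation.
    """
    value = node.value
    if node.kind == "ident" and value == "foo":
        value = "bar"
    return Node(node.kind, value, node.line,
                [rename_pass(child) for child in node.children])


def emit(node: Node) -> List[str]:
    """Final pass: turn leaves into text annotated with their source line."""
    if not node.children:
        return [f"{node.value}  # line {node.line}"]
    lines: List[str] = []
    for child in node.children:
        lines.extend(emit(child))
    return lines


# A hand-built tree standing in for the output of the first parse.
tree = Node("block", line=1, children=[
    Node("ident", "foo", 1),
    Node("ident", "baz", 3),
])
result = emit(rename_pass(tree))
```

Only the last pass goes back to text, and by then every node still knows
which original line it belongs to.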

I think it's definitely worth it to allow comments wherever whitespace is
allowed in your grammar instead of stripping them in a pre-processing step.

A simple way to do that is to define a whitespace rule (including
comments), and to define all your primitive elements (i.e. tokens in
languages that are defined in terms of tokens) as a sequence of the
elements optionally followed by whitespace. Also allow whitespace at the
start of the file! This keeps the whitespace logic nicely contained and
minimizes the number of changes to make to the grammar.
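
A rough sketch of that scheme, written as hand-rolled matchers in Python
rather than in any real PEG syntax. The '#' line-comment syntax, the token
helper and the tiny "ident = number" grammar are assumptions made up for the
example.

```python
import re

# The whitespace rule: blanks, newlines and '#' line comments.
# The comment syntax is an assumption for this example.
WHITESPACE = re.compile(r"(?:\s|#[^\n]*)*")


def skip_ws(text, pos):
    """Return the position just past any whitespace/comments at pos."""
    return WHITESPACE.match(text, pos).end()


def token(pattern):
    """Build a matcher for a primitive element followed by optional whitespace."""
    regex = re.compile(pattern)

    def match(text, pos):
        m = regex.match(text, pos)
        if m is None:
            return None
        # Report where the lexeme started, then swallow trailing whitespace.
        return m.group(), m.start(), skip_ws(text, m.end())

    return match


IDENT = token(r"[A-Za-z_]\w*")
EQUALS = token(r"=")
NUMBER = token(r"\d+")


def parse_assignment(text):
    """Parse 'ident = number'; return [(lexeme, offset), ...] or None."""
    pos = skip_ws(text, 0)  # whitespace is also allowed at the start of the file
    parts = []
    for rule in (IDENT, EQUALS, NUMBER):
        result = rule(text, pos)
        if result is None:
            return None
        lexeme, start, pos = result
        parts.append((lexeme, start))
    return parts if pos == len(text) else None
```

The higher-level rule (parse_assignment here) never mentions whitespace
except once at the very start, which is what keeps the grammar changes small.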

I've yet to define the story of my tool
(https://github.com/norswap/autumn4/) for
source position tracking, so if you have any insight or question you don't
feel like posting on here, feel free to message me.

What you don't mention, but might be interested in, is how I handle
reporting positions when e.g. there is an error. The parser only works in
terms of "positions", which are indices into the input string. I maintain a
data structure that maps these positions back to a (line, column) pair.
Basically, this structure indexes the position of each newline
in the input, and is able to convert tabs to a predefined width (code here:
https://github.com/norswap/autumn4/blob/master/src/norswap/autumn/LineMap.java
).
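
For a rough idea of the technique without reading the Java, here is a
simplified Python sketch. It is not a port of the linked LineMap class: the
class layout, the method name and the choice to advance tabs to the next tab
stop are my assumptions for the example.

```python
from bisect import bisect_right


class LineMap:
    """Maps 0-based offsets in a string to 1-based (line, column) pairs."""

    def __init__(self, text, tab_width=4):
        self.text = text
        self.tab_width = tab_width
        # Offset of the first character of each line; line 1 starts at 0,
        # every other line starts right after a newline.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(text)
                                  if ch == "\n"]

    def position(self, offset):
        """Return (line, column) for the given offset into the text."""
        if not 0 <= offset <= len(self.text):
            raise ValueError(f"offset {offset} is outside the input")
        # Binary search: number of line starts that are <= offset.
        line = bisect_right(self.line_starts, offset)
        column = 0
        for ch in self.text[self.line_starts[line - 1]:offset]:
            if ch == "\t":
                # Assumption: a tab advances to the next multiple of tab_width.
                column += self.tab_width - column % self.tab_width
            else:
                column += 1
        return line, column + 1


line_map = LineMap("ab\n\tcd\nef", tab_width=4)
```

The parser itself keeps dealing in plain offsets; the conversion only happens
when a position has to be shown to a human, e.g. line_map.position(4) for the
'c' that follows the tab on the second line.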

Cheers,

Nicolas LAURENT


On Fri, 26 Apr 2019 at 06:59, Shane Ryoo <shane.r...@gmail.com> wrote:

> Hello,
>
> I've inherited some code that does source-to-source translation, with
> five different passes (each with a different .peg) that take a single
> long string as input and emits a string as output.  I've been
> requested to add source line information to the results, some way or
> another.  Currently the code strips out all comments from the source
> and never adds/handles any.
>
> My question: what would be best practice here for retaining source
> line information?  I've thrown around some ideas like supporting
> comments throughout the grammars, breaking up the single string format
> and adding line data per string, unifying the grammars so it's a
> single pass, but nothing proposed has been entirely satisfactory.  My
> previous experience is entirely flex-bison with a single grammar going
> to an IR.
>
> Thanks in advance.
>
> _______________________________________________
> PEG mailing list
> PEG@lists.csail.mit.edu
> https://lists.csail.mit.edu/mailman/listinfo/peg
>