Thanks, Nicolas.  The whitespace rule seems clean, so I will try that.

I didn't mention that the input is, at its base, three-address code with some
support for arbitrary expressions, casting, and other C-like constructs,
and the output is assembly.  So there's not much need for an AST.

Shane


On Fri, Apr 26, 2019 at 5:40 AM Nicolas Laurent <
nicolas.laur...@uclouvain.be> wrote:

> Hey Shane,
>
> It's a weird setup that you have.
>
> Wouldn't it be easier to parse the initial input into an AST once and then
> do tree transformations, instead of going back to text between passes?
> That way you could store source line information inside the AST and
> propagate it into each successive tree.
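
A minimal Python sketch of that idea, assuming a toy expression AST (the `Pos`, `Num`, and `Add` node types and the `fold_constants` pass are hypothetical, not taken from either tool): every node carries the position it was parsed from, and a pass that builds a new node copies the position of the node it replaces, so line information survives any number of passes without a round trip through text.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical AST for illustration; every node records where it came from.
@dataclass(frozen=True)
class Pos:
    line: int
    column: int

@dataclass(frozen=True)
class Num:
    value: int
    pos: Pos

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"
    pos: Pos

Expr = Union[Num, Add]

def fold_constants(e: Expr) -> Expr:
    """Example pass: replace `literal + literal` with a single literal.

    New nodes reuse the position of the node they replace, so source
    line information flows through the pass unchanged.
    """
    if isinstance(e, Add):
        left = fold_constants(e.left)
        right = fold_constants(e.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.value + right.value, e.pos)
        return Add(left, right, e.pos)
    return e

tree = Add(Num(1, Pos(3, 5)), Num(2, Pos(3, 9)), Pos(3, 5))
print(fold_constants(tree))  # Num(value=3, pos=Pos(line=3, column=5))
```
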
>
> I think it's definitely worth allowing comments wherever whitespace is
> allowed in your grammar, instead of stripping them in a pre-processing step.
>
> A simple way to do that is to define a whitespace rule (including
> comments), and to define each of your primitive elements (i.e. tokens, in
> languages that are defined in terms of tokens) as the element itself
> followed by optional whitespace. Also allow whitespace at the start of the
> file! This keeps the whitespace logic nicely contained and minimizes the
> number of changes you need to make to the grammar.
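
A minimal sketch of this whitespace rule as a hand-written PEG-style parser in Python. The comment syntax (`//` and `/* */`), the `Parser` class, and the three-address `parse_instr` rule are assumptions for illustration. The point is that only `token()` and the constructor ever deal with whitespace; in a .peg grammar the same shape is usually a spacing rule that every token rule ends with, plus one use at the start of the top-level rule.

```python
import re
from typing import Optional, Tuple

# Whitespace rule: blanks, "//" line comments and "/* */" block comments.
# The comment syntax is an assumption; substitute your language's.
WS = re.compile(r"(?:\s+|//[^\n]*|/\*.*?\*/)*", re.DOTALL)
IDENT = r"[A-Za-z_]\w*"

class Parser:
    def __init__(self, text: str):
        self.text = text
        self.pos = 0
        self.skip_ws()  # allow whitespace and comments at the start of the input

    def skip_ws(self) -> None:
        # WS can match the empty string, so match() never returns None here.
        self.pos = WS.match(self.text, self.pos).end()

    def token(self, pattern: str) -> Optional[str]:
        """Match one primitive element, then consume trailing whitespace/comments."""
        m = re.compile(pattern).match(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        self.skip_ws()
        return m.group()

def parse_instr(p: Parser) -> Optional[Tuple[str, str, str, str]]:
    """Hypothetical rule: instr <- IDENT '=' IDENT OP IDENT.

    No rule mentions whitespace; token() handles it everywhere.
    """
    start = p.pos
    dst = p.token(IDENT)
    eq = p.token(r"=") if dst else None
    a = p.token(IDENT) if eq else None
    op = p.token(r"[-+*/]") if a else None
    b = p.token(IDENT) if op else None
    if b is None:
        p.pos = start  # PEG-style: on failure, backtrack to where the rule began
        return None
    return (dst, a, op, b)

p = Parser("  // header\n t1 = a /* inline */ + b  // trailing\n")
print(parse_instr(p))  # ('t1', 'a', '+', 'b')
```
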
>
> I have yet to settle on how my tool (https://github.com/norswap/autumn4/)
> should handle source position tracking, so if you have any insights or
> questions you don't feel like posting here, feel free to message me.
>
> Something you didn't mention, but might be interested in, is how I report
> positions, e.g. when there is an error. The parser works only in terms of
> "positions", which are indices into the input string. I maintain a data
> structure that maps these positions back to a (line, column) pair.
> Basically, this structure indexes the position of each newline in the
> input, and it can expand tabs to a predefined width (code here:
> https://github.com/norswap/autumn4/blob/master/src/norswap/autumn/LineMap.java
> ).
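
A rough Python sketch of such a line map, assuming 0-based offsets in and 1-based (line, column) out, and assuming a tab advances the column to the next multiple of `tab_width`. This is a hypothetical re-implementation of the idea described above, not a port of LineMap.java. Storing only the line-start offsets keeps the map small, and a binary search finds the line for any position in O(log n).

```python
import bisect
from typing import List, Tuple

class LineMap:
    """Hypothetical sketch: maps 0-based string offsets to 1-based (line, column)."""

    def __init__(self, text: str, tab_width: int = 4):
        self.text = text
        self.tab_width = tab_width
        # Offset at which each line starts: 0, then one past every newline.
        self.line_starts: List[int] = [0] + [i + 1 for i, c in enumerate(text) if c == "\n"]

    def line_column(self, pos: int) -> Tuple[int, int]:
        # Binary search for the last line start <= pos.
        line = bisect.bisect_right(self.line_starts, pos) - 1
        col = 0
        for c in self.text[self.line_starts[line]:pos]:
            if c == "\t":
                col += self.tab_width - col % self.tab_width  # assumed: jump to next tab stop
            else:
                col += 1
        return line + 1, col + 1

lm = LineMap("ab\n\tcd\nx", tab_width=4)
print(lm.line_column(4))  # (2, 5): 'c', after a tab expanded to 4 columns
```
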
>
> Cheers,
>
> Nicolas LAURENT
>
>
> On Fri, 26 Apr 2019 at 06:59, Shane Ryoo <shane.r...@gmail.com> wrote:
>
>> Hello,
>>
>> I've inherited some code that does source-to-source translation, with
>> five different passes (each with a different .peg) that each take a single
>> long string as input and emit a string as output.  I've been asked to add
>> source line information to the results, some way or another.  Currently
>> the code strips out all comments from the source and never adds or
>> handles any.
>>
>> My question: what would be best practice here for retaining source
>> line information?  I've thrown around some ideas like supporting
>> comments throughout the grammars, breaking up the single string format
>> and adding line data per string, unifying the grammars so it's a
>> single pass, but nothing proposed has been entirely satisfactory.  My
>> previous experience is entirely flex-bison with a single grammar going
>> to an IR.
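
One way to sketch the "line data per string" idea in Python, assuming every pass can be expressed as a rewrite of one input line into zero or more output lines (passes whose grammar spans several lines would not fit this shape). `split_source`, `run_pass`, `emit`, and the `lower_add` pass are hypothetical names; each output line inherits the source line number of the line it came from, and the final emitter writes it as an assembly comment (the `#` comment prefix is an assumption about the target assembler).

```python
from typing import Callable, List, Tuple

Line = Tuple[int, str]  # (original source line number, text)

def split_source(source: str) -> List[Line]:
    """Tag each input line with its 1-based source line number."""
    return [(n, text) for n, text in enumerate(source.splitlines(), start=1)]

def run_pass(lines: List[Line], rewrite: Callable[[str], List[str]]) -> List[Line]:
    """Apply a line-level rewrite; every output line inherits its input's line number."""
    out: List[Line] = []
    for n, text in lines:
        out.extend((n, new_text) for new_text in rewrite(text))
    return out

def emit(lines: List[Line], comment: str = "#") -> str:
    """Write the final output, annotating each line with its source line number."""
    return "\n".join(f"{text}  {comment} line {n}" for n, text in lines)

def lower_add(text: str) -> List[str]:
    """Hypothetical pass: 'd = a + b' becomes two instructions; other lines pass through."""
    parts = text.split()
    if len(parts) == 5 and parts[1] == "=" and parts[3] == "+":
        d, _, a, _, b = parts
        return [f"mov {d}, {a}", f"add {d}, {b}"]
    return [text]

print(emit(run_pass(split_source("t1 = a + b\nret t1"), lower_add)))
```

The trade-off is that every pass has to work line by line; once a pass needs to see several lines at once, carrying positions in an AST or in comments the grammar accepts is likely simpler.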
>>
>> Thanks in advance.
>>
>> _______________________________________________
>> PEG mailing list
>> PEG@lists.csail.mit.edu
>> https://lists.csail.mit.edu/mailman/listinfo/peg
>>
>