Thanks Nicolas. The whitespace rule seems clean, so I will try that. I didn't mention that the input is, at its base, a three-address code with some support for arbitrary expressions, casting, and other C-like constructs, and the output is assembly. So there's not much need for an AST.
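For reference, the whitespace rule from the quoted message below can be sketched roughly as follows. This is only an illustration in Python, not any of the actual .peg grammars: the comment syntax (`#` to end of line and `/* ... */`), the statement shape, and all names (`skip_ws`, `token`, `statement`, `program`) are assumptions made up for the example. Each primitive consumes its own trailing whitespace and comments, and the top-level rule skips leading whitespace once.

```python
import re

# Assumed comment syntax: '#' to end of line, and C-style /* ... */ blocks.
WHITESPACE = re.compile(r"(?:[ \t\r\n]+|#[^\n]*|/\*.*?\*/)*", re.DOTALL)

def skip_ws(text, pos):
    """The whitespace rule: consume blanks and comments, return the new position."""
    return WHITESPACE.match(text, pos).end()

def token(pattern):
    """Build a primitive rule: the element itself, then optional whitespace."""
    regex = re.compile(pattern)
    def parse(text, pos):
        m = regex.match(text, pos)
        if m is None:
            return None
        # Lexeme, offset where it started, and position after trailing whitespace.
        return m.group(), pos, skip_ws(text, m.end())
    return parse

IDENT = token(r"[A-Za-z_]\w*")
NUMBER = token(r"\d+")
ASSIGN = token(r"=")
OP = token(r"[+\-*/]")

def operand(text, pos):
    return IDENT(text, pos) or NUMBER(text, pos)

def statement(text, pos):
    """statement <- IDENT '=' operand OP operand (a toy three-address form)"""
    parts = []
    for rule in (IDENT, ASSIGN, operand, OP, operand):
        result = rule(text, pos)
        if result is None:
            return None
        lexeme, start, pos = result
        parts.append((lexeme, start))
    return parts, pos

def program(text):
    """program <- whitespace statement* end-of-input"""
    pos = skip_ws(text, 0)  # allow whitespace and comments at the start of the file
    statements = []
    while pos < len(text):
        result = statement(text, pos)
        if result is None:
            raise SyntaxError(f"parse error at offset {pos}")
        parts, pos = result
        statements.append(parts)
    return statements
```

Only `token` and `program` know about whitespace, so the statement-level rules stay unchanged, and each lexeme keeps the offset at which it started in the original input.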
Shane

On Fri, Apr 26, 2019 at 5:40 AM Nicolas Laurent <nicolas.laur...@uclouvain.be> wrote:

> Hey Shane,
>
> It's a weird setup that you have.
>
> Wouldn't it be easier to just parse the initial input into an AST and then
> do tree transformations instead of going back to text between each pass?
> That way you could store source line information inside the AST and
> propagate it into each successive tree.
>
> I think it's definitely worth it to allow comments wherever whitespace is
> allowed in your grammar instead of stripping them as pre-processing.
>
> A simple way to do that is to define a whitespace rule (including
> comments), and to define all your primitive elements (i.e. tokens in
> languages that are defined in terms of tokens) as a sequence of the
> element optionally followed by whitespace. Also allow whitespace at the
> start of the file! This keeps the whitespace logic nicely contained and
> minimizes the number of changes to make to the grammar.
>
> I've yet to define the story of my tool
> (https://github.com/norswap/autumn4/) for source position tracking, so if
> you have any insight or question you don't feel like posting on here, feel
> free to message me.
>
> What you don't mention, but might be interested in, is how I handle
> reporting positions when e.g. there is an error. The parser only works in
> terms of "positions", which are indexes into the input string. I maintain a
> data structure that is able to map these positions back to a (line,
> column) pair. Basically this structure indexes the position of each newline
> in the input, and is able to convert tabs to a predefined width (code here:
> https://github.com/norswap/autumn4/blob/master/src/norswap/autumn/LineMap.java
> ).
>
> Cheers,
>
> Nicolas LAURENT
>
>
> On Fri, 26 Apr 2019 at 06:59, Shane Ryoo <shane.r...@gmail.com> wrote:
>
>> Hello,
>>
>> I've inherited some code that does source-to-source translation, with
>> five different passes (each with a different .peg) that take a single
>> long string as input and emit a string as output. I've been
>> requested to add source line information to the results, some way or
>> another. Currently the code strips out all comments from the source
>> and never adds/handles any.
>>
>> My question: what would be best practice here for retaining source
>> line information? I've thrown around some ideas like supporting
>> comments throughout the grammars, breaking up the single string format
>> and adding line data per string, unifying the grammars so it's a
>> single pass, but nothing proposed has been entirely satisfactory. My
>> previous experience is entirely flex-bison with a single grammar going
>> to an IR.
>>
>> Thanks in advance.
>>
>> _______________________________________________
>> PEG mailing list
>> PEG@lists.csail.mit.edu
>> https://lists.csail.mit.edu/mailman/listinfo/peg
>>
>
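The position-to-(line, column) mapping described in the quoted message can be sketched along these lines. This is a hypothetical Python illustration, not a port of LineMap.java: the class and method names, the 1-based line and column numbering, and the choice to expand tabs to the next tab stop (rather than to a fixed number of columns) are all assumptions.

```python
from bisect import bisect_right

class LineMap:
    """Maps 0-based offsets in a string to 1-based (line, column) pairs."""

    def __init__(self, text, tab_width=4):
        self.text = text
        self.tab_width = tab_width
        # Offset at which each line starts; line 1 starts at offset 0,
        # and every newline starts a new line right after it.
        self.line_starts = [0]
        for i, ch in enumerate(text):
            if ch == "\n":
                self.line_starts.append(i + 1)

    def line_column(self, offset):
        if not 0 <= offset <= len(self.text):
            raise IndexError(f"offset {offset} is outside the input")
        # Binary search for the last line start that is <= offset.
        line_index = bisect_right(self.line_starts, offset) - 1
        column = 0
        for ch in self.text[self.line_starts[line_index]:offset]:
            if ch == "\t":
                # Assumption: a tab advances to the next multiple of tab_width.
                column += self.tab_width - column % self.tab_width
            else:
                column += 1
        return line_index + 1, column + 1
```

With a structure like this, each pass can keep working purely in string offsets and convert to line and column only when it emits a diagnostic or a source-line annotation.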