On Friday, 1 October 2021 09:37:52 CEST Hans Åberg wrote:
> > On 28 Sep 2021, at 14:10, Christian Schoenebeck
> > <schoeneb...@crudebyte.com> wrote:
>
> > On Monday, 27 September 2021 22:07:33 CEST Hans Åberg wrote:
> >>>> In order to generate better syntax error messages, writing out the
> >>>> input line with the error and a line with a marker underneath, I
> >>>> thought of checking how Bison does it, but I could not find the place
> >>>> in its sources. Specifically, a suggestion is to tweak YY_INPUT in the
> >>>> lexer to buffer one input line at a time, but Bison does not seem to
> >>>> do that.
> >>>
> >>> No, I keep track of the byte offset in the file, and print from the
> >>> file, which I reopen to quote the source.
> >>
> >> OK. I thought of this method, but then it does not work with streams.
> >
> > In the past at least, built-in location support did not work well for
> > me. So I'm usually overriding the location data type and behaviour with
> > a custom type declaration, plus an implementation on the lexer side.
> >
> > I also prefer this data type presentation:
> >
> > // custom Bison location type to support raw byte positions
> > struct _YYLTYPE {
> >     int first_line;
> >     int first_column;
> >     int last_line;
> >     int last_column;
> >     int first_byte;
> >     int length_bytes;
> > };
> > #define YYLTYPE _YYLTYPE
> > #define YYLTYPE_IS_DECLARED 1
> >
> > // override Bison's default location passing to support raw byte positions
> > #define YYLLOC_DEFAULT(Cur, Rhs, N) \
> >     do \
> >         if (N) \
> >         { \
> >             (Cur).first_line   = YYRHSLOC(Rhs, 1).first_line; \
> >             (Cur).first_column = YYRHSLOC(Rhs, 1).first_column; \
> >             (Cur).last_line    = YYRHSLOC(Rhs, N).last_line; \
> >             (Cur).last_column  = YYRHSLOC(Rhs, N).last_column; \
> >             (Cur).first_byte   = YYRHSLOC(Rhs, 1).first_byte; \
> >             (Cur).length_bytes = (YYRHSLOC(Rhs, N).first_byte - \
> >                                   YYRHSLOC(Rhs, 1).first_byte) + \
> >                                  YYRHSLOC(Rhs, N).length_bytes; \
> >         } \
> >         else \
> >         { \
> >             (Cur).first_line   = (Cur).last_line   = \
> >                 YYRHSLOC(Rhs, 0).last_line; \
> >             (Cur).first_column = (Cur).last_column = \
> >                 YYRHSLOC(Rhs, 0).last_column; \
> >             (Cur).first_byte   = YYRHSLOC(Rhs, 0).first_byte; \
> >             (Cur).length_bytes = YYRHSLOC(Rhs, 0).length_bytes; \
> >         } \
> >     while (0)
> >
> > Because sometimes you need the high-level column and line span, and
> > sometimes you rather need the low-level raw byte position and byte length
> > in the input data stream.
>
> For the purpose of writing out the line in the error messages, this method
> (using C++) did not work out well, because I have two parsers, one for the
> language and one for directives, and it turns out to be difficult to pass
> the location information back to the top parser.
>
> So instead, in addition to the input stream stack, I added two more stacks,
> for the current stream position and the current stream line position.
> Because of the lexer buffering, they are computed in the lexer. These are
> then properties attached to the input streams, not to the parser locations.
>
> In the Bison location type, I use the line number, and for columns the
> number of UTF-8 characters. An ASCII caret marking the error is
> surprisingly accurate even in the presence of non-ASCII characters. But
> perhaps one should have a method to mark it on the line itself, not
> underneath.

Hmm, do those two parsers run independently of each other, or do you mean
that you have coupled them so that they influence each other's behaviour
*while* they are still running?

So far I have not encountered any restriction with my location approach. I'm
using it for all kinds of things: of course for warnings/errors on the CLI and
for highlighting them in code editors, but also for code refactoring. The
latter only works well with a fully language-aware parser, unlike the typical
RegEx hacks.

Best regards,
Christian Schoenebeck
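The lexer-side half of the custom location approach (the quoted type with `first_byte` and `length_bytes`) might look roughly like the following sketch. It assumes 1-based lines and columns, columns counted in UTF-8 characters, and `last_column` pointing one past the token's last character; `Location`, `Cursor`, `cursor_advance` and `loc_token` are hypothetical names, not taken from the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Same fields as the custom location type quoted above. */
typedef struct {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    int first_byte;
    int length_bytes;
} Location;

/* Hypothetical lexer state: the position where the next token starts. */
typedef struct {
    int byte;   /* 0-based byte offset into the input */
    int line;   /* 1-based line number */
    int column; /* 1-based column, counted in UTF-8 characters */
} Cursor;

/* Move the cursor past len bytes of text. A newline starts a new line;
   UTF-8 continuation bytes (10xxxxxx) do not advance the column. */
static void cursor_advance(Cursor *cur, const char *text, int len)
{
    for (int i = 0; i < len; i++) {
        unsigned char c = (unsigned char)text[i];
        if (c == '\n') {
            cur->line++;
            cur->column = 1;
        } else if ((c & 0xC0) != 0x80) {
            cur->column++;
        }
    }
    cur->byte += len;
}

/* Record the location of a token of len bytes at the cursor, then advance.
   last_line/last_column describe the position just past the token. */
static void loc_token(Cursor *cur, Location *loc, const char *text, int len)
{
    assert(cur != NULL && loc != NULL && len >= 0);
    loc->first_line = cur->line;
    loc->first_column = cur->column;
    loc->first_byte = cur->byte;
    loc->length_bytes = len;
    cursor_advance(cur, text, len);
    loc->last_line = cur->line;
    loc->last_column = cur->column;
}
```

In a flex scanner, one option is to call `loc_token` from the `YY_USER_ACTION` macro with `yytext` and `yyleng`, so every match (including skipped whitespace) advances the cursor. With the `YYLLOC_DEFAULT` override quoted above, a nonterminal's `first_byte` and `length_bytes` then cover everything from its first to its last token, which is the byte span a refactoring tool would replace.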