Hi Frank! > Le 11 janv. 2022 à 01:51, Frank Heckenbach <[email protected]> a écrit : > > Akim Demaille wrote: > >>> void count_chars (const char *s, size_t n) >> >> I understand the need, and agree with the proposal. However I'm >> concerned by the fact that this gives a meaning to 'columns'. Currently, >> it is unspecified: it could be just bytes, or actual characters, > > ... or character width (as Bison uses now in its own scanner).
Indeed. > Then again, the simple form as given in calc++/scanner.ll (also in > the manual) has the same issues: > > # define YY_USER_ACTION loc.columns (yyleng); > > Actually, I used something like this in my own code until I had > multi-line tokens, when I extended it to support newlines, unaware > of these other issues. > > You might want to add at least a comment about these issues in this > and other examples, in the manual, and in location.cc. I'm not sure what and where to write in location.cc. But how about the appended patch? > BTW, when I added tab handling to my function, I noticed a possible > issue with the position interface. I had to do something like this: > > case '\t': > end.column += tab_width - end.column % tab_width; > break; > > since the columns() function can only add columns, not query the > current column as needed to compute tabs. It's not a real problem > since column is a public field, but it feels a bit like breaking the > abstraction. You may or may not care about this (just please don't > make column non-public in the future :). Nah, it's public on purpose. I don't plan in over-engineering this class :) >> Maybe something like %code location and %code position. WDYT? One >> issue would then be that if someone introduced something new in position >> or location, and we later add the same name, we would break the >> user's code :( > > If I understand you right, this would only enable one to put one's > own implementation of such a function in the parser, rather than the > scanner file. Of course, one can't add a function to yy::location in > the scanner file, but one can use a free function and pass the > location object to it, or subclass yy::location with this function > added easily. And the scanner file is where this code belongs > logically more than the parser file (if not the location class in > the first place). So I don't think it's worth the effort to add this > to Bison. Agreed. The user can easily plug her own definition of locations, that's good enough as it is. Cheers! commit 5e2e1f03fa4e6bae59ad72a8ee10b4344457b8d8 Author: Akim Demaille <[email protected]> Date: Sat Jan 15 10:28:16 2022 +0100 doc: explain why location's "column" are defined vaguely Suuggested by Frank Heckenbach. <https://lists.gnu.org/r/bug-bison/2022-01/msg00000.html> * doc/bison.texi (Location Type): Explain why location's "column" are defined vaguely. Show tab handling in ltcalc and calc++. * examples/c/bistromathic/parse.y: Show tab handling. * examples/c++/calc++/calc++.test, * examples/c/bistromathic/bistromathic.test: Check tab handling. diff --git a/doc/bison.texi b/doc/bison.texi index 2dab6c49..de6021d8 100644 --- a/doc/bison.texi +++ b/doc/bison.texi @@ -2365,6 +2365,8 @@ @node Location Tracking Calc * Ltcalc Lexer:: The lexical analyzer. @end menu +See @ref{Tracking Locations} for details about locations. + @node Ltcalc Declarations @subsection Declarations for @code{ltcalc} @@ -2488,7 +2490,7 @@ @node Ltcalc Lexer @group /* Skip white space. */ while ((c = getchar ()) == ' ' || c == '\t') - ++yylloc.last_column; + yylloc.last_column += c == '\t' ? 8 - ((yylloc.last_column - 1) & 7) : 1; @end group @group @@ -4751,6 +4753,33 @@ @node Location Type initialization), use the @code{%initial-action} directive. @xref{Initial Action Decl}. +@sp 1 + +@cindex column +The meaning of ``column'' is deliberately left vague since there are several +options, depending on the use cases. + +With multibyte input (say UTF-8), simply counting the number of bytes does +not match character positions on the screen. One needs advanced functions +mapping multibyte characters to their visual width (see for instance +Gnulib's @code{mbswidth} and @code{mbsnwidth} functions). Tabulation +characters probably need a dedicated implementation, to match the ``go to +next multiple of 8'' behavior. + +However to quote input in error messages, as @command{bison} does: + +@example +@group +1.10-12: @derror{error}: invalid identifier: ‘3.8’ + 1 | %require @derror{3.8} + | @derror{^~~} +@end group +@end example + +@noindent +then byte positions are more handy. So in some cases, tracking both visual +character position @emph{and} byte position is the best option. This is +what @command{bison} does. @node Actions and Locations @subsection Actions and Locations @@ -13745,8 +13774,14 @@ @node Calc++ Scanner @example @group %@{ + // Take 8-space tabulations into account. + void add_columns (yy::location& loc, const char *buf, int bufsize) + @{ + for (int i = 0; i < bufsize; ++i) + loc.columns (buf[i] == '\t' ? 8 - ((loc.end.column - 1) & 7) : 1); + @} // Code run each time a pattern is matched. - # define YY_USER_ACTION loc.columns (yyleng); + #define YY_USER_ACTION add_columns (loc, yytext, yyleng); %@} @end group %% diff --git a/examples/c++/calc++/calc++.test b/examples/c++/calc++/calc++.test index 290e6c24..a3ab5acb 100755 --- a/examples/c++/calc++/calc++.test +++ b/examples/c++/calc++/calc++.test @@ -50,6 +50,21 @@ EOF run 1 'err: -:2.1: syntax error, unexpected end of file, expecting ( or identifier or number' +# Check handling of tabs. +cat >input <<EOF + *1 +EOF +run 1 'err: -:1.9: syntax error, unexpected *, expecting ( or identifier or number' +cat >input <<EOF + *2 +EOF +run 1 'err: -:1.9: syntax error, unexpected *, expecting ( or identifier or number' +cat >input <<EOF + *3 +EOF +run 1 'err: -:1.9: syntax error, unexpected *, expecting ( or identifier or number' + + # LAC finds many more tokens. cat >input <<EOF a := 1 diff --git a/examples/c/bistromathic/bistromathic.test b/examples/c/bistromathic/bistromathic.test index 2db5dcfe..921e17ab 100755 --- a/examples/c/bistromathic/bistromathic.test +++ b/examples/c/bistromathic/bistromathic.test @@ -366,3 +366,26 @@ err: 1.15: syntax error: expected - or ( or number or function or variable befor err: 1 | (1++2) + 3 + '' err: | ^ ' + +# Check handling of literal tabs. "Escape" them with a C-v, so that +# they are not processed as completion requests. +cat >input<<EOF + *1 + *2 + *3 +EOF +# readline processes the tabs itself, and replaces then with spaces. +run -n 0 '> *1 +> *2 +> *3 +> '' +err: 1.9: syntax error: expected end of file or - or ( or exit or number or function etc., before * +err: 1 | *1 +err: | ^ +err: 2.9: syntax error: expected end of file or - or ( or exit or number or function etc., before * +err: 2 | *2 +err: | ^ +err: 3.9: syntax error: expected end of file or - or ( or exit or number or function etc., before * +err: 3 | *3 +err: | ^ +' diff --git a/examples/c/bistromathic/parse.y b/examples/c/bistromathic/parse.y index 5c1e6055..8094cb9f 100644 --- a/examples/c/bistromathic/parse.y +++ b/examples/c/bistromathic/parse.y @@ -308,14 +308,15 @@ yylex (const char **line, YYSTYPE *yylval, YYLTYPE *yylloc, { int c; - // Ignore white space, get first nonwhite character. + // Get next character, ignore white spaces. do { // Move the first position onto the last. yylloc->first_line = yylloc->last_line; yylloc->first_column = yylloc->last_column; - - yylloc->last_column += 1; c = *((*line)++); + // Tab characters go to the next column multiple of 8. + yylloc->last_column += + c == '\t' ? 8 - ((yylloc->last_column - 1) & 7) : 1; } while (c == ' ' || c == '\t'); switch (c)
