Hi all,

> On 9 Sept 2019, at 18:46, Akim Demaille <[email protected]> wrote:
>
> Hi Paul,
>
> In d8d3f94a993ce890baae68bf9da7ded29f9f8d76 (2002 :-), you introduced
> no_cr_read in the grammar scanner: any lone \r is treated as a \n.
>
> Today, because the diagnostics read only \n as "end of line", there's an
> offset in the quoted lines.
>
> $ cat -vn /tmp/f.y
>      1  %token FOO^M ""
>      2  FOO
>      3  %%
>      4  exp: FOO
> $ LC_ALL=C bison /tmp/f.y
> /tmp/f.y:3.2-4: warning: symbol FOO redeclared [-Wother]
>     3 | %%
>       |  ^~~
>
> Worse yet, because I was not cautious enough, sometimes we get into a
> never-ending loop calling getc, waiting for a \n to come, but we're stuck
> on getting EOF.

The following commit should address this part (diff -w to shorten it).

commit d120a07e6be65657b5a017fa3503aa900fbfa595
Author: Akim Demaille <[email protected]>
Date:   Mon Sep 9 20:13:04 2019 +0200

    diagnostics: beware of unexpected EOF when quoting the source file

    When the input file contains lone CRs (aka, ^M, \r), the locations see
    a new line.  Diagnostics look only at \n as end-of-line, so sometimes
    there is an offset in diagnostics.

    Worse yet: sometimes we loop endlessly waiting for \n to come from a
    continuous stream of EOF.

    Fix that:
    - check for EOF
    - beware not to call end_use_class if begin_use_class was not called
      (which would abort).  This could happen if the actual line is
      shorter that the expected one.

    Prompted by a (private) report from Marc Schönefeld.

    * src/location.c (location_caret): here.
    * tests/diagnostics.at (Carriage return): New.

diff --git a/src/location.c b/src/location.c
index 80e71fb8..40fbc04e 100644
--- a/src/location.c
+++ b/src/location.c
@@ -229,7 +229,13 @@ location_caret (location loc, const char *style, FILE *out)

   /* Advance to the line's position, keeping track of the offset.  */
   while (caret_info.line < loc.start.line)
-    caret_info.line += getc (caret_info.source) == '\n';
+    {
+      int c = getc (caret_info.source);
+      if (c == EOF)
+        /* Something is wrong, that line number does not exist.  */
+        return;
+      caret_info.line += c == '\n';
+    }
   caret_info.offset = ftell (caret_info.source);

   /* Read the actual line.  Don't update the offset, so that we keep a pointer
@@ -238,32 +244,43 @@ location_caret (location loc, const char *style, FILE *out)
     int c = getc (caret_info.source);
     if (c != EOF)
       {
+        bool single_line = loc.start.line == loc.end.line;
         /* Quote the file (at most the first line in the case of
            multiline locations).  */
+        {
         fprintf (out, "%5d | ", loc.start.line);
-        bool single_line = loc.start.line == loc.end.line;
         /* Consider that single point location (with equal boundaries)
            actually denote the character that they follow.  */
         int byte_end = loc.end.byte +
           (single_line && loc.start.byte == loc.end.byte);
         /* Byte number.  */
         int byte = 1;
+        /* Whether we opened the style.  If the line is not as
+           expected (maybe the file was changed since the scanner
+           ran), we might reach the end before we actually saw the
+           opening column.  */
+        bool opened = false;
         while (c != EOF && c != '\n')
           {
             if (byte == loc.start.byte)
+              {
               begin_use_class (style, out);
+                opened = true;
+              }
             fputc (c, out);
             c = getc (caret_info.source);
             ++byte;
-            if (single_line
+            if (opened
+                && (single_line
                 ? byte == byte_end
-                : c == '\n' || c == EOF)
+                    : c == '\n' || c == EOF))
               end_use_class (style, out);
           }
         putc ('\n', out);
+        }

-        {
           /* Print the carets with the same indentation as above.  */
+        {
           fprintf (out, "      | %*s", loc.start.column - 1, "");
           begin_use_class (style, out);
           putc ('^', out);
@@ -275,11 +292,11 @@ location_caret (location loc, const char *style, FILE *out)
             for (int i = loc.start.column + 1; i < len; ++i)
               putc ('~', out);
             end_use_class (style, out);
-        }
           putc ('\n', out);
         }
       }
   }
+}

 bool
 location_empty (location loc)
diff --git a/tests/diagnostics.at b/tests/diagnostics.at
index 15815db3..d9398dd5 100644
--- a/tests/diagnostics.at
+++ b/tests/diagnostics.at
@@ -35,17 +35,23 @@ AT_BISON_OPTION_PUSHDEFS

 AT_DATA_GRAMMAR([[input.y]], [$2])

-AT_DATA([experr.orig], [$4])
+# For some reason, literal ^M in the input are removed and don't end
+# in `input.y`.  So use the two-character ^M represent it, and let
+# Perl insert real CR characters.
+AT_CHECK([perl -pi -e 's{\^M}{\r}gx' input.y])
+
+AT_DATA([experr], [$4])
+
+AT_CHECK([LC_ALL=en_US.UTF-8 bison -fcaret --color=debug -Wall input.y], [$3], [], [experr])

 # When no style, same messages, but without style.
-AT_CHECK([perl -p -e 's{</?\w+>}{}g' <experr.orig >experr])
+AT_CHECK([perl -pi -e 's{(</?\w+>)}{ $[]1 eq "<tag>" ? $[]1 : "" }ge' experr])
+
 # Cannot use AT_BISON_CHECK easily as we need to change the
 # environment.
 # FIXME: Enhance AT_BISON_CHECK.
 AT_CHECK([LC_ALL=en_US.UTF-8 bison -fcaret -Wall input.y], [$3], [], [experr])

-AT_CHECK([cp experr.orig experr])
-AT_CHECK([LC_ALL=en_US.UTF-8 bison -fcaret --color=debug -Wall input.y], [$3], [], [experr])

 AT_BISON_OPTION_POPDEFS

@@ -255,6 +261,24 @@ input.y: <warning>warning:</warning> fix-its can be applied.  Rerun with option
 ]])


+## ----------------- ##
+## Carriage return.  ##
+## ----------------- ##
+
+# Carriage-return used to count as a newline in the scanner, and not
+# in diagnostics.  Resulting in all sort of nice bugs.
+
+AT_TEST([[Carriage return]],
+[[^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M
+%token "
+%%
+]],
+[1],
+[[input.y:37.8-38.0: <error>error:</error> missing '"' at end of line
+input.y:37.8-38.0: <error>error:</error> syntax error, unexpected string, expecting char or identifier or <tag>
+]])
+
+
 m4_popdef([AT_TEST])
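
For readers who want the first fix in isolation, here is a minimal standalone sketch. It is not Bison's code: skip_to_line is a hypothetical helper invented for illustration, and it only mirrors the idea of the patched loop in location_caret, namely counting only \n as end of line and giving up on EOF instead of calling getc forever.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not part of Bison): advance FP to the start of
   line TARGET (1-based), counting only '\n' as end of line, as the
   diagnostics do.  Return false if EOF comes first, i.e. if that line
   number does not exist in the stream.  */
bool
skip_to_line (FILE *fp, int target)
{
  int line = 1;
  while (line < target)
    {
      int c = getc (fp);
      if (c == EOF)
        /* Without this check the loop would never terminate: getc
           keeps returning EOF and LINE never increases.  */
        return false;
      line += c == '\n';
    }
  return true;
}
```

On a stream containing "a\rb\rc\n", skip_to_line (fp, 2) succeeds, while skip_to_line (fp, 4) returns false: a scanner that treats each lone \r as a newline may report a location on line 4, but the stream holds a single \n.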
