On Fri, 7 Nov 2008, Joel E. Denny wrote:
On Fri, 7 Nov 2008, Akim Demaille wrote:I have failures here on Java. # -*- compilation -*- 197. java.at:385: testing ... ../../../tests/java.at:385: bison -o Calc.java Calc.y ../../../tests/java.at:385: test -n "$CONF_JAVA" || exit 77 test -n "$CONF_JAVAC" || exit 77 Not enabling shell tracing (command contains an embedded newline) ../../../tests/java.at:385: $SHELL ../../../javacomp.sh Calc.java stderr: Calc.java:391: unreachable statement { yyval = new Integer (0); return YYERROR; ;}; ^ Calc.java:400: unreachable statement { yyval = new Integer (0); return YYERROR; ;}; ^ 2 errorsI pushed these 2 patches to branch-2.4.1 and will push to master after I've tested there. However, it's fine by me if people want to get rid of this feature altogether for 2.5.
I wrote a patch to solve this by adding the semicolon only if needed. I updated the patch for the current tree but I won't be able to finish retesting it today. (Am I suppose to Cc: to bison-patches?) The second patch implements the desired functionality as specified in a FIXME comment, except that tha patch gives unconditional warnings rather than conditional on -Wyacc (which is the documented name for the --pedantic option in the FIXME, which currently does nothing; I may work on it later). I think we should encourage people to fix their code rather than rely on this Bison behavior. Also, my patch handles C preprocessor directives: it's WRONG to add semicolons to the end of those. By the way, to support languages that doesn't use semicolons, just make the added semicolon a M4 macro that defaults to `;' and other skeletons can define it to empty. While writing the testcase, I noticed that Bison doesn't allow unescaped newlines in string literals in user code. I think this should be up to the compiler that actually interpretes the code. That's the first patch. Tested together on Cygwin with javac 1.6.0-07 and gij 3.4.4 with one expected failure: http://lists.gnu.org/archive/html/bug-bison/2008-07/msg00008.html Di-an Jan 2008-11-08 Di-an Jan <[EMAIL PROTECTED]> Allow unescaped newlines in character and string literals. * src/scan-gram.l (flex rules section): Copy newlines even SC_STRING and SC_CHARACTER. Don't generate error. diff --git a/src/scan-gram.l b/src/scan-gram.l index 697f52f..f951db1 100644 --- a/src/scan-gram.l +++ b/src/scan-gram.l @@ -456,14 +456,12 @@ splice (\\[ \f\t\v]*\n)* <SC_CHARACTER> { "'" STRING_GROW; BEGIN context_state; - \n unexpected_newline (token_start, "'"); BEGIN context_state; <<EOF>> unexpected_eof (token_start, "'"); BEGIN context_state; } <SC_STRING> { "\"" STRING_GROW; BEGIN context_state; - \n unexpected_newline (token_start, "\""); BEGIN context_state; <<EOF>> unexpected_eof (token_start, "\""); BEGIN context_state; } @@ -586,7 +584,7 @@ splice (\\[ \f\t\v]*\n)* `-----------------------------------------------------*/ <SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE,SC_STRING,SC_CHARACTER,SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>. | -<SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>\n STRING_GROW; +<SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE,SC_STRING,SC_CHARACTER>\n STRING_GROW; %% 2008-11-08 Di-an Jan <[EMAIL PROTECTED]> Implement the FIXME that ends an user action with a semicolon if it seems necessary. * src/scan-code.l (flex rules section): Flag cpp directive from any `#' to the first unescaped end-of-line. Semicolon is not needed after `;', `{', '}', and is needed after any other token (whitespaces, comments, and cpp directives have no effect). * tests/actions.at (Fix user actions without a trailing semicolon): New tests with and without --yacc. * tests/input.at (AT_CHECK_UNUSED_VALUES): Add semicolons to to make user actions complete statements. Adjust column numbers in error messages. * tests/regression.at (Fix user actions without a trailing semicolon): Remove. Covered by new test. diff --git a/src/scan-code.l b/src/scan-code.l index 71c9076..b4b0fd7 100644 --- a/src/scan-code.l +++ b/src/scan-code.l @@ -81,6 +81,18 @@ splice (\\[ \f\t\v]*\n)* /* Nesting level of the current code in braces. */ int braces_level = 0; + /* Whether a semicolon is probably needed. + The heuristic is that a semicolon is not needed after `{', `}' or `;' + and that whitespaces, comments, and C preprocessor directives + does not affect this flag. + Note that `{' does not need a semicolon because of `{}'. */ + bool need_semicolon = false; + + /* Whether in a C preprocessor directive. Don't use a start condition + for this because, at the end of strings and comments, we still need + to know whether we're in a directive. */ + bool in_cpp = false; + /* This scanner is special: it is invoked only once, henceforth is expected to return only once. This initialization is therefore done once per action to translate. */ @@ -135,10 +147,12 @@ splice (\\[ \f\t\v]*\n)* "'" { STRING_GROW; BEGIN SC_CHARACTER; + if (! in_cpp) need_semicolon = true; } "\"" { STRING_GROW; BEGIN SC_STRING; + if (! in_cpp) need_semicolon = true; } "/"{splice}"*" { STRING_GROW; @@ -154,43 +168,67 @@ splice (\\[ \f\t\v]*\n)* { "$"("<"{tag}">")?(-?[0-9]+|"$") { handle_action_dollar (self->rule, yytext, *loc); + if (! in_cpp) need_semicolon = true; } "@"(-?[0-9]+|"$") { handle_action_at (self->rule, yytext, *loc); + if (! in_cpp) need_semicolon = true; } "$" { warn_at (*loc, _("stray `$'")); obstack_sgrow (&obstack_for_string, "$]["); + if (! in_cpp) need_semicolon = true; } "@" { warn_at (*loc, _("stray `@'")); obstack_sgrow (&obstack_for_string, "@@"); + if (! in_cpp) need_semicolon = true; + } + "[" { + obstack_sgrow (&obstack_for_string, "@{"); + if (! in_cpp) need_semicolon = true; + } + "]" { + obstack_sgrow (&obstack_for_string, "@}"); + if (! in_cpp) need_semicolon = true; } - "{" STRING_GROW; ++braces_level; + ";" STRING_GROW; if (! in_cpp) need_semicolon = false; + "{" STRING_GROW; ++braces_level; if (! in_cpp) need_semicolon = false; "}" { bool outer_brace = --braces_level == 0; /* As an undocumented Bison extension, append `;' before the last brace in braced code, so that the user code can omit trailing `;'. But do not append `;' if emulating Yacc, since Yacc does - not append one. - - FIXME: Bison should warn if a semicolon seems to be necessary - here, and should omit the semicolon if it seems unnecessary - (e.g., after ';', '{', or '}', each followed by comments or - white space). Such a warning shouldn't depend on --yacc; it - should depend on a new --pedantic option, which would cause - Bison to warn if it detects an extension to POSIX. --pedantic - should also diagnose other Bison extensions like %yacc. - Perhaps there should also be a GCC-style --pedantic-errors - option, so that such warnings are diagnosed as errors. */ - if (outer_brace && ! yacc_flag) - obstack_1grow (&obstack_for_string, ';'); + not append one. */ + if (outer_brace && need_semicolon) + { + warn_at (*loc, _("a `;' might be needed at the end of action code")); + if (! yacc_flag) + { + if (in_cpp) + obstack_sgrow (&obstack_for_string, "\n;"); + else + obstack_1grow (&obstack_for_string, ';'); + } + } STRING_GROW; + if (! in_cpp) need_semicolon = false; } + + /* Preprocessing directives should only be recognized at the beginning + of lines, allowing whitespace including comments, but in C/C++, + `#' can only be the start of preprocessor directives or within + `#define' directives anyway, so don't bother with begin of line. */ + "#" STRING_GROW; in_cpp = true; + + {splice} STRING_GROW; + [\n\r] STRING_GROW; in_cpp = false; + [ \t\f] STRING_GROW; + . STRING_GROW; if (! in_cpp) need_semicolon = true; } <SC_SYMBOL_ACTION> diff --git a/tests/actions.at b/tests/actions.at index 602eac8..8ae9a4f 100644 --- a/tests/actions.at +++ b/tests/actions.at @@ -1304,3 +1304,125 @@ AT_CLEANUP]) AT_CHECK_ACTION_LOCATIONS([[%initial-action]]) AT_CHECK_ACTION_LOCATIONS([[%destructor]]) AT_CHECK_ACTION_LOCATIONS([[%printer]]) + + +## ----------------------------------------------- ## +## Fix user actions without a trailing semicolon. ## +## ----------------------------------------------- ## + +AT_SETUP([[Fix user actions without a trailing semicolon]]) + +# This feature is undocumented, but we accidentally broke it in 2.3a, +# and there was a complaint at: +# <http://lists.gnu.org/archive/html/bug-bison/2008-11/msg00001.html>. + +AT_DATA([input.y], +[[%% +start: test2 test1 test0 testc; + +test2 +: 'a' { semi; /* TEST:N:2 */ } +| 'b' { if (0) {no_semi} /* TEST:N:2 */ } +| 'c' { if (0) {semi;} /* TEST:N:2 */ } +| 'd' { semi; no_semi /* TEST:Y:2 */ } +| 'e' { semi(); no_semi() /* TEST:Y:2 */ } +| 'f' { semi[]; no_semi[] /* TEST:Y:2 */ } +| 'g' { semi++; no_semi++ /* TEST:Y:2 */ } +| 'h' { {no_semi} no_semi /* TEST:Y:2 */ } +| 'i' { {semi;} no_semi /* TEST:Y:2 */ } +; +test1 + : 'a' { semi; // TEST:N:1 ; +} | 'b' { if (0) {no_semi} // TEST:N:1 ; +} | 'c' { if (0) {semi;} // TEST:N:1 ; +} | 'd' { semi; no_semi // TEST:Y:1 ; +} | 'e' { semi(); no_semi() // TEST:Y:1 ; +} | 'f' { semi[]; no_semi[] // TEST:Y:1 ; +} | 'g' { semi++; no_semi++ // TEST:Y:1 ; +} | 'h' { {no_semi} no_semi // TEST:Y:1 ; +} | 'i' { {semi;} no_semi // TEST:Y:1 ; +} ; +test0 + : 'a' { semi; // TEST:N:1 {} +} | 'b' { if (0) {no_semi} // TEST:N:1 {} +} | 'c' { if (0) {semi;} // TEST:N:1 {} +} | 'd' { semi; no_semi // TEST:Y:1 {} +} | 'e' { semi(); no_semi() // TEST:Y:1 {} +} | 'f' { semi[]; no_semi[] // TEST:Y:1 {} +} | 'g' { semi++; no_semi++ // TEST:Y:1 {} +} | 'h' { {no_semi} no_semi // TEST:Y:1 {} +} | 'i' { {semi;} no_semi // TEST:Y:1 {} +} ; + +testc +: 'a' { +#define TEST_MACRO_N \ +[]"broken\" $ @ $$ @$ []; +string;"} +| 'b' { +no_semi +#define TEST_MACRO_Y \ +[]"broken\" $ @ $$ @$ []; +string;"} +]]) + +m4_define([AT_CHECK_ACTION_MISSING_SEMICOLON_WARNINGS], +[[input.y:8.48: warning: a `;' might be needed at the end of action code +input.y:9.48: warning: a `;' might be needed at the end of action code +input.y:10.48: warning: a `;' might be needed at the end of action code +input.y:11.48: warning: a `;' might be needed at the end of action code +input.y:12.48: warning: a `;' might be needed at the end of action code +input.y:13.48: warning: a `;' might be needed at the end of action code +input.y:20.1: warning: a `;' might be needed at the end of action code +input.y:21.1: warning: a `;' might be needed at the end of action code +input.y:22.1: warning: a `;' might be needed at the end of action code +input.y:23.1: warning: a `;' might be needed at the end of action code +input.y:24.1: warning: a `;' might be needed at the end of action code +input.y:25.1: warning: a `;' might be needed at the end of action code +input.y:31.1: warning: a `;' might be needed at the end of action code +input.y:32.1: warning: a `;' might be needed at the end of action code +input.y:33.1: warning: a `;' might be needed at the end of action code +input.y:34.1: warning: a `;' might be needed at the end of action code +input.y:35.1: warning: a `;' might be needed at the end of action code +input.y:36.1: warning: a `;' might be needed at the end of action code +input.y:47.9: warning: a `;' might be needed at the end of action code +]]) + +AT_BISON_CHECK([[-o input.c input.y]], [0], [], + AT_CHECK_ACTION_MISSING_SEMICOLON_WARNINGS) +AT_CHECK([[grep -c '/\* TEST:N:2 \*/ }$' input.c]], [0], [[3 +]]) +AT_CHECK([[grep -c '/\* TEST:Y:2 \*/ ;}$' input.c]], [0], [[6 +]]) +AT_CHECK([[sed -n '/TEST:N:1/{N +s/\n/<NL>/gp}' input.c | grep -c '// TEST:N:1 [;{}]*<NL>}$']], [0], [[6 +]]) +AT_CHECK([[sed -n '/TEST:Y:1/{N +s/\n/<NL>/gp}' input.c | grep -c '// TEST:Y:1 [;{}]*<NL>;}$']], [0], [[12 +]]) +AT_CHECK([[sed -n '/^#define TEST_MACRO_N/{N +N +s/\n/<NL>/gp}' input.c | grep -F -xc '#define TEST_MACRO_N \<NL>[]"broken\" $ @ $$ @$ [];<NL>string;"}']], + [0], [[1 +]]) +AT_CHECK([[sed -n '/^#define TEST_MACRO_Y/{N +N +N +s/\n/<NL>/gp}' input.c | grep -F -xc '#define TEST_MACRO_Y \<NL>[]"broken\" $ @ $$ @$ [];<NL>string;"<NL>;}']], + [0], [[1 +]]) + +AT_BISON_CHECK([[--yacc -o input.c input.y]], [0], [], + AT_CHECK_ACTION_MISSING_SEMICOLON_WARNINGS) +AT_CHECK([[grep -c '/\* TEST:[NY]:2 \*/ }$' input.c]], [[0]], [[9 +]]) +AT_CHECK([[sed -n '/TEST:[NY]:1/{N +s/\n/<NL>/gp}' input.c | grep -c '// TEST:[NY]:1 [;{}]*<NL>}$']], [[0]], [[18 +]]) +AT_CHECK([[sed -n '/^#define TEST_MACRO_[NY]/{N +N +s/\n/<NL>/gp}' input.c | grep -xc '#define TEST_MACRO_[NY] \\<NL>\[\]"broken\\" \$ @ \$\$ @\$ \[\];<NL>string;"}']], + [0], [[2 +]]) + +AT_CLEANUP diff --git a/tests/input.at b/tests/input.at index 8bf61fa..7da3581 100644 --- a/tests/input.at +++ b/tests/input.at @@ -97,16 +97,16 @@ m4_ifval($1, [ ], [_AT_UNUSED_VALUES_DECLARATIONS ])[[%% start: - 'a' a { $]2[ } | 'b' b { $]2[ } | 'c' c { $]2[ } | 'd' d { $]2[ } | 'e' e { $]2[ } -| 'f' f { $]2[ } | 'g' g { $]2[ } | 'h' h { $]2[ } | 'i' i { $]2[ } | 'j' j { $]2[ } -| 'k' k { $]2[ } | 'l' l { $]2[ } + 'a' a { $]2[; } | 'b' b { $]2[; } | 'c' c { $]2[; } | 'd' d { $]2[; } +| 'e' e { $]2[; } | 'f' f { $]2[; } | 'g' g { $]2[; } | 'h' h { $]2[; } +| 'i' i { $]2[; } | 'j' j { $]2[; } | 'k' k { $]2[; } | 'l' l { $]2[; } ; a: INT | INT { } INT { } INT { }; b: INT | /* empty */; -c: INT | INT { $]1[ } INT { $<integer>2 } INT { $<integer>4 }; -d: INT | INT { } INT { $]1[ } INT { $<integer>2 }; -e: INT | INT { } INT { } INT { $]1[ }; +c: INT | INT { $]1[; } INT { $<integer>2; } INT { $<integer>4; }; +d: INT | INT { } INT { $]1[; } INT { $<integer>2; }; +e: INT | INT { } INT { } INT { $]1[; }; f: INT | INT { } INT { } INT { $]$[ = $]1[ + $]3[ + $]5[; }; g: INT | INT { $<integer>$; } INT { $<integer>$; } INT { }; h: INT | INT { $<integer>$; } INT { $<integer>$ = $<integer>2; } INT { }; @@ -123,18 +123,18 @@ input.y:11.10-32: warning: unused value: $]1[ input.y:11.10-32: warning: unused value: $]3[ input.y:11.10-32: warning: unused value: $]5[ input.y:12.9: warning: empty rule for typed nonterminal, and no action -]]m4_ifval($2, [[[input.y:13.14-19: warning: unset value: $$ -input.y:13.25-39: warning: unset value: $$ -]]])[[input.y:13.10-59: warning: unset value: $]$[ -input.y:13.10-59: warning: unused value: $]3[ -input.y:13.10-59: warning: unused value: $]5[ +]]m4_ifval($2, [[[input.y:13.14-20: warning: unset value: $$ +input.y:13.26-41: warning: unset value: $$ +]]])[[input.y:13.10-62: warning: unset value: $]$[ +input.y:13.10-62: warning: unused value: $]3[ +input.y:13.10-62: warning: unused value: $]5[ ]]m4_ifval($2, [[[input.y:14.14-16: warning: unset value: $$ -]]])[[input.y:14.10-47: warning: unset value: $]$[ -input.y:14.10-47: warning: unused value: $]3[ -input.y:14.10-47: warning: unused value: $]5[ -input.y:15.10-36: warning: unset value: $]$[ -input.y:15.10-36: warning: unused value: $]3[ -input.y:15.10-36: warning: unused value: $]5[ +]]])[[input.y:14.10-49: warning: unset value: $]$[ +input.y:14.10-49: warning: unused value: $]3[ +input.y:14.10-49: warning: unused value: $]5[ +input.y:15.10-37: warning: unset value: $]$[ +input.y:15.10-37: warning: unused value: $]3[ +input.y:15.10-37: warning: unused value: $]5[ input.y:17.10-58: warning: unset value: $]$[ input.y:17.10-58: warning: unused value: $]1[ ]]m4_ifval($2, [[[input.y:17.10-58: warning: unused value: $]2[ diff --git a/tests/regression.at b/tests/regression.at index 6bfc8d0..9a27749 100644 --- a/tests/regression.at +++ b/tests/regression.at @@ -1255,27 +1255,3 @@ AT_COMPILE([[input]]) AT_PARSER_CHECK([[./input]]) AT_CLEANUP - - - -## ----------------------------------------------- ## -## Fix user actions without a trailing semicolon. ## -## ----------------------------------------------- ## - -AT_SETUP([[Fix user actions without a trailing semicolon]]) - -# This feature is undocumented, but we accidentally broke it in 2.3a, and there -# was a complaint at: -# <http://lists.gnu.org/archive/html/bug-bison/2008-11/msg00001.html>. - -AT_DATA([input.y], -[[%% -start: {asdffdsa} ; -]]) - -AT_BISON_CHECK([[-o input.c input.y]]) -AT_CHECK([[sed -n '/asdffdsa/s/^ *//p' input.c]], [[0]], -[[{asdffdsa;} -]]) - -AT_CLEANUP
