Hu Ryan, > Le 7 mars 2021 à 01:33, Ryan <[email protected]> a écrit : > > I've noticed a strange behavior with my push parser on bison 3.7+, where > when I multiple identical lines through it, the first one parses > correctly, but all future lines parse the first token of the line twice. > > I'm not sure if this is a bug in bison or if there's something wrong > with my push loop, but it definitely worked as I expect it to as of > 3.6.4. > > Code to reproduce and the output of that code is below.
Thanks a lot for the analysis. I was able to spot when the regression was introduced, which is, unsurprisingly (https://git.savannah.gnu.org/cgit/bison.git/commit/?id=330552ea499ca474f65967160e9d4e50265f9631): commit 330552ea499ca474f65967160e9d4e50265f9631 Author: Akim Demaille <[email protected]> Date: Sun Jun 28 16:09:13 2020 +0200 yacc.c: push: don't clear the parser state when accepting/rejecting Currently when a push parser finishes its parsing (i.e., it did not return YYPUSH_MORE), it also clears its state. It is therefore impossible to see if it had parse errors. In the context of autocompletion, because error recovery might have fired, the parser is actually already in a different state. For instance on `(1 + + <TAB>` in the bistromathic, because there's a `exp: "(" error ")"` recovery rule, `1 + +` tokens have already been popped, replaced by `error`, and autocompletions think we are ready for the closing ")". So here, we would like to see if there was a syntax error, yet `yynerrs` was cleared. In the case of a successful parse, we still have a problem: if error recovery succeeded, we won't know it, since, again, `yynerrs` is clearer. It seems much more natural to leave the parser state available for analysis when there is a failure. To reuse the parser, we should either: 1. provide an explicit means to reinitialize a parser state for future parses. 2. automatically reset the parser state when it is used in a new parse. Option 2 requires to check whether we need to reinitialize the parser each time we call `yypush_parse`, i.e., each time we give a new token. This seems expensive compared to Option 1, but benchmarks revealed no difference. Option 1 is incompatible with the documentation ("After `yypush_parse` returns a status other than `YYPUSH_MORE`, the parser instance `yyps` may be reused for a new parse."). So Option 2 wins, reusing the private `yynew` member to record that a parse was finished, and therefore that the state must reset in the next call to `yypull_parse`. While at it, this implementation now reuses the previously enlarged stacks from one parse to another. * data/skeletons/yacc.c (yypstate_new): Set up the stacks in their initial configurations (setting their bottom to the stack array), and use yypstate_clear to reset them (moving their top to their bottom). (yypstate_delete): Adjust. (yypush_parse): At the beginning, clear yypstate if needed, and at the end, record when yypstate needs to be clearer. * examples/c/bistromathic/parse.y (expected_tokens): Do not propose autocompletion when there are parse errors. * examples/c/bistromathic/bistromathic.test: Check that case. Reading this commit again, I don't understand why I had changed the handling of yynew in yypush_parse this way. The following patch does it better, IMHO. I will push this into the maint(enance) branch, and will very soon release 3.7.6 with this fix. Could you please confirm that the Bison in this tarball does address your issue? It is not 3.7.6, it is the current development branch (to become 3.8), but it should still be relevant to check your issue. https://www.lrde.epita.fr/~akim/private/bison/bison-3.7.5.320-7fb5c.tar.gz https://www.lrde.epita.fr/~akim/private/bison/bison-3.7.5.320-7fb5c.tar.lz https://www.lrde.epita.fr/~akim/private/bison/bison-3.7.5.320-7fb5c.tar.xz Thanks a lot! Akim commit 7fb5cf6790fb0e58917892867742cc29c4263384 Author: Akim Demaille <[email protected]> Date: Sun Mar 7 10:01:53 2021 +0100 yacc: fix push parser When a pstate is used for multiple successive parses, some state may leak from a previous run into the following one. That was introduced in 330552ea499ca474f65967160e9d4e50265f9631 "yacc.c: push: don't clear the parser state when accepting/rejecting". Reported by Ryan <[email protected]> https://lists.gnu.org/r/bug-bison/2021-03/msg00000.html * data/skeletons/yacc.c (yypush_parse): We reusing a pstate from a previous run, do behave as if it were the first run. * tests/push.at (Pstate reuse): Check this. diff --git a/data/skeletons/yacc.c b/data/skeletons/yacc.c index a59e6aef3..98671322d 100644 --- a/data/skeletons/yacc.c +++ b/data/skeletons/yacc.c @@ -1650,14 +1650,16 @@ yyparse (]m4_ifset([b4_parse_param], [b4_formals(b4_parse_param)], [void])[)]])] switch (yyps->yynew) { - case 2: - yypstate_clear (yyps); - goto case_0; - - case_0: case 0: yyn = yypact[yystate]; goto yyread_pushed_token; + + case 2: + yypstate_clear (yyps); + break; + + default: + break; }]])[ YYDPRINTF ((stderr, "Starting parse\n")); diff --git a/tests/push.at b/tests/push.at index cd3e113ee..f4a259bcd 100644 --- a/tests/push.at +++ b/tests/push.at @@ -25,7 +25,7 @@ AT_BANNER([[Push Parsing Tests]]) AT_SETUP([[Memory Leak for Early Deletion]]) # Requires Valgrind. -AT_BISON_OPTION_PUSHDEFS +AT_BISON_OPTION_PUSHDEFS([%define api.push-pull push]) AT_DATA_GRAMMAR([[input.y]], [[ %{ @@ -144,8 +144,8 @@ AT_CLEANUP AT_SETUP([[Unsupported Skeletons]]) -AT_BISON_OPTION_PUSHDEFS -AT_DATA([[input.y]], +AT_BISON_OPTION_PUSHDEFS([%define api.push-pull push]) +AT_DATA_GRAMMAR([[input.y]], [[%glr-parser %define api.push-pull push %% @@ -158,3 +158,125 @@ AT_BISON_CHECK([[input.y]], [[1]], [], ]]) AT_CLEANUP + + +## -------------- ## +## Pstate reuse. ## +## -------------- ## + +AT_SETUP([[Pstate reuse]]) + +# Make sure that when a single pstate is used for multiple successive +# parses, no state from a previous run leaks into the following one. +# +# See https://lists.gnu.org/r/bug-bison/2021-03/msg00000.html. + +AT_BISON_OPTION_PUSHDEFS([%define api.push-pull push]) +AT_DATA_GRAMMAR([[input.y]], +[[%code top { + #include <stdlib.h> + #include <string.h> + + static char *string_concat (char *a, char *b); + ]AT_YYERROR_DECLARE[ +} + +%define parse.trace +%define api.pure full +%define api.push-pull push +%expect 0 + +%union { + char *sval; +}; +%destructor { free ($$); } <sval> +%printer { fprintf (yyo, "%s", $$); } <sval> + +%token <sval> RAW +%token EOL + +%type <sval> text + +%% + +line + : text EOL { printf ("text: %s\n", $1); free ($1); YYACCEPT; }; + +text + : RAW { $$ = $1; } + | text RAW { $$ = string_concat ($1, $2); } + ; + +%% +]AT_YYERROR_DEFINE[ + +static char * +string_concat (char *a, char *b) +{ + size_t la = strlen (a); + size_t lb = strlen (b); + char *res = malloc (la + lb + 1); + strcpy (res, a); + strcpy (res + la, b); + free (a); + free (b); + return res; +} + +static int +push (void *yyps, yytoken_kind_t kind, const char *str) +{ + YYSTYPE lval; + lval.sval = str ? strdup (str) : NULL; + switch (yypush_parse (yyps, kind, &lval)) + { + case 0: + return 0; + case YYPUSH_MORE: + // parsing incomplete, but valid; parser not reset + return 0; + case 1: + // YYABORT or syntax invalid; parser is reset + fprintf (stderr, "invalid input, but no error was thrown\n"); + return 1; + case 2: + // memory exhaustion; parser is reset + fprintf (stderr, "memory exhaustion during yypush_parse\n"); + return 1; + } + return 1; +} + +int main(void) +{ + yydebug = !!getenv ("YYDEBUG"); + void *yyps = yypstate_new (); + +#define PUSH(Kind, Val) \ + do { \ + if (push (yyps, Kind, Val)) \ + return 1; \ + } while (0) + + PUSH (RAW, "te"); + PUSH (RAW, "xt"); + PUSH (EOL, NULL); + + PUSH (RAW, "te"); + PUSH (RAW, "xt"); + PUSH (EOL, NULL); + + yypstate_delete (yyps); + + return 0; +} +]]) + +AT_FULL_COMPILE([input]) +AT_CHECK([./input], 0, +[[text: text +text: text +]]) + +AT_BISON_OPTION_POPDEFS +AT_CLEANUP
