Hi Vincent, > Le 28 juin 2020 à 21:30, Vincent Imbimbo <[email protected]> a écrit : > > Hey Akim, > I think I found a good place for the cex documentation in bison.texi, but it > might still be a bit awkward. Tell me if you have any ideas for changes.
Thanks a lot for this! I made a few changes (e.g., the name is -Wcounterexamples, plural, moving -Wcex as foremost feature of 3.7, wrap at 76 columns, remove the duplicate example in NEWS which is B&W, sort references, etc.) Here's how I installed your commit. I'll proceed from there. Thanks! commit ecd731a84e8b75937bb99d5a6d4f20cb9fecf4b3 Author: Vincent Imbimbo <[email protected]> Date: Sun Jun 28 15:30:17 2020 -0400 doc: cex documentation * NEWS, doc/bison.texi: Add documentation for conflict counterexample generation. diff --git a/NEWS b/NEWS index 9e1c5076..b3a5d1ea 100644 --- a/NEWS +++ b/NEWS @@ -4,6 +4,30 @@ GNU Bison NEWS ** New features +*** Counterexample Generation + + When given `--report=counterexamples` or `-Wcounterexamples`, bison will + now output counterexamples for conflicts in the grammar. These are + strings in the grammar which can be parsed in two ways due to the + conflict. For example: + + Example exp '+' exp • '/' exp + First derivation exp ::=[ exp ::=[ exp '+' exp • ] '/' exp ] + Second derivation exp ::=[ exp '+' exp ::=[ exp • '/' exp ] ] + + This is a shift/reduce conflict caused by none of the operators having + precedence, so the example can be parsed in the two ways shown. When + bison cannot find an example that can be derived in two ways, it instead + generates two examples that are the same up until the dot: + + First example expr • ID $end + First derivation $accept ::=[ s ::=[ a ::=[ expr • ] ID ] $end ] + Second example expr • ID ',' ID $end + Second derivation $accept ::=[ s ::=[ a ::=[ expr ::=[ expr • ID ',' ] ] ID ] $end ] + + In these cases, the parser usually doesn't have enough lookahead to + differentiate the two given examples. + *** File prefix mapping Bison learned a new argument, `--file-prefix-map OLD=NEW`. Any file path diff --git a/doc/bison.texi b/doc/bison.texi index 320a645b..e7225306 100644 --- a/doc/bison.texi +++ b/doc/bison.texi @@ -9797,12 +9797,69 @@ calc.y:19.1-7: @dwarning{warning}: nonterminal useless in grammar: useless [@dwa 19 | @dwarning{useless: STR;} | @dwarning{^~~~~~~} calc.y: @dwarning{warning}: 7 shift/reduce conflicts [@dwarning{-Wconflicts-sr}] +calc.y: @dwarning{warning}: rerun with option '-Wcounterexamples' to generate conflict counterexamples [@dwarning{-Wother}] @end example -When given @option{--report=state}, in addition to @file{calc.tab.c}, it -creates a file @file{calc.output} with contents detailed below. The -order of the output and the exact presentation might vary, but the -interpretation is the same. +When given @option{-Wcounterexamples}, @command{bison} will run a search for +strings in your grammar that better demonstrate you +conflicts. Counterexample generation was initially developed by Chinawat +Isradisaikul and Andrew Myers (@pxref{Bibliography}). For @file{calc.y}, +the first printed example is: + +@example +Shift/reduce conflict on token '/': + Example exp '+' exp • '/' exp + First derivation exp ::=[ exp ::=[ exp '+' exp • ] '/' exp ] + Example exp '+' exp • '/' exp + Second derivation exp ::=[ exp '+' exp ::=[ exp • '/' exp ] ] +@end example + +This shows two separate derivations in the grammar for the same @code{exp}: +@samp{e1 + e2 / e3}. The derivations show how your rules would parse the +given example. Here, the first derivation completes a reduction when seeing +@samp{/}, causing @samp{e1 + e2} to be grouped as an @code{exp}. The second +derivation shifts on @samp{/}, resulting in @samp{e2 / e3} being grouped as +an @code{exp}. Therefore, it is easy to see that adding @code{%precedence} +directives would fix this conflict. + +Sometimes, the search will not find an example that can be derived in two +ways. In these cases, counterexample generation will provide two examples +that are the same up until the dot. Most notably, this will happen when +your grammar requires a stronger parser (more lookahead, LR instead of +LALR). The following example isn't LR(1): + +@example +%token ID +%% +s: a ID +a: expr +expr: %empty | expr ID ',' +@end example + +@command{bison} reports: + +@example +Shift/reduce conflict on token ID: + First example expr • ID $end + First derivation $accept ::=[ s ::=[ a ::=[ expr • ] ID ] $end ] + Second example expr • ID ',' ID $end + Second derivation $accept ::=[ s ::=[ a ::=[ expr ::=[ expr • ID ',' ] ] ID ] $end ] +@end example + +This conflict is caused by the parser not having enough information to know +the difference between these two examples. The parser would need an +additional lookahead token to know whether or not a comma follows the +@code{ID} after @code{expr}. These types of conflicts tend to be more +difficult to fix, and usually need a rework of the grammar. In this case, +it can be fixed by changing around the recursion: @code{expr: ID | ',' expr +ID}. + +Counterexamples can also be written to a file with @option{--report=cex}. + +Going back to the calc example, when given @option{--report=state}, +in addition to @file{calc.tab.c}, it creates a file @file{calc.output} +with contents detailed below. The order of the output and the exact +presentation might vary, but the interpretation is the same. @noindent @cindex token, useless @@ -15211,6 +15268,7 @@ resolution. @xref{Unreachable States}. @unnumbered Bibliography @c Please follow the following canvas to add more references. +@c And keep sorted alphabetically. @table @asis @item [Corbett 1984] @@ -15252,6 +15310,14 @@ Look-Ahead Sets, in @cite{ACM Transactions on Programming Languages and Systems}, Vol.@: 4, No.@: 4 (October 1982), pp.@: 615--649. @uref{http://dx.doi.org/10.1145/69622.357187} +@item [Isradisaikul, Myers 2015] +Chinawat Isradisaikul, Andrew Myers, +Finding Counterexamples from Parsing Conflicts, +in @cite{Proceedings of the 36th ACM SIGPLAN Conference on +Programming Language Design and Implementation} (PLDI '15), +ACM, pp.@: 555--564. +@uref{https://www.cs.cornell.edu/andru/papers/cupex/cupex.pdf} + @item [Johnson 1978] Steven C. Johnson, A portable compiler: theory and practice, @@ -15345,17 +15411,18 @@ London, Department of Computer Science, TR-00-12 (December 2000). @c LocalWords: typename emplace Wconversion Wshorten yacchack reentrancy ou @c LocalWords: Relocatability exprs fixit Wyacc parseable fixits ffixit svg @c LocalWords: DNDEBUG cstring Wzero workalike POPL workalikes byacc UCB -@c LocalWords: Penello's Penello Byson Byson's Corbett's CSD TOPLAS PDP +@c LocalWords: Penello's Penello Byson Byson's Corbett's CSD TOPLAS PDP cex @c LocalWords: Beazley's goyacc ocamlyacc SIGACT SIGPLAN colorWarning exVal @c LocalWords: setcolor rgbError colorError rgbNotice colorNotice derror @c LocalWords: colorOff maincolor inlineraw darkviolet darkcyan dwarning @c LocalWords: dnotice copyable stdint ptrdiff bufsize yyreport invariants -@c LocalWords: xrefautomaticsectiontitle yysyntax yysymbol ARGMAX cond +@c LocalWords: xrefautomaticsectiontitle yysyntax yysymbol ARGMAX cond RTTI @c LocalWords: Wdangling yytoken erreur syntaxe inattendu attendait nombre @c LocalWords: YYUNDEF SymbolKind yypcontext YYENOMEM TOKENMAX getBundle -@c LocalWords: ResourceBundle myResources getString getName getToken +@c LocalWords: ResourceBundle myResources getString getName getToken ylwrap @c LocalWords: getLocation getExpectedTokens reportSyntaxError bistromathic -@c LocalWords: TokenKind +@c LocalWords: TokenKind Automake's rtti Wcounterexamples Chinawat PLDI +@c LocalWords: Isradisaikul @c Local Variables: @c ispell-dictionary: "american"
