Hi all,

This series of patches addresses two related shortcomings: currently we destroy non-ASCII token strings (which ruins Hans' use of mathematical symbols, for instance), and we provide no means to translate token names in error messages.
See https://lists.gnu.org/archive/html/bison-patches/2018-11/msg00030.html.

Paul, I have completely removed your work that quoted the token names in tname. In retrospect, I don't think we should have done that. Without that quoting, it becomes straightforward to translate these strings, as shown by the "translate bison's own tokens" change. Bison's grammar becomes:

    %token
      GRAM_EOF 0  _("end of file")
      STRING      _("string")
      TSTRING     _("translatable string")

and then I have:

    $ cat /tmp/wrong.y
    %token 12
    %%
    exp:
    $ LC_ALL=C ./_build/8d/tests/bison /tmp/wrong.y
    /tmp/wrong.y:1.8-9: error: syntax error, unexpected integer literal, expecting character literal or identifier or <tag>
     %token 12
            ^^
    $ ./_build/8d/tests/bison /tmp/wrong.y
    /tmp/wrong.y:1.8-9: erreur: erreur de syntaxe, littéral entier inattendu, attendait caractère littéral ou identifiant ou <tag>
     %token 12
            ^^

What I did also changes the signature of yytnamerr, which you made overridable by the user through an "#ifndef yytnamerr". This customization point was not documented. If we simply rename it, things will continue to compile, but users who tweaked yytnamerr may see their error messages change unexpectedly.

I also removed the support for trigraphs. Again, I would claim that that's the user's problem.

However, I have broken a documented contract: the documentation clearly specifies how tokens are stored in yytname:

    -- Directive: %token-table
        Generate an array of token names in the parser implementation
        file. The name of the array is ‘yytname’; ‘yytname[I]’ is the
        name of the token whose internal Bison token code number is I.
        The first three elements of ‘yytname’ correspond to the
        predefined tokens ‘"$end"’, ‘"error"’, and ‘"$undefined"’; after
        these come the symbols defined in the grammar file.

        The name in the table includes all the characters needed to
        represent the token in Bison.
        For single-character literals and literal strings, this includes
        the surrounding quoting characters and any escape sequences. For
        example, the Bison single-character literal ‘'+'’ corresponds to
        a three-character name, represented in C as ‘"'+'"’; and the
        Bison two-character literal string ‘"\\/"’ corresponds to a
        five-character name, represented in C as ‘"\"\\\\/\""’.

I don't really understand what people can do with this table. In particular, it is hard to use it to generate scanner rules directly, since the mapping to the external token number (the one returned by yylex) is not trivial and is not documented. Rici, you might have some relevant input on this issue. If that's really a problem, we can generate two tables: one for backward compatibility (deprecated?), and the new one for error messages.

This series of patches is a starting point to discuss alternatives. Nothing is cast in stone here. I would really like to address this in 3.3, which I expect to release within a couple of months at most. This feature was the last one expected in 3.3.

This is currently on gnu.org in the token-i18n branch, and available in these tarballs:

  https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.153-3e2f3.tar.gz
  https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.153-3e2f3.tar.xz

Cheers!
Akim Demaille (8):
  yacc.c: avoid negated if
  parsers: revamp the interface of yytnamerr
  tests: no longer play with trigraphs
  parsers: don't double escape tnames
  parsers: support translatable token aliases
  tests: check that internationalization of token works
  translate bison's own tokens
  regen

 data/skeletons/glr.c      |   90 +--
 data/skeletons/lalr1.cc   |   56 +-
 data/skeletons/lalr1.d    |   38 +-
 data/skeletons/lalr1.java |   41 +-
 data/skeletons/yacc.c     |   75 +-
 src/output.c              |   33 +-
 src/parse-gram.c          | 1358 +++++++++++++++++++------------------
 src/parse-gram.h          |  141 ++--
 src/parse-gram.y          |   96 +--
 src/scan-gram.l           |   25 +-
 src/symtab.c              |    3 +-
 src/symtab.h              |    7 +-
 tests/calc.at             |   21 +-
 tests/input.at            |   10 +-
 tests/javapush.at         |   64 +-
 tests/local.at            |    5 +-
 tests/regression.at       |   38 +-
 17 files changed, 1019 insertions(+), 1082 deletions(-)

-- 
2.20.0
