On Tue, Dec 8, 2020 at 8:37 AM Akim Demaille <[email protected]> wrote:
> Hi Adela, > > > Le 7 déc. 2020 à 15:56, Adela Vais <[email protected]> a écrit : > > > > Hello, > > > > I have a question about Bison's (not the users') error messages, should > > they be internationalized by default? That was my approach. > > Well, that depends what you mean "by default". Yes, you don't have > to do anything for YY_ to be used in C for instance: bison's messages > are all wrapped into YY_ by default. But it's up to the user to set > Gettext up. So the user is not _forced_ to use internationalization. > That's "thanks" to macros: > > #ifndef YY_ > # if defined YYENABLE_NLS && YYENABLE_NLS > # if ENABLE_NLS > # include <libintl.h> /* INFRINGES ON USER NAME SPACE */ > # define YY_(Msgid) dgettext ("bison-runtime", Msgid) > # endif > # endif > # ifndef YY_ > # define YY_(Msgid) Msgid > # endif > #endif > ]b4_has_translations_if([ > #ifndef N_ > # define N_(Msgid) Msgid > #endif > ])[ > > So if internationalization is not available at compile time (not > bison time), then YY_ is a noop. > > D users should not be forced to use gettext. Would this be a place > where D's introspection be an answer? This can definitely be done with introspection. I wrote the gist of it below: ``` static if (!is(typeof(YY_))) { version(YYENABLE_NLS) { version(ENABLE_NLS) { alias YY_ = partial!(dgettext, "bison-runtime"); } } static if (!is(typeof(YY_))) { pragma(inline, true) string YY_(string msg) { return msg; } } } static if (!is(typeof(N_))) { pragma(inline, true) string N_(string msg) { return msg; } } ``` To use YYENABLE_NLS and ENABLE_NLS, one would use the `-version` compiler option: `-version=YYENABLE_NLS -version=ENABLE_NLS` > We definitely can add some > Bison directive to enable i18n, if there's no clean way to do that > in the D part only. > > Bare in mind that the test suite must pass, whether gettext is > available or not. > > > > diff --git a/data/skeletons/d.m4 b/data/skeletons/d.m4 > > index b84d4e94..489dce46 100644 > > --- a/data/skeletons/d.m4 > > +++ b/data/skeletons/d.m4 > > @@ -192,7 +192,12 @@ b4_symbol_foreach([b4_token_enum])dnl > > } > > ]) > > > > - > > +# b4_symbol_translate(STRING) > > +# --------------------------- > > +# Used by "bison" in the array of symbol names to mark those that > > +# require translation. > > +m4_define([b4_symbol_translate], > > +[[_($1)]]) > > > > ## -------------- ## > > ## Symbol kinds. ## > > @@ -243,6 +248,12 @@ m4_define([b4_declare_symbol_enum], > > static immutable string[] yytname_ = @{ > > ]b4_tname[ > > @}; > > +]b4_has_translations_if([[ > > + /* YYTRANSLATABLE[SYMBOL-NUM] -- Whether YY_SNAME[SYMBOL-NUM] is > > + internationalizable. */ > > + static immutable ]b4_int_type_for([b4_translatable])[[] > yytranslatable = @{ > > + ]b4_translatable[ > > + @};]])[ > > > > /* Return YYSTR after stripping away unnecessary quotes and > > backslashes, so that it's suitable for yyerror. The heuristic is > > @@ -252,8 +263,31 @@ m4_define([b4_declare_symbol_enum], > > final void toString(W)(W sink) const > > if (isOutputRange!(W, char)) > > { > > - string yystr = yytname_[yycode_]; > > - > > + string yystr = yytname_[yycode_];]b4_has_translations_if([[ > > + if (yystr[0] == '"') > > + { > > + string yyr; > > + strip_quotes: > > + for (int i = 1; i < yystr.length; i++) > > + switch (yystr[i]) > > + { > > + case '\'': > > + case ',': > > + break strip_quotes; > > This looks like yytnamerr, something the old skeletons have to live with > that does not make sense in D. In the other skeletons, this is triggered > with "parse.error verbose" that you should not support. Support only > "parse.error detailed". You should remove all this. > > In C, for instance, i18n is handled in yysymbol_name. > > static const char * > yysymbol_name (yysymbol_kind_t yysymbol) > { > static const char *const yy_sname[] = > { > ]b4_symbol_names[ > };]b4_has_translations_if([[ > /* YYTRANSLATABLE[SYMBOL-NUM] -- Whether YY_SNAME[SYMBOL-NUM] is > internationalizable. */ > static ]b4_int_type_for([b4_translatable])[ yytranslatable[] = > { > ]b4_translatable[ > }; > return (yysymbol < YYNTOKENS && yytranslatable[yysymbol] > ? _(yy_sname[yysymbol]) <========================= > : yy_sname[yysymbol]);]], [[ > return yy_sname[yysymbol];]])[ > } > > > You should really eliminate all this code, even in the other cases. > I suggest that first you change this commit to not add this piece > of code for i18n, and in an later commit, you remove this from the > other cases too. > > > + case '\\': > > + if (yystr[++i] != '\\') > > + break strip_quotes; > > + goto default; > > + default: > > + yyr ~= yystr[i]; > > + break; > > + case '"': > > + yyr = (yycode_ < yyntokens_ && yytranslatable[yycode_] > > 0) > > + ? _(yyr) : yyr; > > + put(sink, yyr); > > + return; > > + } > > + }]],[[ > > if (yystr[0] == '"') > > { > > strip_quotes: > > @@ -275,13 +309,15 @@ m4_define([b4_declare_symbol_enum], > > case '"': > > return; > > } > > - } > > + }]])[ > > else if (yystr == "$end") > > - { > > - put(sink, "end of input"); > > + {]b4_has_translations_if([[ > > + put(sink, _("end of input"));]],[[ > > + put(sink, "end of input");]])[ > > return; > > This is hairy, and is no longer needed at all. Java no longer > uses that either. The translation of $end is provided by bison itself, > like any other token, the skeleton should not play with it at all. > Have a look at the other skeletons. > > > - } > > - > > + }]b4_has_translations_if([[ > > + yystr = (yycode_ < yyntokens_ && yytranslatable[yycode_] > 0) > > + ? _(yystr) : yystr;]])[ > > put(sink, yystr); > > } > > } > > diff --git a/data/skeletons/lalr1.d b/data/skeletons/lalr1.d > > index d126d499..615482d4 100644 > > --- a/data/skeletons/lalr1.d > > +++ b/data/skeletons/lalr1.d > > @@ -40,7 +40,28 @@ version(D_Version2) { > > ]b4_user_post_prologue[ > > ]b4_percent_code_get([[imports]])[ > > import std.format; > > +import std.conv; > > > > +/** > > + * Handle error message internationalisation > > Please end your comments with a period. > > > + */ > > +extern(C) char* dgettext(const char*, const char*); > > +string YY_(const char* s) > > +{ > > + char* res = dgettext("bison-runtime", s); > > + return to!string(res); > > Why this intermediate variable? Also, can't you declare dcgettext > within the function itself? I like scopes to be as small as possible. > > > +} > > +]b4_has_translations_if([ > > +/** > > + * User error message internationalisation > > + */ > > +extern(C) char* gettext(const char*); > > +string _(string s) > > +{ > > + char* res = gettext(s.ptr); > > + return to!string(res); > > Ditto. > > > +} > > +])[ > > /** > > * A Bison parser, automatically generated from > <tt>]m4_bpatsubst(b4_file_name, [^"\(.*\)"$], [\1])[</tt>. > > * > > @@ -704,20 +725,41 @@ m4_popdef([b4_at_dollar])])dnl > > immutable int argmax = 5; > > SymbolKind[] yyarg = new SymbolKind[argmax]; > > int yycount = yysyntaxErrorArguments(yyctx, yyarg, argmax); > > - string res = "syntax error, unexpected "; > > - res ~= format!"%s"(yyarg[0]); > > - if (yycount < argmax + 1) > > + string[] yystr = new string[yycount]; > > + for (int yyi = 0; yyi < yycount; yyi++) > > + yystr[yyi] = format!"%s"(yyarg[yyi]); > > + string res, yyformat; > > + import std.string; > > + switch (yycount) > > { > > - for (int yyi = 1; yyi < yycount; yyi++) > > - { > > - res ~= yyi == 1 ? ", expecting " : " or "; > > - res ~= format!"%s"(SymbolKind(yyarg[yyi])); > > - } > > + case 1: > > + yyformat = YY_("syntax error, unexpected %s"); > > + res = format(yyformat, yystr[0]); > > + break; > > I don't see the point of using format twice: once to convert the symbol > kinds to strings, and then to build the full error message. How about > not using yystr at all? > > > + case 2: > > + yyformat = YY_("syntax error, unexpected %s, expecting %s"); > > + res = format(yyformat, yystr[0], yystr[1]); > > + break; > > + case 3: > > + yyformat = YY_("syntax error, unexpected %s, expecting %s or > %s"); > > + res = format(yyformat, yystr[0], yystr[1], yystr[2]); > > + break; > > + case 4: > > + yyformat = YY_("syntax error, unexpected %s, expecting %s or > %s or %s"); > > + res = format(yyformat, yystr[0], yystr[1], yystr[2], > yystr[3]); > > + break; > > + case 5: > > + yyformat = YY_("syntax error, unexpected %s, expecting %s or > %s or %s or %s"); > > + res = format(yyformat, yystr[0], yystr[1], yystr[2], > yystr[3], yystr[4]); > > + break; > > + default: > > + res = YY_("syntax error"); > > + break; > > } > > yyerror(]b4_locations_if([yyctx.getLocation(), ])[res); > > }]], > > [[simple]], [[ > > - yyerror(]b4_locations_if([yyctx.getLocation(), ])["syntax > error");]])[ > > + yyerror(]b4_locations_if([yyctx.getLocation(), ])[YY_("syntax > error"));]])[ > > } > > > > ]b4_parse_error_bmatch( > > diff --git a/doc/bison.texi b/doc/bison.texi > > index 959a4039..3f3a8b26 100644 > > --- a/doc/bison.texi > > +++ b/doc/bison.texi > > @@ -13945,6 +13945,55 @@ or nonzero, full tracing. > > Identify the Bison version and skeleton used to generate this parser. > > @end deftypecv > > > > +The internationalization in D is very simmilar to the one in C. The D > > +parser uses gettext for user messages and dgettext for Bison messages. > > Use @code for function names. But I'm not sure we should go into such > details here. > > > +If gettext is not present on your system, > > Use Title Case for package names, but... > > > install it and re-run Bison's > > +configuration. > > The manual is about an installed Bison. Configuration and installation > is irrelevant here. > > > + > > +For enabling internationalisation, import bindtextdomain and textdomain > > Use @code for function names. I feel unconformable with "For enabling" > instead of "To enable", but I'm not a native. > > > +from C: > > + > > +@example > > +extern(C) char* bindtextdomain(const char* domainname, const char* > dirname); > > +extern(C) char* textdomain(const char* domainname); > > +@end example > > + > > +The main function should load the translation catalogues, similarly to > the > > +@file{c/bistromathic} example: > > + > > +@example > > +int main() > > +@{ > > + import core.stdc.locale; > > + > > + // Set up internationalization. > > + setlocale(LC_ALL, ""); > > + // Use Bison's standard translation catalogue for error messages > > + // (the generated messages). > > + bindtextdomain("bison-runtime", BISON_LOCALEDIR); > > + // For the translation catalogue of your own project, use the > > + // name of your project. > > + bindtextdomain("bison", LOCALEDIR); > > + textdomain("bison"); > > + > > + // usual main content > > + ... > > +@} > > +@end example > > + > > +String aliases may be marked for internationalization (@pxref{Token > > +I18n}): > > We should avoid repeating in each language the common bits. I don't > these bits are needed it, is it? > > > + > > +@example > > +%token PLUS "+" > > + MINUS "-" > > + STAR "*" > > + SLASH "/" > > + EOL _("end of line") > > +%token <ival> NUM _("number") > > +@end example > > + > > @node D Parser Context Interface > > @subsection D Parser Context Interface > > The parser context provides information to build error reports when you > > diff --git a/tests/calc.at b/tests/calc.at > > index c11ab5c3..1e343e8a 100644 > > --- a/tests/calc.at > > +++ b/tests/calc.at > > @@ -665,6 +665,19 @@ m4_define([_AT_DATA_CALC_Y(d)], > > }; > > %printer { fprintf (yyo, "%d", $$); } <ival>; > > > > +%code { > > +]AT_TOKEN_TRANSLATE_IF([[ > > + static string _(string s) > > + { > > + if (s == "end of input") > > + return "end of file"; > > + else if (s == "number") > > + return "nombre"; > > + return s; > > I personally prefer that we maintain the "else if" chain, instead of > relying on return to break the execution flow. Make it more > functional-style > and add the last else. Currently the style is even inconsistent, with just > one else. > > And if D supports 'switch' over string, well, even better :) > > > + } > > +]])[ > > +} > > + > > /* Bison Declarations */ > > %token EOF 0 ]AT_TOKEN_TRANSLATE_IF([_("end of file")], ["end of > input"])[ > > %token <ival> NUM "number" > > Nice work! > > Cheers! >
