> On 22 Jun 2020, at 07:59, Akim Demaille <[email protected]> wrote:
>
>> On 21 Jun 2020, at 15:24, Hans Åberg <[email protected]> wrote:
>>
>>> On 21 Jun 2020, at 14:25, Hans Åberg <[email protected]> wrote:
>>>
>>>> On 21 Jun 2020, at 11:45, Akim Demaille <[email protected]> wrote:
>>>>
>>>> What locale are you using?
>>>
>>> LC_CTYPE=UTF-8
>>
>> The error goes away if setting LC_CTYPE=en_US.UTF-8 before recompiling the
>> .yy file.
>>
>> UTF-8 is language independent, so MacOS uses LC_CTYPE=UTF-8, but there is
>> software that requires a prefix.
>
> Hans,
>
> This double-escaping of the UTF-8 characters is a well known problem
> of parse.error=verbose, that resulted in the introduction of "detailed"
> parse.error. That was discussed extensively on Bison's lists, and is
> documented in NEWS of 3.6:
>
> *** Improved syntax error messages
>
> Two new values for the %define parse.error variable offer more control to
> the user. Available in all the skeletons (C, C++, Java).
>
> **** %define parse.error detailed
>
> The behavior of "%define parse.error detailed" is closely resembling that
> of "%define parse.error verbose" with a few exceptions. First, it is safe
> to use non-ASCII characters in token aliases (with 'verbose', the result
> depends on the locale with which bison was run). Second, a yysymbol_name
> function is exposed to the user, instead of the yytnamerr function and the
> yytname table. Third, token internationalization is supported (see
> below).

The question is whether that helps, as it is yytname_ that is translated
according to the LC_CTYPE environment variable. This also introduces a
locale dependency into the Bison compilation, so the generated parser is
no longer platform independent.

> Besides, I have recently posted that Bison 3.7 will also make another step:
>
> *** String aliases are faithfully propagated
>
> Bison used to interpret user strings (i.e., decoding backslash escapes)
> when reading them, and to escape them (i.e., issue non-printable
> characters as backslash escapes, taking the locale into account) when
> outputting them. As a consequence non-ASCII strings (say in UTF-8) ended
> up "ciphered" as sequences of backslash escapes. This happened not only
> in the generated sources (where the compiler will reinterpret them), but
> also in all the generated reports (text, xml, html, dot, etc.). Reports
> were therefore not readable when string aliases were not pure ASCII.
> Worse yet: the output depended on the user's locale.
>
> Now Bison faithfully treats the string aliases exactly the way the user
> spelled them. This fixes all the aforementioned problems. However, now,
> string aliases semantically equivalent but syntactically different (e.g.,
> "A", "\x41", "\101") are considered to be different.

This, too, might help.

> So, there is no new bug in 3.6 here, just something that is well known for
> ages, about which you and I already discussed.

Yes, there is: the translation now depends on LC_CTYPE, which it did not
before.
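
To make the escaping issue concrete, here is a minimal C++ sketch. It is
not Bison's code: escape_non_ascii and spellings_agree are hypothetical
names, and the sketch assumes a quoting pass that treats every byte
outside printable ASCII as non-printable, which is only one of the
locale-dependent outcomes the NEWS entry describes.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not taken from Bison: write every byte outside
// printable ASCII as a three-digit octal backslash escape. This models
// the assumption that the quoting pass sees non-ASCII bytes as
// non-printable; a backslash in the input is passed through unchanged.
std::string escape_non_ascii(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        if (c >= 0x20 && c < 0x7f) {
            out += static_cast<char>(c);
        } else {
            char buf[5];  // backslash + 3 octal digits + NUL
            std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// The three spellings quoted from the NEWS entry are the same one-byte
// string once the C++ compiler has decoded the escapes.
bool spellings_agree() {
    return std::string("A") == "\x41" && std::string("A") == "\101";
}
```

Under that assumption, escape_non_ascii("\xc3\xa9") (the UTF-8 bytes of
"é") yields the eight characters \303\251, which is the "ciphered" form
that ends up in the messages, while a pure ASCII alias comes out
unchanged. spellings_agree() returns true, which is why "A", "\x41" and
"\101" only become distinct when the aliases are compared as spelled.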
