I was working on integrating Vincent's counterexamples into the reports when I noticed that our reports were truly ugly when UTF-8 was used in string aliases.
I have always had problems with the way Bison escapes the user's strings (which resulted in people not being able to trust the aliases, something which error=detailed fixed in 3.6), but this was yet another sign that we should not toy with the user's spelling of her strings.

So this batch revises the way Bison parses strings. Instead of interpreting them (i.e., resolving the escapes), it keeps them the way they are, but it does check their validity (so it will reject \777 for instance). Only when the string must be interpreted (e.g., for %output) do we unquote the strings.

The news entry reads:

*** String aliases are faithfully propagated

  Bison used to interpret user strings (i.e., decoding backslash escapes) when reading them, and to escape them (i.e., issue non-printable characters as backslash escapes, taking the locale into account) when outputting them. As a consequence non-ASCII strings (say in UTF-8) ended up "ciphered" as sequences of backslash escapes. This happened not only in the generated sources (where the compiler will reinterpret them), but also in all the generated reports (text, xml, html, dot, etc.). Reports were therefore not readable when string aliases were not pure ASCII. Worse yet: the output depended on the user's locale.

  Now Bison faithfully treats the string aliases exactly the way the user spelled them. This fixes all the aforementioned problems. However, now, string aliases semantically equivalent but syntactically different (e.g., "A", "\x41", "\101") are considered to be different.

Cheers!
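
PS: to make the "check the escapes but keep the spelling" idea concrete, here is a minimal sketch in C. It is not the code from this series: escapes_are_valid and check_examples are hypothetical names, and the set of escapes is deliberately simplified (simple escapes, octal, and \x only). The point is only that the string is validated in place and never rewritten, so "A", "\x41" and "\101" all pass yet remain three distinct spellings.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not Bison's actual code): return true if every
   backslash escape in BODY, the text between the double quotes, is
   well formed.  BODY is only inspected, never modified.  */
static bool
escapes_are_valid (const char *body)
{
  for (const char *p = body; *p; ++p)
    {
      if (*p != '\\')
        continue;               /* Ordinary byte, UTF-8 included.  */
      ++p;
      if (*p == '\0')
        return false;           /* Dangling backslash.  */
      if (strchr ("abfnrtv\\\"'?", *p))
        continue;               /* Simple escape.  */
      if ('0' <= *p && *p <= '7')
        {
          /* One to three octal digits; the value must fit in a byte.  */
          unsigned value = 0;
          int n = 0;
          while (n < 3 && '0' <= p[n] && p[n] <= '7')
            value = value * 8 + (unsigned) (p[n++] - '0');
          if (value > 255)
            return false;       /* E.g., \777.  */
          p += n - 1;
          continue;
        }
      if (*p == 'x')
        {
          /* One or more hex digits; the value must fit in a byte.  */
          unsigned value = 0;
          int n = 0;
          for (; isxdigit ((unsigned char) p[1 + n]); ++n)
            {
              unsigned char c = (unsigned char) p[1 + n];
              unsigned digit
                = isdigit (c) ? c - '0' : (unsigned) tolower (c) - 'a' + 10;
              value = value * 16 + digit;
              if (value > 255)
                return false;
            }
          if (n == 0)
            return false;       /* "\x" without digits.  */
          p += n;
          continue;
        }
      return false;             /* Unknown escape.  */
    }
  return true;
}

/* Assumed examples, mirroring the ones discussed in this message.  */
void
check_examples (void)
{
  assert (escapes_are_valid ("\\101"));        /* Spelled \101.  */
  assert (escapes_are_valid ("\\x41"));        /* Spelled \x41.  */
  assert (escapes_are_valid ("caf\303\251"));  /* Raw UTF-8 bytes.  */
  assert (!escapes_are_valid ("\\777"));       /* Out of range.  */
}
```

Because nothing is decoded, a caller that needs the actual bytes (as for %output) would still have to unquote the string in a separate step.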

Akim Demaille (9):
  style: prefer 'FOO ()' to 'FOO' for function-like macros
  style: reduce scopes
  style: introduce & use STRING_1GROW
  style: factor common bits about string scanning
  tests: check reports with conflicts and UTF-8
  parser: keep string aliases as the user wrote it
  regen
  reports: don't escape the labels
  reports: the column width differs from the byte count

 NEWS                |  17 ++
 src/flex-scanner.h  |  17 +-
 src/graphviz.c      |  22 +-
 src/graphviz.h      |   6 -
 src/main.c          |   8 +-
 src/muscle-tab.c    |   3 +-
 src/parse-gram.c    | 212 ++++++++++++++----
 src/parse-gram.h    |   9 +-
 src/parse-gram.y    | 193 ++++++++++++++---
 src/print-graph.c   |  19 +-
 src/print.c         |  13 +-
 src/reader.c        |   1 +
 src/scan-code.l     |  26 +--
 src/scan-gram.l     | 122 +++++------
 src/scan-skel.l     |   8 +-
 src/system.h        |  17 ++
 tests/input.at      |   4 +-
 tests/regression.at |   4 +-
 tests/report.at     | 515 ++++++++++++++++++++++++++++++++++++++++++++
 19 files changed, 1023 insertions(+), 193 deletions(-)

-- 
2.27.0
