Hi!
Bison uses gnulib's unicodeio module to emit bullets (•) portably,
with a fallback to '.'. It's implemented this way (src/gram.h):
> /* Fallback in case we can't print "•". */
> static inline long
> print_dot_fallback (unsigned int code _GL_UNUSED,
>                     const char *msg _GL_UNUSED,
>                     void *callback_arg)
> {
>   FILE *out = (FILE *) callback_arg;
>   putc ('.', out);
>   return -1;
> }
>
> /* Print "•", the symbol used to represent a point in an item (aka, a
>    dotted rule). */
> static inline void
> print_dot (FILE *out)
> {
>   unicode_to_mb (0x2022, fwrite_success_callback, print_dot_fallback, out);
> }
Unfortunately, in Kiyoshi's environment (SunOS hidden 5.11 11.3 i86pc i386
i86pc, GCC 9.3.0) we get '?' instead of '.' in the C locale. It is a genuine
ASCII '?', not a fallback from a terminal that fails to display the
character. With en_US.UTF-8, we correctly get the bullet.
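One hypothesis, which I have not verified on Solaris: POSIX specifies that
when iconv() meets an input character that is valid but has no identical
character in the target codeset, it performs an implementation-defined
conversion and merely counts it as a "non-identical conversion" in its
return value, instead of failing. If Solaris's iconv substitutes '?' for
U+2022 and returns a positive count, and if unicode_to_mb trusts such a
"successful" conversion, the '?' would go to the success callback and the
'.' fallback would never run. The sketch below illustrates a stricter check
that treats a positive return value as a failure too. The function
bullet_to_charset and its interface are hypothetical, made up for this
message; this is not gnulib's code.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not part of gnulib: convert the bullet U+2022
   (given in UTF-8) to the charset TOCODE.
   Return 1 and store the converted bytes in OUT (length in *OUTLEN) if the
   conversion is exact, 0 if the bullet has no identical representation in
   TOCODE, and -1 if iconv_open does not support TOCODE.  */
int
bullet_to_charset (const char *tocode, char *out, size_t outsize,
                   size_t *outlen)
{
  char in[] = "\xE2\x80\xA2";          /* U+2022 in UTF-8.  */
  char *inptr = in;
  size_t inleft = sizeof in - 1;       /* Exclude the terminating NUL.  */
  char *outptr = out;
  size_t outleft = outsize;

  iconv_t cd = iconv_open (tocode, "UTF-8");
  if (cd == (iconv_t) -1)
    return -1;

  size_t res = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  /* Flush the shift state, in case TOCODE is a stateful encoding.  */
  if (res != (size_t) -1
      && iconv (cd, NULL, NULL, &outptr, &outleft) == (size_t) -1)
    res = (size_t) -1;
  iconv_close (cd);

  /* (size_t) -1 is an outright failure, but POSIX also lets iconv replace
     an unrepresentable character with an implementation-defined one (for
     instance '?') and only report it as a non-identical conversion, i.e.,
     a positive return value.  Treat both cases as "not representable".  */
  if (res == (size_t) -1 || res > 0 || inleft > 0)
    return 0;

  *outlen = outptr - out;
  return 1;
}
```

A print_dot-style caller would emit the bytes when this returns 1 and fall
back to '.' otherwise. On Kiyoshi's machine, passing the C locale's
nl_langinfo (CODESET) as TOCODE would show whether the stricter check is
enough to get the '.' back.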
Kiyoshi can reproduce the problem with GNU Coreutils' printf, where he also
gets a '?', although its fallback displays the escape sequence (i.e., it
should print '\u2022'):
> /* Simple failure callback that displays a fallback representation in plain
>    ASCII, using the same notation as ISO C99 strings. */
> static long
> fallback_failure_callback (unsigned int code,
>                            const char *msg _GL_UNUSED,
>                            void *callback_arg)
> {
>   FILE *stream = (FILE *) callback_arg;
>
>   if (code < 0x10000)
>     fprintf (stream, "\\u%04X", code);
>   else
>     fprintf (stream, "\\U%08X", code);
>   return -1;
> }
>
> /* Outputs the Unicode character CODE to the output stream STREAM.
>    Upon failure, exit if exit_on_error is true, otherwise output a fallback
>    notation. */
> void
> print_unicode_char (FILE *stream, unsigned int code, int exit_on_error)
> {
>   unicode_to_mb (code, fwrite_success_callback,
>                  exit_on_error
>                  ? exit_failure_callback
>                  : fallback_failure_callback,
>                  stream);
> }
Kiyoshi's messages start here:
https://lists.gnu.org/r/bug-bison/2020-07/msg00001.html
The latest:
> On 6 Jul 2020 at 22:35, Kiyoshi KANAZAWA wrote:
>
> Hi Akim,
>
> $ LC_ALL=C $coreutilsbin/printf '\u2022\n' | od -t x1
> 0000000 3f 0a
> 0000002
>
> $ LC_ALL=en_US.UTF-8 $coreutilsbin/printf '\u2022\n' | od -t x1
> 0000000 e2 80 a2 0a
> 0000004
>
>
> FYI, I have a very limited set of locales.
> $ locale -a
> C
> POSIX
> en_US.ISO8859-1
> en_US.ISO8859-15
> en_US.ISO8859-15@euro
> en_US.UTF-8
> ja_JP.PCK
> ja_JP.UTF-8
> ja_JP.UTF-8@cldr
> ja_JP.eucJP
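For reference, both dumps contain exactly what one would expect from their
bytes: 3f is ASCII '?', and e2 80 a2 is U+2022 encoded in UTF-8. A minimal
UTF-8 encoder makes that arithmetic explicit. This is only a sketch;
utf8_encode is a hypothetical name (gnulib has its own encoder, u8_uctomb,
in its unistr module).

```c
#include <stddef.h>

/* Hypothetical helper: encode the Unicode code point CODE as UTF-8 into OUT.
   Return the number of bytes written (1 to 4), or 0 if CODE is not a valid
   Unicode scalar value (a UTF-16 surrogate, or above U+10FFFF).  */
size_t
utf8_encode (unsigned int code, unsigned char out[4])
{
  if (code < 0x80)
    {
      out[0] = code;                           /* Plain ASCII.  */
      return 1;
    }
  if (code < 0x800)
    {
      out[0] = 0xC0 | (code >> 6);
      out[1] = 0x80 | (code & 0x3F);
      return 2;
    }
  if (code >= 0xD800 && code <= 0xDFFF)
    return 0;                                  /* Surrogates.  */
  if (code < 0x10000)
    {
      out[0] = 0xE0 | (code >> 12);            /* 0x2022 -> 0xE2.  */
      out[1] = 0x80 | ((code >> 6) & 0x3F);    /* 0x2022 -> 0x80.  */
      out[2] = 0x80 | (code & 0x3F);           /* 0x2022 -> 0xA2.  */
      return 3;
    }
  if (code < 0x110000)
    {
      out[0] = 0xF0 | (code >> 18);
      out[1] = 0x80 | ((code >> 12) & 0x3F);
      out[2] = 0x80 | ((code >> 6) & 0x3F);
      out[3] = 0x80 | (code & 0x3F);
      return 4;
    }
  return 0;
}
```

utf8_encode (0x2022, buf) returns 3 and fills buf with e2 80 a2, matching
the en_US.UTF-8 dump, so the problem lies only in the C locale path.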
I'm unsure what the next steps would be from here.
Thanks in advance!