On Tue, 17 Nov 2020, David Malcolm via Gcc wrote:

> What is the intended encoding of GCC's stderr?

The locale encoding.

> - blithely accept and emit filenames as bytes (I don't think we make
> any attempt to enforce that they're any particular encoding)

File names that aren't in the locale encoding aren't expected to work very 
well in general (but some applications such as "ls" try to be careful 
about handling arbitrary file names).
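
As a rough illustration of that kind of care, here is a minimal sketch of
how a program could test whether a file name decodes cleanly in the current
locale encoding before printing it. The helper name name_valid_in_locale is
hypothetical; it is not something GCC or "ls" provides, and the sketch
assumes the caller has already selected a locale with setlocale.

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of GCC): return true if NAME consists
   only of byte sequences that decode as characters in the encoding of
   the current LC_CTYPE locale.  */
static bool
name_valid_in_locale (const char *name)
{
  mbstate_t state;
  size_t remaining = strlen (name);

  memset (&state, 0, sizeof state);   /* start in the initial shift state */
  while (remaining > 0)
    {
      size_t n = mbrtowc (NULL, name, remaining, &state);

      if (n == (size_t) -1 || n == (size_t) -2)
        return false;                 /* invalid or truncated sequence */
      if (n == 0)
        n = 1;                        /* defensive: always make progress */
      name += n;
      remaining -= n;
    }
  return true;
}
```

A caller that gets false back could escape the offending bytes rather than
writing them to the terminal unchanged.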

> - emit format strings in whatever encoding gettext gives us

It's gettext's responsibility to handle translation from the character set 
of the compiled message catalog to the locale character set if necessary.

> - emit identifiers as char * from IDENTIFIER_POINTER, calling
> identifier_to_locale on them in many places, but I suspect we're
> missing some

Use of %qE in format strings for identifiers, rather than using 
IDENTIFIER_POINTER manually, is generally a good idea where possible to 
get this to happen automatically.

> So I think our current policy is:
> - we assume filenames are encoded in the locale encoding, and pass them
> through as bytes with no encode/decode
> - we emit to stderr in the locale encoding (but there are likely bugs
> where we don't re-encode from UTF-8 to the locale encoding)
> 
> Does this sound correct?

Yes.
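
For the re-encoding step mentioned above, a minimal sketch using POSIX
iconv might look like the following. This is not GCC's own code path; the
helper name utf8_to_codeset is hypothetical, and the target encoding is
passed in explicitly (a caller would typically pass nl_langinfo (CODESET)
after calling setlocale).

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): convert the UTF-8 string SRC
   to the encoding named TO_CODESET.  Returns a malloc'd string that the
   caller must free, or NULL if the conversion fails.  */
static char *
utf8_to_codeset (const char *src, const char *to_codeset)
{
  iconv_t cd = iconv_open (to_codeset, "UTF-8");
  if (cd == (iconv_t) -1)
    return NULL;                      /* conversion not supported */

  size_t in_left = strlen (src);
  size_t out_size = in_left * 4 + 16; /* generous guess; E2BIG is an error */
  char *out = malloc (out_size);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *in_ptr = (char *) src;        /* iconv does not modify the input */
  char *out_ptr = out;
  size_t out_left = out_size - 1;     /* reserve room for the terminator */

  size_t rc = iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left);
  if (rc != (size_t) -1)
    /* Flush any pending shift sequence for stateful encodings.  */
    rc = iconv (cd, NULL, NULL, &out_ptr, &out_left);
  iconv_close (cd);

  if (rc == (size_t) -1)
    {
      free (out);
      return NULL;
    }
  *out_ptr = '\0';
  return out;
}
```

How characters that cannot be represented in the target encoding are
handled varies between iconv implementations, so a real caller would need
a fallback such as escaping them.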

-- 
Joseph S. Myers
[email protected]
