On Tue, 17 Nov 2020, David Malcolm via Gcc wrote:

> What is the intended encoding of GCC's stderr?

The locale encoding.

> - blithely accept and emit filenames as bytes (I don't think we make
>   any attempt to enforce that they're any particular encoding)

File names that aren't in the locale encoding aren't expected to work very
well in general (but some applications such as "ls" try to be careful
about handling arbitrary file names).

> - emit format strings in whatever encoding gettext gives us

It's gettext's responsibility to handle translation from the character set
of the compiled message catalog to the locale character set if necessary.

> - emit identifiers as char * from IDENTIFIER_POINTER, calling
>   identifier_to_locale on them in many places, but I suspect we're
>   missing some

Use of %qE in format strings for identifiers, rather than using
IDENTIFIER_POINTER manually, is generally a good idea where possible to
get this to happen automatically.

> So I think our current policy is:
> - we assume filenames are encoded in the locale encoding, and pass them
>   through as bytes with no encode/decode
> - we emit to stderr in the locale encoding (but there are likely bugs
>   where we don't re-encode from UTF-8 to the locale encoding)
>
> Does this sound correct?

Yes.

-- 
Joseph S. Myers
[email protected]
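
For readers unfamiliar with what "re-encode from UTF-8 to the locale encoding" involves, here is a minimal standalone sketch using POSIX iconv() and nl_langinfo(CODESET). It is not GCC's code and does not reproduce identifier_to_locale; the helper names utf8_to_codeset and utf8_to_locale are hypothetical and exist only for this illustration. The sketch simply reports failure when a character cannot be represented in the target character set, which is only one of several possible policies.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a GCC function): convert the NUL-terminated
   UTF-8 string IN to the character set named CODESET, storing a
   NUL-terminated result in OUT, which has room for OUTSZ bytes.
   Returns 0 on success, or -1 if the conversion is unavailable, the
   conversion fails, or OUT is too small.  */
int
utf8_to_codeset (const char *in, const char *codeset, char *out, size_t outsz)
{
  if (outsz == 0)
    return -1;

  iconv_t cd = iconv_open (codeset, "UTF-8");
  if (cd == (iconv_t) -1)
    return -1;

  char *inp = (char *) in;      /* iconv does not modify the input bytes */
  size_t inleft = strlen (in);
  char *outp = out;
  size_t outleft = outsz - 1;   /* reserve one byte for the terminator */

  size_t rc = iconv (cd, &inp, &inleft, &outp, &outleft);
  if (rc != (size_t) -1)
    /* Emit any final shift sequence for stateful encodings.  */
    rc = iconv (cd, NULL, NULL, &outp, &outleft);
  iconv_close (cd);

  if (rc == (size_t) -1)
    return -1;
  *outp = '\0';
  return 0;
}

/* Hypothetical wrapper: target the character set of the current locale.
   A real program would call setlocale (LC_ALL, "") first; without that,
   the "C" locale's character set is used.  */
int
utf8_to_locale (const char *in, char *out, size_t outsz)
{
  return utf8_to_codeset (in, nl_langinfo (CODESET), out, outsz);
}
```

A caller would run each UTF-8 identifier through utf8_to_locale before writing it to stderr, and choose some fallback (an escaped form, for example) when it returns -1. In a UTF-8 locale the conversion is effectively a validity check, since source and target character sets coincide.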
