Thiago had said:
>>> GCC and Clang default to UTF-8 *unless* you pass -finput-charset to
>>> something different, independent of what your locale is.

On Monday, 26 August 2019 09:20:49 PDT Lars Knoll wrote:
>> That wasn’t how I understood it. Here’s the corresponding man page
>> entry from gcc:
>>
>> -finput-charset=charset
>>     Set the input character set, used for translation from the
>>     character set of the input file to the source character set used by
>>     GCC. If the locale does not specify, or GCC cannot get this
>>     information from the locale, the default is UTF-8. This can be
>>     overridden by either the locale or this command-line option.
>>     Currently the command-line option takes precedence if there's a
>>     conflict. charset can be any encoding supported by the system's
>>     "iconv" library routine.
>>
>> I’m happy to be proven wrong, but to me this sounds like it’s getting
>> the file encoding from the locale, if that one specifies a charset.

Thiago Macieira (26 August 2019 19:33) replied:
> I think the documentation is wrong.

[snip] clang, gcc read input the same with LC_ALL unset and set
variously to C, POSIX, en_US, pt_BR, el_GR.

I note that none of these explicitly selects an encoding, so the doc
above is indeed consistent with gcc guessing UTF-8 based on the value of
LC_ALL. Even if the only el_GR or pt_BR locales your host actually has
the necessary data compiled for are the ones using an encoding
incompatible with UTF-8, gcc need not have actually checked that if it -
like QSystemLocaleData on Unix - only looks at the value of environment
variables.

> $ LC_ALL=pt_BR ls doesntexist
> ls: cannot access 'doesntexist': Arquivo ou diret�rio inexistente
> $ LC_ALL=el_GR ls doesntexist
> ls: cannot access 'doesntexist': ��� ������� ������ ������ � ���������
> $ LC_ALL=el_GR.UTF-8 ls doesntexist
> ls: cannot access 'doesntexist': Δεν υπάρχει τέτοιο αρχείο ή κατάλογος

This is a test of the translations available to ls and the assumptions
*it* makes about encodings, not the assumptions gcc and clang make.

So let's do this experiment with an explicit encoding in the locale
(which I just told the locales package to compile and set up for me):

$ LC_ALL=zh_CN.GBK gcc -S -o - -xc++ - <<<'auto s = u8"€áęǽ";' | grep -F .string
        .string "\342\202\254\303\241\304\231\307\275"
$ gcc -S -o - -xc++ -finput-charset=GBK - <<<'auto s = u8"€áęǽ";' | grep -F .string
cc1plus: error: failure to convert GBK to UTF-8

The octal escapes in the first result are exactly the UTF-8 bytes of
€áęǽ (E2 82 AC, C3 A1, C4 99, C7 BD), so gcc read the UTF-8 input as
UTF-8 despite LC_ALL=zh_CN.GBK. The second command fails because the
same bytes are not valid GBK. That does, indeed, show that gcc uses
UTF-8 even if the locale specifies some other encoding; it only uses an
alternate encoding if -finput-charset overrides it.

        Eddy.
_______________________________________________
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development