On Tuesday, 27 August 2019 16:57:55 PDT Kevin Kofler wrote:
> If you do not explicitly add ".UTF-8", glibc always gives you the obsolete
> legacy locale with the locale-specific pre-Unicode character set. This is
> intentional for backwards compatibility. So you should never use a locale
> without a ".UTF-8" suffix, unless, like Thiago, you want to deliberately
> test what happens in a legacy non-UTF-8 locale.
>
> The locales are interpreted by glibc. Anything that assumes that a given
> locale uses a character set different from what glibc actually uses for that
> locale is broken. (But it looks like GCC doesn't assume anything about the
> locale and just always uses UTF-8 to begin with, contrary to what the
> documentation claims.)

Indeed. The charset can be obtained with the nl_langinfo(3) function from the C
library. Since there's no tool to print it for us, we use Python:

$ cat langinfo.py
import locale
print(locale.nl_langinfo(locale.CODESET))
$ python3 langinfo.py
UTF-8
$ LC_ALL=C python3 langinfo.py
ANSI_X3.4-1968
$ LC_ALL=pt_BR python3 langinfo.py
ISO-8859-1
$ LC_ALL=fr_FR@euro python3 langinfo.py
ISO-8859-15
$ LC_ALL=el_GR python3 langinfo.py
ISO-8859-7
$ LC_ALL=zh_CN python3 langinfo.py
GB2312
$ LC_ALL=ja_JP python3 langinfo.py
EUC-JP

I'm *so* glad I didn't remember three of the above and hadn't had to think of
them for 15 years. (I thought Japanese on Unix used Shift-JIS and Russian used
KOI8-R)

Anyway, doing a memory wipe. Aside from ISO-8859-1, I don't want to think of
any of the others for another 15 years.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products

_______________________________________________
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development
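
For anyone who wants to repeat the comparison without re-running the interpreter under a different LC_ALL each time, here is a minimal sketch that switches LC_CTYPE inside one process and asks nl_langinfo for the codeset. The helper name codeset_for and the list of locale names are my own choices for illustration, not part of the thread. It assumes a POSIX system where Python exposes locale.nl_langinfo, and any locale that is not installed is reported as missing rather than guessed.

```python
import locale


def codeset_for(name):
    """Return the codeset the C library reports for locale `name`.

    Returns None if the C library rejects the locale (typically because
    it is not installed). `codeset_for` is a hypothetical helper name.
    """
    # Remember the current LC_CTYPE setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # Each language is listed with and without the ".UTF-8" suffix so the
    # legacy and UTF-8 variants can be compared side by side.
    for name in ("C", "pt_BR", "pt_BR.UTF-8", "ja_JP", "ja_JP.UTF-8"):
        print(f"{name:14} {codeset_for(name) or '(not installed)'}")
```

What it prints depends on which locales the system has generated and on the C library in use, so the output on another machine may differ from the transcript above.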