Hi Hans,

Hans Aberg wrote on Thu, Jun 25, 2020 at 10:15:03AM +0200:

> MacOS sets as default LC_CTYPE=UTF-8, not appearing in the 'locale
> -a' list. Then some software interprets this as though the locale
> is C/POSIX, disregards the UTF-8 encoding, and converts all non-ASCII
> (high bit set) char's into octal escape sequences. What is the
> correct interpretation here?

The correct interpretation of "LC_CTYPE=UTF-8" is whatever the
documentation of the respective operating system says.

All POSIX says is:

  https://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html

  The locale argument is a pointer to a character string containing
  the required setting of category. The contents of this string
  are implementation-defined.

POSIX only specifies the meaning of the strings "C" and "POSIX";
any others are implementation-defined.

For example, the OpenBSD manual page says:

  https://man.openbsd.org/setlocale.3

  The syntax and semantics of the locale argument are not
  standardized and vary among operating systems. On OpenBSD, if
  the locale string ends with ".UTF-8", the UTF-8 locale is
  selected; otherwise, the "C" locale is selected, which uses the
  ASCII character set. If the locale contains a dot but does not
  end with ".UTF-8", setlocale() fails.

Which is indeed true here:

   $ uname -a
  OpenBSD isnote.usta.de 6.7 GENERIC.MP#224 amd64
   $ LC_CTYPE=FOOBAR.UTF-8 locale charmap
  UTF-8
   $ LC_CTYPE=UTF-8 locale charmap
  US-ASCII

To the best of my knowledge, we are POSIX-compliant in this respect.

Other systems are of course free to make different choices.
Even though POSIX says this is implementation-defined, which implies
that operating systems are expected to document their specific rules,
some fail to do so, for example:

  https://man.bsd.lv/FreeBSD-12.0/setlocale.3
  https://man.bsd.lv/NetBSD-8.1/setlocale.3

Some do specify it. For example, according to

  https://man.bsd.lv/Linux-5.06/setlocale.3

the string "UTF-8" would be invalid because it lacks the "language"
part which is mandatory on Linux.

For example, on a very old Linux system I have access to:

   $ uname -a
  Linux donnerwolke.asta.kit.edu 4.9.0-0.bpo.3-686 #1 SMP \
    Debian 4.9.30-2+deb9u5~bpo8+1 (2017-09-28) i686 GNU/Linux
   $ LC_CTYPE=en_US.UTF-8 locale charmap
  UTF-8
   $ LC_CTYPE=UTF-8 locale charmap
  locale: Cannot set LC_CTYPE to default locale: No such file or directory
  locale: Cannot set LC_ALL to default locale: No such file or directory
  ANSI_X3.4-1968

Yours,
  Ingo
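
P.S.
The same kind of check can also be made from inside a program rather
than with locale(1). What follows is only a rough sketch, assuming
Python 3 on a Unix-like system where locale.nl_langinfo() is available.
The helper name probe_ctype is made up for illustration. It asks the
C library to accept a given LC_CTYPE string and reports the resulting
codeset, or None if setlocale(3) rejects the string. What it returns
for anything other than "C" and "POSIX" depends on the operating
system, as explained above.

```python
import locale


def probe_ctype(name):
    """Return the codeset reported for LC_CTYPE=name, or None if the
    C library's setlocale() rejects the string.

    probe_ctype is a hypothetical helper, not part of any library.
    """
    # Calling setlocale() without a locale argument only queries
    # the current setting, so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        # The implementation does not accept this locale string.
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # Only "C" and "POSIX" have a meaning specified by POSIX;
    # the results for the other strings vary among systems.
    for name in ("C", "POSIX", "UTF-8", "en_US.UTF-8", "FOOBAR.UTF-8"):
        print(name, "->", probe_ctype(name))
```

A program that gets None back for the string found in the environment
has to decide for itself how to fall back. Silently treating that case
as the "C" locale is one possible choice, and it would produce the
octal escapes described in the original question.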