On 2/11/21, M.-A. Lemburg <m...@egenix.com> wrote:
> On 11.02.2021 13:49, Eryk Sun wrote:
>
>> Currently, locale.getpreferredencoding(False) is implemented as
>> locale._get_locale_encoding(). This ultimately calls
>> _Py_GetLocaleEncoding(), defined in "Python/fileutils.c".
>> TextIOWrapper() calls this C function to get the encoding to use when
>> encoding=None is passed.
>
> All that seems to be new in Python 3.10. This is not what's
> happening in Python 3.9. The _get_locale_encoding() function
> doesn't even exist.

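For anyone who wants to check the relationship between the two on their own version, here is a minimal sketch that only uses public APIs. The helper names open_default_encoding() and normalized() are made up for illustration. It assumes the locale encoding is one that Python's codec registry knows about, and the two results should normally agree, but I haven't verified every platform.

```python
import codecs
import locale
import os
import tempfile


def open_default_encoding():
    """Return the encoding that open() selects when encoding=None."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as f:
            return f.encoding
    finally:
        os.remove(path)


def normalized(name):
    """Normalize codec aliases (e.g. 'UTF-8' vs 'utf-8') for comparison."""
    return codecs.lookup(name).name


enc_open = open_default_encoding()
enc_locale = locale.getpreferredencoding(False)

# Print both names and whether they resolve to the same codec.
print(enc_open, enc_locale, normalized(enc_open) == normalized(enc_locale))
```
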
In previous versions, locale.getpreferredencoding(False) is
functionally the same. In 3.10, the latter is implemented in C via
locale._get_locale_encoding().

> Why an env variable ? You could simply open up a ticket to get this
> fixed, since 3.10 is not released yet.

I thought it would be best to let users/administrators opt in to POSIX
behavior. But maybe it should require opting out.

> >>> getlocale(LC_CTYPE)
> ('en_US', 'ISO8859-1')
> >>> getlocale(LC_CTYPE)
> ('el_GR', 'ISO8859-7')

Windows code pages 1252 and 1253 are not the same as ISO-8859-1 and
ISO-8859-7. getlocale() is just looking up the encoding of "en_US" and
"el_GR" from the mapping in the locale module. That kind of best-guess
result isn't right for locale._get_locale_encoding().

> The returned values for the encoding look mostly correct to
> me, except the one for the 'C' locale which should be 'ascii'.

The "C" locale in the Windows CRT uses Latin-1 for LC_CTYPE. This is
implemented for mbstowcs() by casting from char to wchar_t. It's
similar for wcstombs(), and limited to Unicode ordinals below 256.
However, the "C" locale isn't consistently Latin-1 across other
categories. IIRC, LC_TIME in the "C" locale uses the process ANSI code
page for time-zone names, and mojibake is common.

> Anyway, UTF-8 mode is the way to go these days, esp. if you want
> to write applications which are portable across platforms and
> behave the same on all.

Globally setting PYTHONUTF8 forces all scripts to use UTF-8 as the
default for open(). I'd like to let scripts opt in to using UTF-8 as
the default for open() by way of an explicit setlocale() call such as
setlocale(LC_CTYPE, (getdefaultlocale()[0], "UTF-8")) or, Windows
only, setlocale(LC_CTYPE, ".UTF-8"). In POSIX, Python already tries
coercing the "C" and "POSIX" locales (usually ASCII) to use UTF-8.
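
To make the point about code pages 1252/1253 versus ISO-8859-1/ISO-8859-7 concrete, here is a small sketch using Python's own codecs. It only illustrates the mismatch and the "cast each char to wchar_t" behavior described for the "C" locale. It does not call into the CRT, and crt_c_locale_decode() is a hypothetical helper written for this example.

```python
# Byte 0x80 is one place where the Windows code pages and the ISO
# charsets diverge: the code pages map it to the euro sign, while the
# ISO charsets map it to the C1 control character U+0080.
sample = b"\x80"

cp1252 = sample.decode("cp1252")      # Windows Western European
latin1 = sample.decode("latin-1")     # ISO-8859-1
cp1253 = sample.decode("cp1253")      # Windows Greek
greek = sample.decode("iso8859-7")    # ISO-8859-7


def crt_c_locale_decode(data):
    """Mimic a char -> wchar_t cast: each byte becomes the same ordinal."""
    return "".join(chr(b) for b in data)


all_bytes = bytes(range(256))

print(ascii(cp1252), ascii(latin1))
print(ascii(cp1253), ascii(greek))
# The cast-based decoding matches Latin-1 for every byte value.
print(crt_c_locale_decode(all_bytes) == all_bytes.decode("latin-1"))
```

So a file written under code page 1252 and read back as ISO-8859-1 round-trips most text but silently turns characters such as the euro sign into control characters.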