On 2/11/21, M.-A. Lemburg <m...@egenix.com> wrote:
> On 11.02.2021 13:49, Eryk Sun wrote:
>
>> Currently, locale.getpreferredencoding(False) is implemented as
>> locale._get_locale_encoding(). This ultimately calls
>> _Py_GetLocaleEncoding(), defined in "Python/fileutils.c".
>> TextIOWrapper() calls this C function to get the encoding to use when
>> encoding=None is passed.
>
> All that seems to be new in Python 3.10. This is not what's
> happening in Python 3.9. The _get_locale_encoding() function
> doesn't even exist.

In previous versions, locale.getpreferredencoding(False) is
functionally the same. In 3.10, it is implemented in C by way of
locale._get_locale_encoding().
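
A minimal sketch of how to check this from Python, using only public
calls. The helper name default_open_encoding() is made up for
illustration; it opens a temporary text file with encoding=None and
reports what TextIOWrapper picked. Names are normalized with
codecs.lookup() because the spelling (e.g. "UTF-8" vs. "utf-8") can
differ between versions.

```python
import codecs
import locale
import tempfile


def default_open_encoding():
    """Return the encoding chosen when a text file is opened with encoding=None."""
    # TemporaryFile in text mode goes through the same default-encoding logic as open().
    with tempfile.TemporaryFile("w+") as f:
        return f.encoding


def same_codec(name_a, name_b):
    """Compare two encoding names after normalizing them via the codec registry."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


if __name__ == "__main__":
    preferred = locale.getpreferredencoding(False)
    actual = default_open_encoding()
    print("getpreferredencoding(False):", preferred)
    print("open() default:             ", actual)
    print("same codec:", same_codec(preferred, actual))
```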

> Why an env variable ? You could simply open up a ticket to get this
> fixed, since 3.10 is not released yet.

I thought it would be best to let users/administrators opt in to POSIX
behavior. But maybe it should require opting out.

> >>> getlocale(LC_CTYPE)
> ('en_US', 'ISO8859-1')
> >>> getlocale(LC_CTYPE)
> ('el_GR', 'ISO8859-7')

Windows code pages 1252 and 1253 are not the same as ISO-8859-1 and
ISO-8859-7. getlocale() is just looking up the encoding of "en_US" and
"el_GR" from the mapping in the locale module. That kind of best-guess
result isn't right for locale._get_locale_encoding().
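
To make the difference concrete, here is a small sketch that compares
how two single-byte codecs decode each byte value. The helper name
differing_bytes() is hypothetical; the codec names ("cp1252",
"latin-1", "cp1253", "iso8859_7") are the ones Python ships. Bytes that
a codec leaves undefined are decoded with errors="replace" so the loop
doesn't raise.

```python
def differing_bytes(enc_a, enc_b):
    """Return the byte values that decode differently under the two encodings."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" maps undefined bytes to U+FFFD instead of raising.
        a = raw.decode(enc_a, errors="replace")
        b = raw.decode(enc_b, errors="replace")
        if a != b:
            diffs.append(value)
    return diffs


if __name__ == "__main__":
    for win, iso in (("cp1252", "latin-1"), ("cp1253", "iso8859_7")):
        diffs = differing_bytes(win, iso)
        print(win, "vs", iso, "differ at:", [hex(v) for v in diffs])
    # One example: byte 0x80 is the euro sign in cp1252, a C1 control in Latin-1.
    print(ascii(b"\x80".decode("cp1252")), ascii(b"\x80".decode("latin-1")))
```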

> The returned values for the encoding look mostly correct to
> me, except the one for the 'C' locale which should be 'ascii'.

The "C" locale in the Windows CRT uses Latin-1 for LC_CTYPE. This is
implemented for mbstowcs() by casting from char to wchar_t. It's
similar for wcstombs(), and limited to Unicode ordinals below 256.
However, the "C" locale isn't consistently Latin-1 across other
categories. IIRC, LC_TIME in the "C" locale uses the process ANSI code
page for time-zone names, and mojibake is common.
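
As a rough model of the LC_CTYPE behavior described above (not the CRT
code itself), the conversions can be sketched in Python. The function
names below are made up for illustration, and raising ValueError for
out-of-range characters is my own choice for the sketch; the CRT
reports the failure differently.

```python
def c_locale_mbstowcs(data: bytes) -> str:
    """Model of a char -> wchar_t cast: each byte becomes the same code point."""
    return "".join(chr(b) for b in data)


def c_locale_wcstombs(text: str) -> bytes:
    """Model of the reverse cast: only ordinals below 256 can be converted."""
    out = bytearray()
    for ch in text:
        if ord(ch) > 255:
            # Assumption for this sketch: signal failure with ValueError.
            raise ValueError(f"cannot convert U+{ord(ch):04X} in the C locale")
        out.append(ord(ch))
    return bytes(out)


if __name__ == "__main__":
    all_bytes = bytes(range(256))
    # The byte-to-code-point cast is exactly what the Latin-1 codec does.
    print(c_locale_mbstowcs(all_bytes) == all_bytes.decode("latin-1"))
    print(c_locale_wcstombs("caf\u00e9"))
```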

> Anyway, UTF-8 mode is the way to go these days, esp. if you want
> to write applications which are portable across platforms and
> behave the same on all.

Globally setting PYTHONUTF8 forces all scripts to use UTF-8 as the
default for open(). I'd like to let scripts opt in to using UTF-8 as
the default for open() by way of an explicit setlocale() call such as
setlocale(LC_CTYPE, (getdefaultlocale()[0], "UTF-8")) or, Windows
only, setlocale(LC_CTYPE, ".UTF-8"). In POSIX, Python already tries
coercing the "C" and "POSIX" locales (usually ASCII) to use UTF-8.
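
A sketch of what such an opt-in could look like from a script's point
of view. The helper name probe_utf8_ctype() and the list of candidate
locale names are assumptions for illustration; which names are accepted
depends on the platform and the installed locales. The helper restores
the previous LC_CTYPE setting so it can be used as a probe. Note that
on Windows, as far as I know, the reported encoding is still the
process ANSI code page, which is exactly what this thread is about.

```python
import locale


def probe_utf8_ctype(candidates=(".UTF-8", "C.UTF-8", "en_US.UTF-8")):
    """Try UTF-8 locale names for LC_CTYPE.

    Returns (name, preferred_encoding) for the first name that setlocale()
    accepts, or None if none of them is available. The original LC_CTYPE
    setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # no second argument: query only
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this locale name is not supported here
            return name, locale.getpreferredencoding(False)
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    print(probe_utf8_ctype())
```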
