New submission from Nick Coghlan <ncogh...@gmail.com>:

While working on the docs updates for bpo-34589 (clarifying that 
"PYTHONCOERCECLOCALE=0" and "PYTHONCOERCECLOCALE=warn" need both the environment 
variable name and the value to be encoded as ASCII in order to have any 
effect), I realised that it was less explicit how to reliably enable UTF-8 
mode, since that can be enabled even when the current locale is a nominally 
ASCII-incompatible one like gb18030, and the command line settings get 
processed as wchar strings rather than 8-bit char strings.

From what I've been able to figure out, the environment variable case is the 
same as for locale coercion: both the environment variable name and the value 
need to be encoded as ASCII. This actually happens implicitly, as even 
encodings like gb18030 still encode ASCII letters and numbers the same way 
ASCII does - their incompatibilities with ASCII lie elsewhere. Fully 
incompatible encodings like UTF-16 and UTF-32 don't get used as locale 
encodings in the first place because they'd break too many applications.
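
As a rough illustration of the point above (not a description of what CPython 
does internally), the following sketch uses Python's own codecs to check that 
the relevant settings encode to the same bytes as ASCII under several 
encodings. The helper name and the list of encodings are my own choices for 
the example, and "PYTHONUTF8=1" is assumed to be the environment variable 
setting in question.

```python
# Illustrative helper (hypothetical name): True if `encoding` produces
# exactly the ASCII bytes for `text`.
def encodes_like_ascii(text: str, encoding: str) -> bool:
    return text.encode(encoding) == text.encode("ascii")


# Settings discussed in this issue and in bpo-34589.
SETTINGS = ["PYTHONUTF8=1", "PYTHONCOERCECLOCALE=0", "PYTHONCOERCECLOCALE=warn"]

# A sample of encodings that do get used as locale encodings (my selection).
LOCALE_ENCODINGS = ["gb18030", "gbk", "big5", "euc_jp", "shift_jis",
                    "latin-1", "utf-8"]

# Encodings that are not used as locale encodings.
NON_LOCALE_ENCODINGS = ["utf-16-le", "utf-32-le"]

results = {
    enc: all(encodes_like_ascii(setting, enc) for setting in SETTINGS)
    for enc in LOCALE_ENCODINGS + NON_LOCALE_ENCODINGS
}

for enc, same in results.items():
    print(f"{enc:10} matches ASCII bytes: {same}")
```

The sample locale encodings should all report a match, while UTF-16-LE and 
UTF-32-LE should not, since they pad every ASCII character with NUL bytes.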

I believe the same holds true for the command line arguments, just in the other 
direction: they get converted to wchar_t* with either mbstowcs or mbrtowc, and 
then compared using wcscmp or wcsncmp, but for all encodings that actually get 
used as locale encodings, the ASCII code points that CPython cares about get 
mapped directly to the corresponding UTF-16-LE or UTF-32 code point at both 
compile time (in the code) and at runtime (when reading the arg string).
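
The decoding direction can be sketched the same way. This is only an 
approximation: it uses Python's codecs as a stand-in for the C library's 
mbstowcs/mbrtowc, and the helper name, the argument list and the choice of 
encodings are assumptions made for the example.

```python
# Illustrative helper (hypothetical name): True if decoding `raw` with
# `encoding` gives the same text as decoding it as ASCII.
def decodes_like_ascii(raw: bytes, encoding: str) -> bool:
    return raw.decode(encoding) == raw.decode("ascii")


# Byte strings as they might arrive in argv for "-X utf8" style options.
ARGS = [b"-X", b"utf8", b"utf8=1", b"utf8=0"]

# A sample of encodings that do get used as locale encodings (my selection).
LOCALE_ENCODINGS = ["gb18030", "gbk", "big5", "euc_jp", "shift_jis",
                    "latin-1", "utf-8"]

decode_results = {
    enc: all(decodes_like_ascii(arg, enc) for arg in ARGS)
    for enc in LOCALE_ENCODINGS
}

for enc, same in decode_results.items():
    print(f"{enc:10} decodes to the ASCII text: {same}")
```

Every encoding in the sample should decode the argument bytes to the same 
text that a plain ASCII decode gives, which is what makes the later wide 
string comparison work.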

Given that simply not thinking about the problem will actually do the right 
thing in all cases, I don't think this needs to be documented prominently, but 
I do think it would be good to explicitly address the point somewhere.

----------
assignee: docs@python
components: Documentation
messages: 327236
nosy: docs@python, eric.snow, ncoghlan, vstinner
priority: low
severity: normal
stage: needs patch
status: open
title: Clarify text encoding used to enable UTF-8 mode
type: enhancement
versions: Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34914>
_______________________________________