Something I've just noticed that needs to be clarified: on Linux, "C" locale and "POSIX" locale are aliases, but this isn't true in general (e.g. it's not the case on *BSD systems, including Mac OS X).
To handle that in PEP 538, I made it clear that everything is keyed specifically off the "C" locale, since that's what you actually get by default. So if PEP 540 is going to implicitly trigger switching encodings, it needs to specify whether it's going to look for the C locale or the POSIX locale (I'd suggest C locale, since that's the actual default that causes problems).

The precedence relationship with locale coercion also needs to be spelled out: successful locale coercion should skip implicitly enabling UTF-8 mode (for opt-in UTF-8 mode, we'd still try to coerce the locale setting as appropriate, so extension modules are more likely to behave themselves).

On 6 December 2017 at 14:07, INADA Naoki <[email protected]> wrote:
> Oh, revised version is really short!
>
> And I have one worrying point.
> With UTF-8 mode, open()'s default encoding/error handler is
> UTF-8/surrogateescape.
>
> Containers are really growing. PyCharm supports Docker and many new Python
> developers use Docker instead of installing Python directly on their system,
> especially on Windows.
>
> And opening binary file without "b" option is very common mistake of new
> developers. If default error handler is surrogateescape, they lose a chance
> to notice their bug.
>
> On the other hand, it helps some use cases when user want byte-transparent
> behavior, without modifying code to use "surrogateescape" explicitly.
>
> Which is more important scenario? Anyone has opinion about it?
> Are there any rationals and use cases I missing?

For platforms that offer a C.UTF-8 locale, I'd like "LC_CTYPE=C.UTF-8 python" and "PYTHONCOERCECLOCALE=0 LC_CTYPE=C PYTHONUTF8=1" to be equivalent (aside from the known limitation that extension modules may not do the right thing in the latter case).

For the locale coercion case, the default error handler for `open` remains as "strict", which means I'd be in favour of keeping it as "strict" by default in UTF-8 mode as well.
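
As a rough illustration of the trade-off between the two error handlers, here is a minimal sketch. It uses `bytes.decode` as a stand-in for what a text-mode `open()` would do with each handler; the sample bytes and the helper names (`decode_strict`, `decode_permissive`) are made up for the example and do not come from either PEP.

```python
# Bytes that are not valid UTF-8: the first 8 bytes of a PNG file,
# i.e. what a beginner might read after forgetting the "b" in open().
data = b"\x89PNG\r\n\x1a\n"


def decode_strict(raw: bytes) -> str:
    """Decode as UTF-8 with the "strict" handler (errors raise)."""
    return raw.decode("utf-8", errors="strict")


def decode_permissive(raw: bytes) -> str:
    """Decode as UTF-8 with "surrogateescape" (bad bytes are smuggled through)."""
    return raw.decode("utf-8", errors="surrogateescape")


# With "strict", the mistake is reported immediately.
try:
    decode_strict(data)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With "surrogateescape", decoding succeeds silently: the invalid byte 0x89
# becomes the lone surrogate U+DC89, and the original bytes can be recovered
# by encoding with the same handler.
text = decode_permissive(data)
round_tripped = text.encode("utf-8", errors="surrogateescape")
```

The strict variant is the one that lets a new developer notice the bug; the permissive variant is the byte-transparent behaviour that some existing use cases rely on.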
Keeping "strict" as the default would flip the toggle in the PEP: "strict UTF-8" would be the default selection for "PYTHONUTF8=1", and you'd choose the more relaxed option via "PYTHONUTF8=permissive".

That way, the combination of PEPs 538 and 540 would give us the following situation in the C locale:

1. Our preferred approach is to coerce LC_CTYPE in the C locale to a UTF-8 based equivalent
2. Only if that fails (e.g. as it will on CentOS 7) do we resort to implicitly enabling CPython's internal UTF-8 mode (which should behave like C.UTF-8, *except* for the fact extension modules won't respect it)

That way, the ideal outcome is that a UTF-8 based locale exists, and we use it automatically when needed. UTF-8 mode then lets us cope with older platforms where neither C.UTF-8 nor an equivalent exists.

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
