On 6 December 2017 at 11:01, Victor Stinner <victor.stin...@gmail.com> wrote:
>> Annex: Differences between the PEP 538 and the PEP 540
>> ======================================================
>>
>> The PEP 538 uses the "C.UTF-8" locale which is quite new and only
>> supported by a few Linux distributions; this locale is not currently
>> supported by FreeBSD or macOS for example. This PEP 540 supports all
>> operating systems.
>>
>> The PEP 538 only changes the behaviour for the POSIX locale. While the
>> new UTF-8 mode of this PEP is only enabled by the POSIX locale, it can
>> be enabled manually for any other locale.
>>
>> The PEP 538 is implemented with ``setlocale(LC_CTYPE, "C.UTF-8")``: any
>> non-Python code running in the process is impacted by this change. This
>> PEP is implemented in Python internals and ignores the locale:
>> non-Python running in the same process is not aware of the "Python UTF-8
>> mode".

I submitted a PR to reword this part:
https://github.com/python/peps/pull/493

> The main advantage of the PEP 538 *over* the PEP 540 is that, for the
> POSIX locale, non-Python code running in the same process gets the
> UTF-8 encoding.
>
> To be honest, I'm not sure that there is a lot of code in the wild
> which uses "text" types like the C type wchar_t* and rely on the
> locale encoding. Almost all C library handle data as bytes using the
> char* type, like filenames and environment variables.

At the very least, GNU readline breaks if you don't change the locale
setting:
https://www.python.org/dev/peps/pep-0538/#considering-locale-coercion-independently-of-utf-8-mode

Given that we found an example of this directly in the standard
library, I assume that there are plenty more in third party extension
modules (especially once we take C++ extensions into account, not just
C ones).

> First I understood that the PEP 538 changed the locale encoding using
> an environment variable. But no, it's implemented with
> setlocale(LC_CTYPE, "C.UTF-8") which only impacts the current process
> and is not inherited by child processes. So I'm not sure anymore that
> PEP 538 and PEP 540 are really complementary.

It sets the LC_CTYPE environment variable as well:
https://www.python.org/dev/peps/pep-0538/#explicitly-setting-lc-ctype-for-utf-8-locale-coercion

The relevant code is in _coerce_default_locale_settings (currently at
https://github.com/python/cpython/blob/master/Python/pylifecycle.c#L448)

> I'm not sure how PyGTK interacts with the PEP 538 for example. Does it
> use UTF-8 with the POSIX locale?

Desktop environments aim not to get into this situation in the first
place by ensuring they're using a more appropriate locale :)

Cheers,
Nick.
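
A rough way to observe the LC_CTYPE point above is to start a child interpreter under the C locale and have it report what it sees. This is only a sketch, assuming Python 3.7 or later; the names `CHILD_CODE` and `report_under_c_locale` are made up for illustration, and the values reported depend on which locales the platform provides.

```python
import json
import os
import subprocess
import sys

# Code run in the child interpreter: report what it sees after startup.
CHILD_CODE = """
import json, locale, os, sys
print(json.dumps({
    "env_lc_ctype": os.environ.get("LC_CTYPE"),       # inherited by grandchildren
    "c_locale": locale.setlocale(locale.LC_CTYPE),    # what C code in-process sees
    "utf8_mode": getattr(sys.flags, "utf8_mode", 0),  # PEP 540, Python-only
}))
"""


def report_under_c_locale(coerce=True):
    """Start a child Python under the C locale and return its report.

    Hypothetical helper for illustration; not part of either PEP.
    """
    env = dict(os.environ)
    # Drop settings that would override or pre-empt the behaviour under test.
    for name in ("LC_ALL", "LC_CTYPE", "PYTHONCOERCECLOCALE", "PYTHONUTF8"):
        env.pop(name, None)
    env["LANG"] = "C"
    if not coerce:
        # Documented PEP 538 opt-out: skip locale coercion in the child.
        env["PYTHONCOERCECLOCALE"] = "0"
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print("coercion on: ", report_under_c_locale(coerce=True))
    print("coercion off:", report_under_c_locale(coerce=False))
```

Where a coercion target locale is available, the first report would be expected to show LC_CTYPE present in the child's environment, so further subprocesses inherit it. With coercion switched off the variable stays unset, and only the Python-level UTF-8 mode flag can differ from the plain C locale.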

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia