On 2/4/21, Ben Rudiak-Gould <benrud...@gmail.com> wrote:
>
> My proposal is to add a couple of single-character options to open()'s mode
> parameter. 'b' and 't' already exist, and the encoding parameter
> essentially selects subcategories of 't', but it's annoyingly verbose and
> so people often omit it.
>
> If '8' was equivalent to specifying encoding='UTF-8', and 'L' was
> equivalent to specifying encoding=(the real locale encoding, ignoring UTF-8
> mode), that would go a long way toward making open more convenient in the
> common cases on Windows, and I bet it would encourage at least some of
> those developing on Unixy platforms to write more portable code also.
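For illustration, the proposed flags can be emulated today with a thin wrapper around open(). This is a minimal sketch, not part of the proposal: translate_mode(), open_ex() and _locale_encoding() are hypothetical names, and the rules it enforces (at most one of '8'/'L', no explicit encoding alongside them, no 'b') are assumptions. For 'L' it uses locale.getencoding(), which exists only in Python 3.11+ and ignores UTF-8 mode; the fallback for older versions does not ignore UTF-8 mode.

```python
import locale


def _locale_encoding():
    """Hypothetical helper: the locale encoding, ignoring UTF-8 mode if possible."""
    # Python 3.11+ provides locale.getencoding(), which ignores UTF-8 mode.
    getencoding = getattr(locale, "getencoding", None)
    if getencoding is not None:
        return getencoding()
    # Older versions: this fallback does NOT ignore UTF-8 mode.
    return locale.getpreferredencoding(False)


def translate_mode(mode, encoding=None):
    """Strip the hypothetical '8'/'L' flags from a mode string.

    Returns a (mode, encoding) pair that the real open() accepts.
    """
    flags = [c for c in mode if c in "8L"]
    if not flags:
        return mode, encoding  # nothing to translate
    if len(flags) > 1:
        raise ValueError("at most one of '8' and 'L' may be given")
    if encoding is not None:
        raise ValueError("encoding argument conflicts with an '8'/'L' flag")
    if "b" in mode:
        raise ValueError("'8'/'L' imply text mode and cannot be combined with 'b'")
    plain_mode = mode.replace(flags[0], "")
    if flags[0] == "8":
        return plain_mode, "utf-8"
    return plain_mode, _locale_encoding()


def open_ex(file, mode="r", encoding=None, **kwargs):
    """Hypothetical open() wrapper that understands the '8' and 'L' flags."""
    mode, encoding = translate_mode(mode, encoding)
    return open(file, mode, encoding=encoding, **kwargs)
```

For example, open_ex("notes.txt", "w8") behaves like open("notes.txt", "w", encoding="utf-8"), while a mode without either flag is passed through unchanged.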
A precedent for using the mode parameter is [_w]fopen in MSVC, which supports a "ccs=<encoding>" flag (e.g. "r, ccs=UTF-8"), where "<encoding>" can be "UTF-8", "UTF-16LE", or "UNICODE".

---

As for the 'locale' encoding, keep in mind that on Windows the implementation doesn't use the current LC_CTYPE locale. It uses only the default locale, which in turn uses the process's active (ANSI) code page. That code page is a system setting, unless it's overridden to UTF-8 in the application manifest (e.g. the manifest that's embedded in "python.exe").

I'd like to see support for a -X option and/or environment variable to make Python on Windows actually use the current locale to get the locale encoding (a real shocker, I know). For example, setlocale(LC_CTYPE, "el_GR") would select "cp1253" (Greek) as the locale encoding, while setlocale(LC_CTYPE, "el_GR.utf-8") would select "utf-8". (The CRT supports UTF-8 in locales starting with Windows 10, build 17134, released on 2018-04-03.)

At startup, Python 3.8+ already calls setlocale(LC_CTYPE, "") to set the default locale for use with C functions such as mbstowcs(). So the default behavior would remain the same, unless the new option also entails attempting locale coercion to UTF-8 via setlocale(LC_CTYPE, ".utf-8").

The following gets the current locale's code page in C:

    #include <locale.h>
    // ...
    _locale_t loc = _get_current_locale();  // locale object for this thread
    __crt_locale_data_public *locinfo =
        (__crt_locale_data_public *)loc->locinfo;
    unsigned int cp = locinfo->_locale_lc_codepage;  // 0 for the "C" locale
    _free_locale(loc);  // release the locale object when done

The "C" locale uses code page 0, which C mbstowcs() and wcstombs() handle as Latin-1. locale._get_locale_encoding() could instead map it to the process ANSI code page, GetACP(). Also, the CRT reports CP_UTF8 (65001) as "utf8" in locale names; _get_locale_encoding() should map it to "utf-8" instead of "cp65001".
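The mapping suggested in the last paragraph can be sketched as a pure Python function. codepage_to_encoding() is a hypothetical helper, not an existing CPython API; it takes the locale's code page (as read by the C snippet) and the process ANSI code page as plain integers, so the logic can be exercised on any platform.

```python
def codepage_to_encoding(cp, ansi_cp):
    """Hypothetical mapping from a CRT locale code page to a codec name.

    cp:      the current locale's code page (e.g. _locale_lc_codepage).
    ansi_cp: the process ANSI code page (e.g. the result of GetACP()).
    """
    if cp == 0:
        # The "C" locale reports code page 0; use the ANSI code page
        # instead of the CRT's Latin-1 behavior.
        cp = ansi_cp
    if cp == 65001:
        # CP_UTF8: report "utf-8" rather than "cp65001".
        return "utf-8"
    return f"cp{cp}"
```

On Windows, the ansi_cp argument could be obtained with ctypes.windll.kernel32.GetACP(); taking it as a parameter keeps the sketch testable elsewhere.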
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/MZC4DDCTMOX25ZQVUGBNLE6VPVXHXNKU/