[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-04-03 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 4216dce04b7d3f329beaaafc82a77c4ac6cf4d57 by Inada Naoki in branch 
'main':
bpo-47000: Make `io.text_encoding()` respects UTF-8 mode (GH-32003)
https://github.com/python/cpython/commit/4216dce04b7d3f329beaaafc82a77c4ac6cf4d57


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-30 Thread Inada Naoki


Inada Naoki  added the comment:

> Please see https://bugs.python.org/issue47000#msg415769 for what Victor
> suggested.

Of course, I read it.

> In particular, the locale module uses the "no underscore" convention.
> Not sure whether it's good to start using snake case now, but I'm also
> not against it.

Victor didn't mention about "no underscore" convention.
I just want to see preference from others. I will remove the underscore.

> I would like to reiterate my concern with the "locale" encoding, though.
>
> As mentioned earlier, I believe it adds too much magic. It would be better
> to leave this in the hands of the applications and not try to guess
> the correct encoding.

I don't recommend to use "locale" encoding for users.
I strongly recommend to consider using "utf-8" instead.
But "locale" encoding is needed when user don't want to change behavior of 
current application.
It had been accepted by PEP 597 already.

> It's better to expose easy to use APIs to access the various different
> settings and point users to those rather than try to do a best effort
> guess... explicit is better than implicit.

In some case, user need to decide "not change the encoding for now".
If we don't provide "locale", it's difficult to change the default encoding to 
UTF-8.

> After all, Mojibake potentially corrupts important data, without the
> alerting the user and that's not really what we should be after (e.g.
> UTF-8 is valid Latin-1 in most cases and this is a real problem we often
> run into in Germany with our Umlauts).

Changing the default encoding will temporary increase this risk.
But after changing the default encoding to UTF-8, this risk will be reduced 
overwhelmingly.
Most popular text editors, including VSCode, Atom, Sublime Text, Notepad.exe 
use UTF-8 by default.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-30 Thread Marc-Andre Lemburg


Marc-Andre Lemburg  added the comment:

Please see https://bugs.python.org/issue47000#msg415769 for what Victor
suggested.

In particular, the locale module uses the "no underscore" convention.
Not sure whether it's good to start using snake case now, but I'm also
not against it.

I would like to reiterate my concern with the "locale" encoding, though.

As mentioned earlier, I believe it adds too much magic. It would be better
to leave this in the hands of the applications and not try to guess
the correct encoding.

It's better to expose easy to use APIs to access the various different
settings and point users to those rather than try to do a best effort
guess... explicit is better than implicit.

After all, Mojibake potentially corrupts important data, without the
alerting the user and that's not really what we should be after (e.g.
UTF-8 is valid Latin-1 in most cases and this is a real problem we often
run into in Germany with our Umlauts).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-30 Thread Inada Naoki


Inada Naoki  added the comment:

@vstiner Since UTF-8 mode affects `locale.getpreferredencoding(False)`, I need 
to decide alternative API in the PEP 686.

If no objections, I will choose `locale.get_encoding()` for current locale 
encoding (ACP on Windows).

See https://github.com/python/peps/pull/2470/files

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-23 Thread Inada Naoki


Inada Naoki  added the comment:

I am not sure about we really need "locale encoding at Python startup".

For this issue, I don't want to change `encoding="locale"` behavior except 
ignore UTF-8 mode. So what I want is "current locale encoding" or 
 ANSI codepage on Windows.

On the other hand, I know Eryk wants to support locale on Windows. So 
`locale.get_encoding()` might return current locale encoding (not ANSI 
codepage) even on Windows.
If so, I will use `sys.getlocaleencoding()` to implement `encoding="locale"` to 
keep using ANSI codepage, instead of adding yet another "get locale encoding" 
function.

--
nosy: +eryksun

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-23 Thread STINNER Victor


STINNER Victor  added the comment:

sys.getlocaleencoding() versus locale.getencoding().

For me, the Python locale module should use the C API to access the Unix 
locales like LC_CTYPE, nl_langinfo(CODESET), etc.

The sys module are more for things specific to Python, like 
sys.getfilesystemencoding().

Since sys.getlocaleencoding() would be a fixed value for the whole process life 
time, I agree that the sys module is a better place.

I can write a PR adding sys.getlocaleencoding() if we agree on the API.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-22 Thread Inada Naoki


Change by Inada Naoki :


--
pull_requests: +30157
pull_request: https://github.com/python/cpython/pull/32068

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-22 Thread Inada Naoki


Inada Naoki  added the comment:

> * sys.getfilesystemencoding(): Python filesystem encoding, return "UTF-8" if 
> the Python UTF-8 Mode is enabled

Yes, althoguh PYTHONLEGACYWINDOWSFSENCODING takes priority.


> * locale.getencoding(): Get the locale encoding, LC_CTYPE locale encoding or 
> the Windows ANSI code page, *read at Python startup*. Ignore the Python UTF-8 
> Mode.

I proposed `locale.get_encoding()` in the PEP 686. I will remove underscore if 
you don't like it.


> * locale.getencoding(current=True): Get the *current* locale encoding. The 
> difference with locale.getencoding() is that on Unix, it gets the LC_CTYPE 
> locale encoding at each call.

Hmm, I don't add it to the PEP 686 because it is not relating to UTF-8 mode nor 
EncodingWarning.

Since `locale.getencoding()` returns locale encoding on startup, how about this 
idea?

* sys.getlocaleencoding() -- Get the locale encoding read at Python startup.
* locale.getencoding() -- Get the current locale encoding.

Note that we have `sys.getdefaultencoding()` and `sys.getfilesystemencoding()`. 
`sys.getlocaleencoding()` looks consistent with them.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-22 Thread STINNER Victor


STINNER Victor  added the comment:

I propose:

* sys.getfilesystemencoding(): Python filesystem encoding, return "UTF-8" if 
the Python UTF-8 Mode is enabled

  * Implementation: PyConfig.filesystem_encoding
  * https://docs.python.org/dev/library/sys.html#sys.getfilesystemencoding
  * 
https://docs.python.org/dev/glossary.html#term-filesystem-encoding-and-error-handler
  * 
https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.filesystem_encoding

* locale.getencoding(): Get the locale encoding, LC_CTYPE locale encoding or 
the Windows ANSI code page, *read at Python startup*. Ignore the Python UTF-8 
Mode.

  * https://docs.python.org/dev/glossary.html#term-locale-encoding
  * Implementation: _Py_GetLocaleEncoding()
  * Existing *private* function: _locale._get_locale_encoding()

* locale.getencoding(current=True): Get the *current* locale encoding. The 
difference with locale.getencoding() is that on Unix, it gets the LC_CTYPE 
locale encoding at each call.

  * Implementation: _Py_GetLocaleEncoding() modified to ignore the UTF-8 mode.

None of these functions do locale.setlocale(locale.LC_CTYPE, "") to get the 
user preferred encoding.

Only the locale.getpreferredencoding() function uses 
locale.setlocale(locale.LC_CTYPE, "").

Usage of locale.getpreferredencoding() should be discouraged in the 
documentation, but I don't think that it can be deprecated and scheduled for 
removal right now: too much code rely on it :-(

---

So we have 3 encodings:

* Python filesystem encoding
* Locale encoding
* Current locale encoding

Examples of usage:

* Python filesystem encoding:

  * os.fsdecode() / os.fsencode()
  * C: PyUnicode_EncodeFSDefault() / PyUnicode_DecodeFSDefault()

* Locale encoding

  * _locale._get_locale_encoding()
  * On Unix, os.device_encoding()
  * To initialize PyConfig.stdio_encoding and PyConfig.filesystem_encoding

* Current locale encoding

  * PyUnicode_EncodeLocale() / PyUnicode_DecodeLocale()
  * "current_locale" parameter of private _Py_EncodeLocaleEx() / 
_Py_DecodeLocaleEx()

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-22 Thread STINNER Victor


STINNER Victor  added the comment:

> But some user want to use UTF-8 mode to change default encoding in their 
> Python environments without waiting Python default encoding changed.

IMO it's a different use case and it should be a different thing. Changing 
encoding="locale" today is too late, since it's already shipped in Python 3.10 
(PEP 597).

I proposed the "current locale" name to distinguish it from the existing 
"locale":

* "current locale": LC_CTYPE locale encoding or ANSI code page
* "locale": "UTF-8" in UTF-8 Mode, or the current locale

The unclear part to me is if "current locale" must change if the LC_CTYPE 
locale is changed, or if it should be read once at startup and then never 
change.

There *are* use case to really read the *current* LC_CTYPE locale encoding. 
There is already C API for that:

* PyUnicode_EncodeLocale()
* PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize()

See also the "current_locale" parameter of the private API _Py_EncodeLocaleEx() 
and _Py_DecodeLocaleEx().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-22 Thread STINNER Victor


STINNER Victor  added the comment:

> Is it safe to use `locale.getlocale(locale.LC_CTYPE)[1] or "UTF-8"`?

I would like to deprecate getlocale(), see bpo-43557.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-19 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +30091
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/32003

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-15 Thread Marc-Andre Lemburg


Marc-Andre Lemburg  added the comment:

FWIW: I don't think the "locale" encoding is a good idea. Instead of
trying to fix this to make it more usable, I'd suggest to deprecate
and remove it again.

When it comes to encodings, explicit is better than implicit.

If an application wants to work with some user defined locale settings,
it's better for the application to decide where to pick the locale
settings from, e.g. the OS, the UI, an application config setting,
etc.

There are too many ways this can be done and trying to build
magic to determine the "right" one is bound to fail in one way or
another.

--
nosy: +lemburg

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-15 Thread Inada Naoki


Inada Naoki  added the comment:

I created another topic relating this issue.
https://discuss.python.org/t/add-legacy-text-encoding-option-to-make-utf-8-default/14281

If we add another option (e.g. legacy_text_encoding), we do not need to change 
UTF-8 mode behavior.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-14 Thread STINNER Victor


STINNER Victor  added the comment:

There are multiple "locale encodings":

* "current" locale encoding: locale.nl_langinfo(locale.CODESET)
* "Python" locale encoding: locale.getpreferredencoding(False), ignore the 
locale in UTF-8 Mode (always return "UTF-8"), ignore the locale on Android and 
VxWorks (always return "UTF-8")
* Python "filesystem" encoding: similar to the Python locale encoding, but 
always use UTF-8 on Android, macOS and VxWorks

Include/pyport.h:
---
#if defined(__ANDROID__) || defined(__VXWORKS__)
   // Use UTF-8 as the locale encoding, ignore the LC_CTYPE locale.
   // See _Py_GetLocaleEncoding(), PyUnicode_DecodeLocale()
   // and PyUnicode_EncodeLocale().
#  define _Py_FORCE_UTF8_LOCALE
#endif

#if defined(_Py_FORCE_UTF8_LOCALE) || defined(__APPLE__)
   // Use UTF-8 as the filesystem encoding.
   // See PyUnicode_DecodeFSDefaultAndSize(), PyUnicode_EncodeFSDefault(),
   // Py_DecodeLocale() and Py_EncodeLocale().
#  define _Py_FORCE_UTF8_FS_ENCODING
#endif
---

See bpo-43552 "Add locale.get_locale_encoding() and 
locale.get_current_locale_encoding()" (rejected).

Marc-Andre Lemburg dislikes locale.getpreferredencoding(False) API and 
suggested adding a new function locale.getencoding() with no argument:
https://bugs.python.org/issue46659#msg412667

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-14 Thread STINNER Victor


STINNER Victor  added the comment:

> So I think `encoding="locale"` should use real locale encoding (ACP on 
> Windows) regardless UTF-8 mode is enabled or not.

If you want to change the default, would it be possible to add a function to 
get this encoding?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-13 Thread Inada Naoki


Inada Naoki  added the comment:

I created a related topic on discuss.python.org.
https://discuss.python.org/t/jep-400-utf-8-by-default-and-future-of-python/14246

If we recommend `PYTHONUTF8` as opt-in "UTF-8 by default", `encoding="locale"` 
should locale encoding in UTF-8 mode.

If we don't change `PYTHONUTF8` behavior, we need yet another option for opt-in 
"UTF-8 by default".

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47000] Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.

2022-03-12 Thread Inada Naoki


New submission from Inada Naoki :

Currently, `encoding="locale"` is just shortcut of 
`encoding=locale.getpreferredencoding(False)`.

`encoding="locale"` means that "locale encoding should be used here, even if 
Python default encoding is changed to UTF-8".

I am not sure that UTF-8 mode becomes the default or not.
But some user want to use UTF-8 mode to change default encoding in their Python 
environments without waiting Python default encoding changed.

So I think `encoding="locale"` should use real locale encoding (ACP on Windows) 
regardless UTF-8 mode is enabled or not.

Currently, UTF-8 mode affects to `_Py_GetLocaleEncoding()`. So it is difficult 
that make encoding="locale" ignores UTF-8 mode.
Is it safe to use `locale.getlocale(locale.LC_CTYPE)[1] or "UTF-8"`?

--
components: Unicode
messages: 415028
nosy: ezio.melotti, methane, vstinner
priority: normal
severity: normal
status: open
title: Make encoding="locale" uses locale encoding even in UTF-8 mode is 
enabled.
versions: Python 3.11

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com