[issue26024] Non-ascii Windows locale names

2019-08-23 Thread Vidar Fauske


Vidar Fauske added the comment:

Thanks. Note that the failure with
`locale.setlocale(locale.LC_ALL, locale.getdefaultlocale())` that I mentioned
above is a different problem, tracked in a much older issue:
https://bugs.python.org/issue10466

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26024] Non-ascii Windows locale names

2019-08-23 Thread STINNER Victor

STINNER Victor added the comment:

It seems that this bug was fixed in Python 3.5, so I am closing the issue.

What changed in Python 3.5 is that the *result* of locale.setlocale() is now 
the English name of the locale, which is compatible with the ASCII encoding.

Vidar Fauske:
> The Norwegian locale on Windows has the honor of having the only locale name 
> with a non-ASCII character 

On which Windows version? On Windows 10 build 1903 with Python 3.9, it seems 
like locale names can be encoded/decoded from ASCII:

>>> locale.setlocale(locale.LC_ALL, "swedish") 
'Swedish_Sweden.1252'
>>> locale.setlocale(locale.LC_ALL, "norwegian") 
'Norwegian_Norway.1252'

Eryk Sun confirmed that Python 3.5 doesn't seem to be affected anymore:

> The issue isn't quite the same for 3.5+. The new CRT uses Windows Vista 
> locale APIs. In this case it uses LOCALE_SENGLISHLANGUAGENAME instead of the 
> old LOCALE_SENGLANGUAGE. This maps "Norwegian" to simply "Norwegian" instead 
> of "Norwegian Bokmål":

---

If you think there is still an issue with the second argument of 
locale.setlocale() not using the right encoding, please open a 
separate issue.

The workaround is to use the English names of locales. For example, use 
'norwegian' or 'Norwegian_Norway.1252' instead of 'Norwegian 
Bokmål_Norway.1252'.
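
The workaround above can be sketched as a small helper that tries ASCII locale
names in order and keeps the first one the C runtime accepts. The function name
set_first_supported_locale and the candidate list are illustrative assumptions,
not part of the locale module, and which names are accepted depends on the
platform.

```python
import locale


def set_first_supported_locale(category, candidates):
    """Try each locale name in order; return the first accepted name, or None."""
    for name in candidates:
        try:
            return locale.setlocale(category, name)
        except locale.Error:
            # The C runtime rejected this name; try the next candidate.
            continue
    return None


# Hypothetical usage: prefer the ASCII names suggested above, then fall back to "C".
active = set_first_supported_locale(
    locale.LC_TIME, ["Norwegian_Norway.1252", "norwegian", "C"]
)
print(active)
```

Returning None instead of raising leaves the decision about a missing locale to
the caller.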

--
resolution: third party -> fixed
stage:  -> resolved
status: open -> closed




[issue26024] Non-ascii Windows locale names

2019-02-11 Thread Steve Dower


Steve Dower added the comment:

We should switch to _wsetlocale, or else come up with a mapping of locale 
names that makes sense across platforms (like we already have for encodings).

I suspect the latter requires proper design and discussion, so it's worth doing 
the first part immediately.

--
versions: +Python 3.8 -Python 3.5, Python 3.6




[issue26024] Non-ascii Windows locale names

2019-02-11 Thread Vidar Fauske


Vidar Fauske added the comment:

This issue can still be triggered in Python 3.7 by the following line (running 
on a Windows machine with a Norwegian locale as the default):

locale.setlocale(locale.LC_ALL, locale.getdefaultlocale())
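
If the goal of that line is only to activate the user's default locale, one
alternative (a sketch, not a fix for the getdefaultlocale round trip itself) is
to pass an empty string, which the locale module documents as meaning "use the
user's default settings". The helper name apply_user_default_locale is
hypothetical.

```python
import locale


def apply_user_default_locale():
    """Ask the C runtime for the user's default locale; return its name or None."""
    try:
        # An empty string means "use the user's default settings".
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale that the C runtime does not support.
        return None
```

This avoids building a locale name in Python and handing it back to the C
runtime.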

--
versions: +Python 3.7




[issue26024] Non-ascii Windows locale names

2016-01-09 Thread Eryk Sun

Eryk Sun added the comment:

The issue isn't quite the same for 3.5+. The new CRT uses Windows Vista locale 
APIs. In this case it uses LOCALE_SENGLISHLANGUAGENAME instead of the old 
LOCALE_SENGLANGUAGE. This maps "Norwegian" to simply "Norwegian" instead of 
"Norwegian Bokmål":

>>> locale.setlocale(locale.LC_TIME, 'norwegian')
'Norwegian_Norway.1252'

The "Norwegian Bokmål" language name has to be requested explicitly to see the 
same problem:

>>> try: locale.setlocale(locale.LC_TIME, 'Norwegian Bokmål')
... except Exception as e: print(e)
...
unsupported locale setting

The fix for 3.4 would be to encode the locale string using 
PyUnicode_AsMBCSString (ANSI). It's too late, however, since 3.4 is no longer 
getting bug fixes.

For 3.5+, setlocale could either switch to using _wsetlocale on Windows or call 
setlocale with the string encoded via Py_EncodeLocale (wcstombs). Encoding the 
string via wcstombs is required because the new CRT roundtrips the conversion 
via mbstowcs before forwarding the call to _wsetlocale. This means that success 
depends on the current LC_CTYPE, unless Python switches to calling _wsetlocale 
directly.

As a workaround for 3.5+, the new CRT also supports RFC 4646 language-tag 
locales when running on Vista or later. For example, "Norwegian Bokmål" is 
simply "nb".

Language-tag locales differ from POSIX locales. Superficially, they use "-" 
instead of "_" as the delimiter. More importantly, they don't allow explicitly 
setting the codeset. Instead of a .codeset, they use ISO 15924 script codes. 
Specifying a script may select a different ANSI codepage. It depends on whether 
there's an NLS definition for the language-script combination. For example, 
Bosnian can be written using either Latin or Cyrillic. Thus the "bs-BA" and 
"bs-Latn-BA" locales use the Central Europe codepage 1250, but "bs-Cyrl-BA" 
uses the Cyrillic codepage 1251. On the other hand, "en-Cyrl-US" still uses the 
Latin codepage 1252.
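
To make the shape of these names concrete, here is a simplified sketch that
splits a tag of the form language[-Script][-REGION] into its parts. The
function name split_language_tag is an assumption for illustration. It handles
only the three subtags discussed above, not the full RFC 4646 grammar, and it
does not look up codepages.

```python
def split_language_tag(tag):
    """Split 'language[-Script][-REGION]' into (language, script, region)."""
    parts = tag.split("-")
    language, script, region = parts[0], None, None
    for part in parts[1:]:
        if script is None and region is None and len(part) == 4 and part.isalpha():
            script = part  # ISO 15924 script code, for example 'Cyrl'
        elif region is None and (
            (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit())
        ):
            region = part  # two-letter country code or three-digit area code
        else:
            raise ValueError("unsupported subtag %r in %r" % (part, tag))
    return language, script, region


print(split_language_tag("bs-Cyrl-BA"))
print(split_language_tag("nb-NO"))
```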

As a separate issue, language-tag locales break the parsing in locale.getlocale:

>>> locale.setlocale(locale.LC_TIME, 'nb-NO')
'nb-NO'
>>> try: locale.getlocale(locale.LC_TIME)
... except Exception as e: print(e)
...
unknown locale: nb-NO

>>> locale.setlocale(locale.LC_CTYPE, 'bs-Cyrl-BA')
'bs-Cyrl-BA'
>>> try: locale.getlocale(locale.LC_CTYPE)
... except Exception as e: print(e)
...
unknown locale: bs-Cyrl-BA
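
Until the parsing is changed, calling code can guard against this itself. The
sketch below assumes that the failure surfaces as ValueError, as the
"unknown locale" messages above suggest, and falls back to the raw name
reported by setlocale. The function name describe_locale is hypothetical.

```python
import locale


def describe_locale(category=locale.LC_CTYPE):
    """Return (language, encoding), or (raw_name, None) if parsing fails."""
    try:
        return locale.getlocale(category)
    except ValueError:
        # getlocale could not parse the name (for example 'nb-NO'), so report
        # the unparsed name that the C runtime returns when queried.
        return locale.setlocale(category), None
```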

--
resolution:  -> third party




[issue26024] Non-ascii Windows locale names

2016-01-08 Thread Terry J. Reedy

Changes by Terry J. Reedy:


--
versions: +Python 3.5, Python 3.6 -Python 3.4




[issue26024] Non-ascii Windows locale names

2016-01-06 Thread Eryk Sun

Eryk Sun added the comment:

Yes, it's ANSI. I should have said "system locale" instead of "current locale". 
To find the requested locale, the CRT function __get_qualified_locale calls 
EnumSystemLocalesA. The passed callback calls GetLocaleInfoA for each 
enumerated locale to get the country (SENGLISHCOUNTRYNAME) and language 
(SENGLISHLANGUAGENAME).

--




[issue26024] Non-ascii Windows locale names

2016-01-06 Thread STINNER Victor

STINNER Victor added the comment:

> PyLocale_setlocale in Modules/_localemodule.c is incorrectly passing the 
> locale as a UTF-8 string ("z") instead of using the codepage of the current 
> locale. 

Do you mean that the function must encode the locale name to the *ANSI 
codepage*?

--




[issue26024] Non-ascii Windows locale names

2016-01-06 Thread Eryk Sun

Eryk Sun added the comment:

PyLocale_setlocale in Modules/_localemodule.c is incorrectly passing the locale 
as a UTF-8 string ("z") instead of using the codepage of the current locale. 

As you can see below "å" is passed as the UTF-8 string "\xc3\xa5":

>>> locale._setlocale(locale.LC_TIME, 'Norwegian Bokmål_Norway.1252')
Breakpoint 0 hit
MSVCR100!setlocale:
`56d23d14 48895c2408  mov qword ptr [rsp+8],rbx ss:`004af800=02ad2a68
0:000> db @rdx l0n29
`02808910  4e 6f 72 77 65 67 69 61-6e 20 42 6f 6b 6d c3 a5  Norwegian Bokm..
`02808920  6c 5f 4e 6f 72 77 61 79-2e 31 32 35 32           l_Norway.1252

The CRT's setlocale works fine when passed the locale string encoded with 
codepage 1252:

>>> msvcr100 = ctypes.CDLL('msvcr100')
>>> msvcr100.setlocale.restype = ctypes.c_char_p
>>> loc_no = 'Norwegian Bokmål_Norway.1252'.encode('1252')
>>> msvcr100.setlocale(locale.LC_TIME, loc_no)
b'Norwegian Bokm\xe5l_Norway.1252'
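
The difference between the two encodings can be checked without a debugger or
a Windows machine. This short sketch only compares byte strings and does not
call setlocale. The variable names are illustrative.

```python
name = "Norwegian Bokmål_Norway.1252"

utf8_bytes = name.encode("utf-8")   # what the "z" format passes to the CRT
ansi_bytes = name.encode("cp1252")  # what the CRT expects on this system

print(len(utf8_bytes), utf8_bytes)  # 29 bytes, "å" becomes \xc3\xa5
print(len(ansi_bytes), ansi_bytes)  # 28 bytes, "å" becomes \xe5

# Reading the UTF-8 bytes as codepage 1252 yields a name that matches no locale.
misread = utf8_bytes.decode("cp1252")
print(misread)
```

The 29-byte length matches the `l0n29` count used in the memory dump above.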

--
nosy: +eryksun




[issue26024] Non-ascii Windows locale names

2016-01-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

This may be related to issue25812. Python assumes that locale settings in all 
categories use the same encoding (set by LC_CTYPE). Try first setting LC_CTYPE 
to an ASCII-named locale with the 1252 codepage.

--
nosy: +serhiy.storchaka




[issue26024] Non-ascii Windows locale names

2016-01-06 Thread Vidar Fauske

New submission from Vidar Fauske:

The Norwegian locale on Windows has the honor of having the only locale name 
with a non-ASCII character ('Norwegian Bokmål_Norway', see e.g. 
https://wiki.postgresql.org/wiki/Changes_To_Norwegian_Locale). Python 3 does 
not seem to handle this properly, as the following session demonstrates:

>python
Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_TIME, 'swedish')
'Swedish_Sweden.1252'
>>> loc_sw = locale.getlocale(locale.LC_TIME)
>>> locale.setlocale(locale.LC_TIME, 'norwegian')
'Norwegian Bokmål_Norway.1252'
>>> loc_no = locale.getlocale(locale.LC_TIME)
>>> locale.setlocale(locale.LC_TIME, loc_sw)
'Swedish_Sweden.1252'
>>> locale.setlocale(locale.LC_TIME, loc_no)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\prog\WinPython-64bit-3.4.3.7\python-3.4.3.amd64\lib\locale.py", line 593, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting


As can be seen, this can be worked around when setting the locale manually, but 
if the locale has already been set to Norwegian, the value returned from 
getlocale is invalid when passed to setlocale.
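
A common save-and-restore pattern that avoids getlocale is sketched below. It
stores the string returned by a query-only call to setlocale and passes that
string back unchanged. This is an illustration under stated assumptions: the
name temporary_locale is hypothetical, and on Python 3.4 with the Norwegian
locale the restore step could fail in the same way, because the saved name
still contains a non-ASCII character.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(category, name):
    """Switch `category` to `name` inside the block, then restore the old value."""
    # Calling setlocale without a second argument only queries the current name.
    saved = locale.setlocale(category)
    try:
        yield locale.setlocale(category, name)
    finally:
        locale.setlocale(category, saved)


# Hypothetical usage with the portable "C" locale.
with temporary_locale(locale.LC_TIME, "C") as active:
    print(active)
```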

Following the example of postgres in the link above, I suggest changing the 
behavior of locale.getlocale to alias 'Norwegian Bokmål_Norway.1252' as 
'Norwegian_Norway.1252', which is completely ASCII, and therefore fine.
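
The suggested aliasing could look roughly like the following. This is only a
sketch of the idea, not code from the locale module. The table
WINDOWS_LOCALE_ALIASES and the function ascii_locale_name are hypothetical
names, and the table holds just the one alias proposed here.

```python
# Hypothetical alias table: non-ASCII Windows locale name -> ASCII equivalent.
WINDOWS_LOCALE_ALIASES = {
    "Norwegian Bokmål_Norway": "Norwegian_Norway",
}


def ascii_locale_name(name):
    """Replace a known non-ASCII language_region part, keeping any codeset."""
    language_region, dot, codeset = name.partition(".")
    alias = WINDOWS_LOCALE_ALIASES.get(language_region, language_region)
    return alias + dot + codeset


print(ascii_locale_name("Norwegian Bokmål_Norway.1252"))
print(ascii_locale_name("Swedish_Sweden.1252"))
```

Names without an entry in the table pass through unchanged, so the helper is
safe to apply to every value before it is handed to setlocale.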

--
components: Unicode, Windows
messages: 257608
nosy: ezio.melotti, haypo, paul.moore, steve.dower, tim.golden, vidartf, 
zach.ware
priority: normal
severity: normal
status: open
title: Non-ascii Windows locale names
type: behavior
versions: Python 3.4
