[issue33934] locale.getlocale() seems wrong when the locale is yet unset (python3 on linux)

2018-06-25 Thread Nick Coghlan


Nick Coghlan  added the comment:

Ah, part of the confusion is that I misremembered the command we run implicitly 
during startup - it's only `setlocale(LC_CTYPE, "")`, not `setlocale(LC_ALL, 
"")`.

However, the default category for `locale.getlocale()` is `LC_CTYPE`, so it 
reports the text encoding locale configured during startup, not the C level 
default.

The difference on Windows is expected - the startup code that implicitly runs 
`setlocale(LC_CTYPE, "")` doesn't get compiled in there.

So I think we have a few different potential ways of viewing this bug report:

1. As a docs issue, where we advise users to run 
`locale.getlocale(locale.LC_MESSAGES)` to find out whether or not a specific 
locale really has been configured (vs the interpreter's default text encoding 
change that runs implicitly on startup)
2. As a defaults change for 3.8+, where we switch `locale.getlocale()` over to 
checking `locale.LC_MESSAGES` instead of `locale.LC_CTYPE`, since the 
interpreter always sets the latter on startup, so it doesn't convey much useful 
information.
3. As (1) for maintenance releases, and as (2) for 3.8+
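
Option 1 above could be sketched roughly as follows. This is only an illustration, not an agreed API: the helper name `explicit_locale_configured` is hypothetical, and the fallback to `LC_NUMERIC` is an assumption for platforms where `locale.LC_MESSAGES` is not defined.

```python
import locale


def explicit_locale_configured():
    """Return True if the checked category is no longer the C locale.

    Hypothetical helper: LC_CTYPE is deliberately not consulted, because the
    interpreter may set it implicitly during startup.
    """
    # LC_MESSAGES is not defined on every platform; falling back to
    # LC_NUMERIC is an assumption made for this sketch.
    category = getattr(locale, "LC_MESSAGES", locale.LC_NUMERIC)
    # getlocale() reports (None, None) for the C locale.
    return locale.getlocale(category) != (None, None)


if __name__ == "__main__":
    print(explicit_locale_configured())
```

A library could call this before deciding whether the application has opted in to locale-aware behaviour.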

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33934] locale.getlocale() seems wrong when the locale is yet unset (python3 on linux)

2018-06-25 Thread STINNER Victor


STINNER Victor  added the comment:

When testing this issue, I found a bug in Python :-(

I opened bpo-33954: float.__format__('n') fails with 
_PyUnicode_CheckConsistency assertion error.

--




[issue33934] locale.getlocale() seems wrong when the locale is yet unset (python3 on linux)

2018-06-24 Thread Nicolas Hainaux

Nicolas Hainaux  added the comment:

I understand that the statement "when python starts, it runs using the C 
locale..." is supposed to be obsolete (in which case the doc should be updated), 
but in fact it still holds on the systems I tested; the only discrepancy is that 
the output of locale.getlocale() at startup contradicts the locale that is 
actually in effect.

It looks like the setting done by setlocale(LC_ALL, "") at an early stage is 
lost at some point (only locale.getlocale() seems to "remember" it).

For instance, my box locale is 'fr_FR.UTF-8', so the decimal point is a comma, 
but when starting python 3.7:


>>> import locale
>>> locale.str(2.4)
'2.4'  # Wrong: if the locale in use is 'fr_FR.UTF-8', then '2,4' is expected instead
>>> locale.getlocale()
('fr_FR', 'UTF-8')
>>> locale.localeconv()
{'int_curr_symbol': '', 'currency_symbol': '', 'mon_decimal_point': '', 
'mon_thousands_sep': '', 'mon_grouping': [], 'positive_sign': '', 
'negative_sign': '', 'int_frac_digits': 127, 'frac_digits': 127, 
'p_cs_precedes': 127, 'p_sep_by_space': 127, 'n_cs_precedes': 127, 
'n_sep_by_space': 127, 'p_sign_posn': 127, 'n_sign_posn': 127, 'decimal_point': 
'.', 'thousands_sep': '', 'grouping': []}
>>>


Note that the output of localeconv() does match C locale, not 'fr_FR.UTF-8'.
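
The check behind that note can be reproduced with a small helper. This is a sketch only: the name `numeric_conventions` is hypothetical, and the result depends on whichever `LC_NUMERIC` setting is active when it is called.

```python
import locale


def numeric_conventions():
    """Return (decimal_point, formatted_sample) for the active LC_NUMERIC."""
    conv = locale.localeconv()
    # locale.str() formats a float using the active decimal point.
    return conv["decimal_point"], locale.str(2.4)


if __name__ == "__main__":
    # While the C locale is in effect the decimal point is '.'.
    print(numeric_conventions())
```

Calling it once before and once after `locale.setlocale(locale.LC_ALL, '')` shows whether the numeric conventions actually changed.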

Compare this with the outputs of locale.str() and locale.localeconv() when the 
locale is explicitly set at start:


>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'LC_CTYPE=fr_FR.utf8;LC_NUMERIC=fr_FR.UTF-8;LC_TIME=fr_FR.UTF-8;LC_COLLATE=fr_FR.utf8;LC_MONETARY=fr_FR.UTF-8;LC_MESSAGES=fr_FR.utf8;LC_PAPER=fr_FR.UTF-8;LC_NAME=fr_FR.UTF-8;LC_ADDRESS=fr_FR.UTF-8;LC_TELEPHONE=fr_FR.UTF-8;LC_MEASUREMENT=fr_FR.UTF-8;LC_IDENTIFICATION=fr_FR.UTF-8'
>>> locale.str(2.4)
'2,4'   # Correct!
>>> locale.localeconv()  # Output of localeconv() does match the 'fr_FR.UTF-8' locale
{'int_curr_symbol': 'EUR ', 'currency_symbol': '€', 'mon_decimal_point': ',', 
'mon_thousands_sep': '\u202f', 'mon_grouping': [3, 0], 'positive_sign': '', 
'negative_sign': '-', 'int_frac_digits': 2, 'frac_digits': 2, 'p_cs_precedes': 
0, 'p_sep_by_space': 1, 'n_cs_precedes': 0, 'n_sep_by_space': 1, 'p_sign_posn': 
1, 'n_sign_posn': 1, 'decimal_point': ',', 'thousands_sep': '\u202f', 
'grouping': [3, 0]}
>>>


Maybe the title of this issue should be changed to "at start, the C locale is in 
use in spite of locale.getlocale()'s output (python3 on linux)"?




As to the behaviour on Windows, I guess this is another topic (locales work 
quite differently on Windows)... but it may be interesting to note that it 
complies with the current documentation: at start python 3.6 also uses the C 
locale, and the output of locale.getlocale() is consistent with that. Here is a 
test on Windows 10:

Python 3.6.3 (v3.6.3:2c5fed8, Oct  3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)] on win32

>>> import locale
>>> locale.getlocale()
(None, None)
>>> locale.localeconv()
{'decimal_point': '.', 'thousands_sep': '', 'grouping': [], 'int_curr_symbol': 
'', 'currency_symbol': '', 'mon_decimal_point': '', 'mon_thousands_sep': '', 
'mon_grouping': [], 'positive_sign': '', 'negative_sign': '', 
'int_frac_digits': 127, 'frac_digits': 127, 'p_cs_precedes': 127, 
'p_sep_by_space': 127, 'n_cs_precedes': 127, 'n_sep_by_space': 127, 
'p_sign_posn': 127, 'n_sign_posn': 127}
>>> locale.str(2.4)
'2.4'
>>> locale.getdefaultlocale()
('fr_FR', 'cp1252')

--
components: +Library (Lib) -Documentation




[issue33934] locale.getlocale() seems wrong when the locale is yet unset (python3 on linux)

2018-06-23 Thread Nick Coghlan


Nick Coghlan  added the comment:

This statement is no longer correct: "when python starts, it runs using the C 
locale, on any platform (Windows, Linux, BSD), any python version (2, 3...), 
until locale.setlocale() is used to set another locale."

The Python 3 text model doesn't work properly in the legacy C locale due to the 
assumption of ASCII as the preferred text encoding, so we run setlocale(LC_ALL, 
"") early in the startup sequence in order to switch to something more 
sensible. In Python 3.7+, we're even more opinionated about that, and 
explicitly coerce the C locale to a UTF-8 based one if there's one available.

If our docs are still saying otherwise anywhere, then our docs are outdated, 
and need to be fixed.

--
assignee:  -> docs@python
components: +Documentation -Library (Lib)
nosy: +docs@python
stage:  -> needs patch




[issue33934] locale.getlocale() seems wrong when the locale is yet unset (python3 on linux)

2018-06-23 Thread Ned Deily


Ned Deily  added the comment:

Thanks for the more detailed explanation.  I think you are right that the 
behavior does not match the documentation but which is to be preferred does not 
necessarily have an easy answer.  Also, this whole area has been undergoing 
revision, for example, with new features in 3.7.  Nick and/or Victor, can you 
address Nicolas's query?

--
nosy: +ncoghlan, vstinner




[issue33934] locale.getlocale() seems wrong when the locale is yet unset (python3 on linux)

2018-06-23 Thread Nicolas Hainaux


Nicolas Hainaux  added the comment:

Sorry, I did not realize that using the word "unset" was completely misleading: 
I only meant "before any use of locale.setlocale() in python". So I'll rephrase 
this all, and add details about the python versions and platforms in this 
message.

So, first, I do not unset the environment variables from the shell before 
running python.

The only steps required to reproduce this behaviour are: open a terminal and 
run python3:

Python 3.6.5 (default, May 11 2018, 04:00:52) 
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getlocale()
('fr_FR', 'UTF-8')  # Wrong: the C locale is actually in use, so (None, None) is expected


Explanation: when python starts, it runs using the C locale, on any platform 
(Windows, Linux, BSD), any python version (2, 3...), until locale.setlocale() 
is used to set another locale. This is expected (the doc says so in the 
getdefaultlocale() paragraph that you mentioned) and can be confirmed by the 
outputs of locale.localeconv() and locale.str().

So, before any use of locale.setlocale(), locale.getlocale() should return 
(None, None) (as this value matches the C locale).

This is the case on Windows, python2 and 3, and on Linux and FreeBSD python2.

But on Linux and FreeBSD, python>=3.4 (could not test 3.0<=python<=3.3), 
locale.getlocale() returns the value deduced from the environment variables 
instead, like locale.getdefaultlocale() already does, e.g. ('fr_FR', 'UTF-8').

All python versions I tested are from the platform distributors (only 3.7 was 
compiled locally, through an automatic AUR build). Here is a more detailed 
list of the python versions and Linux and BSD platforms where I could observe 
this behaviour:

- Python 3.4.8, 3.5.5, 3.6.5 and 3.7.0rc1 on an up to date Manjaro (with "LTS" 
kernel): Linux 4.14.48-2-MANJARO #1 SMP PREEMPT Fri Jun 8 20:41:40 UTC 2018 
x86_64 GNU/Linux

- Python 3.6.5 on Xubuntu 18.04 (as virtual box guest) Linux 4.15.0-23-generic 
#25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

- Python 3.4.6 on openSUSE Leap 42.3 (as virtual box guest) Linux 
4.4.76-1-default #1 SMP Fri Jul 14 08:48:13 UTC 2017 (9a2885c) x86_64 x86_64 
x86_64 GNU/Linux

- Python 3.4.8 and 3.6.1 on FreeBSD 10.4-RELEASE-p8 FreeBSD 10.4-RELEASE-p8 #0: 
Tue Apr  3 18:40:50 UTC 2018 
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

This behaviour on Linux and FreeBSD with python>=3.4 causes two problems: first, 
of course, it is not consistent across platforms, and second, it makes it 
impossible for a python library to tell from locale.getlocale() whether the user 
(a python app) has set the locale or is actually still using the C locale. (It 
is still possible to rely on locale.localeconv() to get the correct elements.)
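
As an illustration of that last point, a library could inspect each category directly instead of relying on the default locale.getlocale() call. A minimal sketch, assuming the hypothetical helper name `category_settings`; categories that a platform does not define are skipped.

```python
import locale


def category_settings(names=("LC_CTYPE", "LC_NUMERIC", "LC_TIME", "LC_MESSAGES")):
    """Map each available category name to its current setting string."""
    settings = {}
    for name in names:
        category = getattr(locale, name, None)
        if category is None:
            # This platform does not define the category (assumption: skip it).
            continue
        # setlocale() without a locale argument only queries the setting.
        settings[name] = locale.setlocale(category)
    return settings


if __name__ == "__main__":
    for name, value in category_settings().items():
        print(name, value)
```

On the systems described above, one would expect only the LC_CTYPE entry to differ from "C" at startup, but that is an expectation to verify rather than a guarantee.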

Hope this message made things clear now :-)

--
versions: +Python 3.4, Python 3.5, Python 3.7




[issue33934] locale.getlocale() seems wrong when the locale is yet unset (python3 on linux)

2018-06-22 Thread Ned Deily


Ned Deily  added the comment:

Can you say on which Linux platform/release you see this behavior and with 
which Python 3.6.3, i.e. from the platform distributor or built yourself?  If I 
understand your concern correctly, I cannot reproduce that behavior on a 
current Debian test system using either the Debian-supplied 3.6.6rc1 or with a  
3.6.3 built from source.  With either LANG unset or set to C (and with no LC* 
env vars set), I see:

$ unset LC_ALL LC_CTYPE LANG LANGUAGE
$ ./python
Python 3.6.3 (tags/v3.6.3:2c5fed86e0, Jun 22 2018, 16:08:11)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
(None, None)

Note that, as documented, the locale.getdefaultlocale() checks several env vars 
'LC_ALL', 'LC_CTYPE', 'LANG' and 'LANGUAGE'.  Are you certain that all of those 
env vars are unset when you run this test?

https://docs.python.org/3.6/library/locale.html#locale.getdefaultlocale
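
To answer that question on a given machine, the variables named above can be listed from inside the interpreter. A small sketch; the helper name `locale_env_vars` is hypothetical, and the variable list is simply the one quoted in this message.

```python
import os


def locale_env_vars(environ=None):
    """Map each locale-related variable to its value, or None if unset."""
    if environ is None:
        environ = os.environ
    names = ("LC_ALL", "LC_CTYPE", "LANG", "LANGUAGE")
    return {name: environ.get(name) for name in names}


if __name__ == "__main__":
    for name, value in locale_env_vars().items():
        print(name, "=", value)
```

If every value is None, the environment provides no locale hints through these four variables.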

--
nosy: +ned.deily
versions:  -Python 3.4, Python 3.5, Python 3.7




[issue33934] locale.getlocale() seems wrong when the locale is yet unset (python3 on linux)

2018-06-21 Thread Nicolas Hainaux


New submission from Nicolas Hainaux :

Expected behaviour:

When unset, the locale in use is `C` (as stated in python documentation) and 
`locale.getlocale()` returns  `(None, None)` on Linux with python2.7 or on 
Windows with python2.7 and python 3.6 (at least):


$ python2
Python 2.7.15 (default, May  1 2018, 20:16:04) 
[GCC 7.3.1 20180406] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getlocale()
(None, None)
>>> 


Issue:

But when using python3.4+ on Linux, instead of `(None, None)`, 
`locale.getlocale()` returns the same value as `locale.getdefaultlocale()`:


$ python
Python 3.6.3 (default, Oct 24 2017, 14:48:20) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getlocale()
('fr_FR', 'UTF-8')
>>> locale.localeconv()
{'int_curr_symbol': '', 'currency_symbol': '', 'mon_decimal_point': '', 
'mon_thousands_sep': '', 'mon_grouping': [], 'positive_sign': '', 
'negative_sign': '', 'int_frac_digits': 127, 'frac_digits': 127, 
'p_cs_precedes': 127, 'p_sep_by_space': 127, 'n_cs_precedes': 127, 
'n_sep_by_space': 127, 'p_sign_posn': 127, 'n_sign_posn': 127, 'decimal_point': 
'.', 'thousands_sep': '', 'grouping': []}
>>> locale.str(2.5)
'2.5'


However, the locale actually in use is still `C`, as shown above by the output of 
`locale.localeconv()` and confirmed by the result of `locale.str(2.5)`, which 
shows a dot as decimal point and not a comma (as would be expected with 
`fr_FR.UTF-8`).

I could observe this confusing behaviour on Linux with python3.4, 3.5, 3.6 and 
3.7 (rc1). (Also on FreeBSD with python3.6.1).

A problematic consequence of this behaviour is that it becomes impossible to 
detect whether the locale has already been set by the user, or not.

I could not find any other similar issue and hope this is not a duplicate.

--
components: Library (Lib)
messages: 320192
nosy: zezollo
priority: normal
severity: normal
status: open
title: locale.getlocale() seems wrong when the locale is yet unset (python3 on linux)
type: behavior
versions: Python 3.4, Python 3.5, Python 3.6, Python 3.7
