[issue5815] locale.getdefaultlocale() missing corner case

2013-11-12 Thread Mike FABIAN

Mike FABIAN added the comment:

Serhiy> While normalize can return sd...@devanagari.utf-8, _parse_localename()
Serhiy> should be able correctly parse it.

But if normalize returns sd...@devanagari.utf-8, isn’t that quite
useless because it is a locale name which does not actually work
in glibc?

Serhiy> Removing sd...@devanagari.utf-8 from alias table is another issue.

Yes. I think it should be fixed in the alias table as well.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5815
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue5815] locale.getdefaultlocale() missing corner case

2013-11-10 Thread Mike FABIAN

Mike FABIAN added the comment:

Serhiy, in your patch you seem to have special treatment for
the devanagari modifier:

+# Devanagari modifier placed before encoding.
+return code, modifier.split('.')[1]

Probably because of 

   'ks_in@devanagari': 'ks...@devanagari.utf-8',
   'sd':   'sd...@devanagari.utf-8',

in the locale_alias dictionary.

But I think these two lines are simply wrong. The mistake
is inherited from the locale.alias file from X.org, which is where
the Python locale_alias comes from.

glibc:

mfabian@ari:~
$ locale -a | grep ^sd
sd_IN
sd_IN.utf8
sd_IN.utf8@devanagari
sd_IN@devanagari
mfabian@ari:~
$ locale -a | grep ^ks
ks_IN
ks_IN.utf8
ks_IN.utf8@devanagari
ks_IN@devanagari
mfabian@ari:~
$ 

The encoding should always be *before* the modifier.

--
nosy: +mfabian




[issue5815] locale.getdefaultlocale() missing corner case

2013-11-10 Thread Mike FABIAN

Mike FABIAN added the comment:

Serhiy> The /usr/share/X11/locale/locale.alias file in Ubuntu 12.04 LTS
Serhiy> contains ks...@devanagari.utf-8 and sd...@devanagari.utf-8
Serhiy> entities.

Yes, I know; that’s why I wrote that the Python code inherited this mistake
from X.org.

Serhiy> While the encoding is expected to be before the modifier, if
Serhiy> there are systems with ks...@devanagari.utf-8 or
Serhiy> sd...@devanagari.utf-8 locales we should support these weird case.

There are really no such systems; in X.org this is just a mistake.
glibc doesn’t write it like this, and it is against the specification
here:

http://pubs.opengroup.org/onlinepubs/007908799/xbd/envvar.html#tag_002

 [language[_territory][.codeset][@modifier]]
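
As a rough illustration of that grammar, here is a minimal Python sketch that
splits a locale name into its four parts. The helper name parse_posix_locale
and the regular expression are assumptions made for this example; this is not
the code in locale.py, and the character classes are deliberately simple.

```python
import re

# Hypothetical helper, not part of locale.py. It follows the grammar
# language[_territory][.codeset][@modifier] quoted above.
_LOCALE_RE = re.compile(
    r'^(?P<language>[A-Za-z]+)'
    r'(?:_(?P<territory>[A-Za-z0-9]+))?'
    r'(?:\.(?P<codeset>[A-Za-z0-9_-]+))?'
    r'(?:@(?P<modifier>[A-Za-z0-9]+))?$'
)


def parse_posix_locale(name):
    """Return (language, territory, codeset, modifier), or None if the
    name does not follow the grammar (e.g. modifier before codeset)."""
    match = _LOCALE_RE.match(name)
    if match is None:
        return None
    return match.group('language', 'territory', 'codeset', 'modifier')


# A name with the codeset before the modifier parses cleanly.
print(parse_posix_locale('sd_IN.UTF-8@devanagari'))
# A made-up name with the modifier before the codeset is rejected.
print(parse_posix_locale('xx_YY@modifier.UTF-8'))
```

Missing optional parts come back as None, so a bare language such as 'sr'
still parses.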

--




[issue5815] locale.getdefaultlocale() missing corner case

2013-11-10 Thread Mike FABIAN

Mike FABIAN added the comment:

In glibc, sd...@devanagari.utf-8 is an invalid locale name,
only sd_IN.UTF-8@devanagari is valid:

mfabian@ari:~
$ LC_ALL=sd_IN.UTF-8@devanagari locale charmap
UTF-8
mfabian@ari:~
$ LC_ALL=sd...@devanagari.utf-8 locale charmap
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968
mfabian@ari:~
$ 

So I think this should be fixed in X.org.
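
The same check can be made from Python. The following is only a sketch: the
helper name locale_is_supported is hypothetical, and the result depends
entirely on which locales are installed on the machine and on how strict the
C library is about unknown names.

```python
import locale


def locale_is_supported(name, category=locale.LC_CTYPE):
    """Return True if the C library accepts `name` for `category`.

    Hypothetical helper for illustration only. The current setting is
    saved first and restored afterwards, whatever the outcome.
    """
    saved = locale.setlocale(category)  # query the current setting
    try:
        locale.setlocale(category, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(category, saved)


for candidate in ('C', 'sd_IN.UTF-8@devanagari'):
    print(candidate, locale_is_supported(candidate))
```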

--




[issue19534] normalize() in locale.py fails for sr_RS.UTF-8@latin

2013-11-09 Thread Mike FABIAN

New submission from Mike FABIAN:

Originally reported here: 

https://bugzilla.redhat.com/show_bug.cgi?id=1024667

I found that Serbian translations in Latin script do not work when the
locale name is written as sr_RS.UTF-8@latin (one gets the Cyrillic
translations instead), but they *do* work when the locale name is
written as sr_RS@latin (i.e. omitting the '.UTF-8'):

$ LANG='sr_RS.UTF-8'  python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
Који језик бисте желели да користите током процеса инсталације?
mfabian@ari:~
$ LANG='sr_RS.UTF-8@latin'  python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
Који језик бисте желели да користите током процеса инсталације?
mfabian@ari:~
$ LANG='sr_RS@latin'  python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
Koji jezik biste želeli da koristite tokom procesa instalacije?
mfabian@ari:~
$ 

The “gettext” command line tool does not have this problem:

mfabian@ari:~
$ LANG='sr_RS@latin' gettext anaconda "What language would you like to use during the installation process?"
Koji jezik biste želeli da koristite tokom procesa instalacije?mfabian@ari:~
$ LANG='sr_RS.UTF-8@latin' gettext anaconda "What language would you like to use during the installation process?"
Koji jezik biste želeli da koristite tokom procesa instalacije?mfabian@ari:~
$ LANG='sr_RS.UTF-8' gettext anaconda "What language would you like to use during the installation process?"
Који језик бисте желели да користите током процеса инсталације?mfabian@ari:~
$

--
components: Library (Lib)
messages: 202467
nosy: mfabian
priority: normal
severity: normal
status: open
title: normalize() in locale.py fails for sr_RS.UTF-8@latin
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19534
___



[issue19534] normalize() in locale.py fails for sr_RS.UTF-8@latin

2013-11-09 Thread Mike FABIAN

Mike FABIAN added the comment:

The problem turns out to be caused by a bug in normalizing
the locale name. See the output of this test program:

mfabian@ari:~
$ cat ~/tmp/mike-test.py
#!/usr/bin/python2

import sys
import os
import locale
import encodings
import encodings.aliases

test_locales = [
    'ja_JP.UTF-8',
    'de_DE.SJIS',
    'de_DE.foobar',
    'sr_RS.UTF-8@latin',
    'sr_rs@latin',
    'sr@latin',
    'sr_yu',
    'sr_yu.SJIS@devanagari',
    'sr@foobar',
    'sR@foObar',
    'sR',
]

# Print each test name next to what locale.normalize() returns for it.
for test_locale in test_locales:
    print('%(orig)s - %(norm)s'
          % {'orig': test_locale,
             'norm': locale.normalize(test_locale)}
          )

mfabian@ari:~
$ python2 ~/tmp/mike-test.py
ja_JP.UTF-8 - ja_JP.UTF-8
de_DE.SJIS - de_DE.SJIS
de_DE.foobar - de_DE.foobar
sr_RS.UTF-8@latin - sr_RS.utf_8_latin
sr_rs@latin - sr_RS.UTF-8@latin
sr@latin - sr_RS.UTF-8@latin
sr_yu - sr_RS.UTF-8@latin
sr_yu.SJIS@devanagari - sr_RS.sjis_devanagari
sr@foobar - sr@foobar
sR@foObar - sR@foObar
sR - sr_RS.UTF-8
mfabian@ari:~
$ 

I.e. “sr_RS.UTF-8@latin” is normalized to “sr_RS.utf_8_latin”, which
is clearly wrong. When using gettext, this causes a fallback to sr_RS,
which gives the Cyrillic translations.
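
The “utf_8_latin” part is what one gets when everything after the '.' is
treated as the encoding and passed to the codec name normalizer. The sketch
below only illustrates that effect; the helper name normalized_encoding_part
is hypothetical, and that normalize() reaches this result the same way is an
assumption based on the output above, not a copy of the locale.py code.

```python
import encodings


def normalized_encoding_part(localename):
    """Return the codec-normalized text after the first '.'.

    Hypothetical helper: the modifier is not split off first, so the
    '@' is treated as punctuation inside the encoding name.
    """
    _, _, encoding = localename.lower().partition('.')
    return encodings.normalize_encoding(encoding)


# The '@' and '-' are both folded into '_' by normalize_encoding().
print(normalized_encoding_part('sr_RS.UTF-8@latin'))
print(normalized_encoding_part('sr_yu.SJIS@devanagari'))
```

Splitting off the '@modifier' before touching the encoding would avoid this.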

--
Added file: http://bugs.python.org/file32551/mike-test.py




[issue19534] normalize() in locale.py fails for sr_RS.UTF-8@latin

2013-11-09 Thread Mike FABIAN

Mike FABIAN added the comment:

A simple fix for that problem could look like this:

mfabian@ari:~
$ diff -u /usr/lib64/python2.7/locale.py.orig /usr/lib64/python2.7/locale.py
--- /usr/lib64/python2.7/locale.py.orig 2013-11-09 09:08:24.807331535 +0100
+++ /usr/lib64/python2.7/locale.py  2013-11-09 09:08:34.526390646 +0100
@@ -377,7 +377,7 @@
     # First lookup: fullname (possibly with encoding)
     norm_encoding = encoding.replace('-', '')
     norm_encoding = norm_encoding.replace('_', '')
-    lookup_name = langname + '.' + encoding
+    lookup_name = langname + '.' + norm_encoding
     code = locale_alias.get(lookup_name, None)
     if code is not None:
         return code
@@ -1457,6 +1457,7 @@
     'sr_cs@latn':        'sr_RS.UTF-8@latin',
     'sr_me':             'sr_ME.UTF-8',
     'sr_rs':             'sr_RS.UTF-8',
+    'sr_rs.utf8@latin':  'sr_RS.UTF-8@latin',
     'sr_rs.utf8@latn':   'sr_RS.UTF-8@latin',
     'sr_rs@latin':       'sr_RS.UTF-8@latin',
     'sr_rs@latn':        'sr_RS.UTF-8@latin',
mfabian@ari:~
$

--




[issue19534] normalize() in locale.py fails for sr_RS.UTF-8@latin

2013-11-09 Thread Mike FABIAN

Mike FABIAN added the comment:

In locale.py, the comment above “locale_alias = {” says:

# Note that the normalize() function which uses this tables
# removes '_' and '-' characters from the encoding part of the
# locale name before doing the lookup. This saves a lot of
# space in the table.

But in normalize(), this is actually not done:

# First lookup: fullname (possibly with encoding)
norm_encoding = encoding.replace('-', '')
norm_encoding = norm_encoding.replace('_', '')
lookup_name = langname + '.' + encoding
code = locale_alias.get(lookup_name, None)

“norm_encoding” holds the encoding with these characters removed,
but it is then not used in the lookup.
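
A small sketch of the difference, using a one-entry stand-in for the alias
table. The names alias_excerpt and lookup are hypothetical and exist only for
this example; the real locale_alias table is much larger.

```python
# Hypothetical one-entry excerpt standing in for locale_alias.
alias_excerpt = {'sr_rs.utf8@latin': 'sr_RS.UTF-8@latin'}


def lookup(localename, use_norm_encoding):
    """Build the lookup key as in the snippet above, optionally using
    norm_encoding instead of the raw encoding."""
    langname, _, encoding = localename.lower().partition('.')
    norm_encoding = encoding.replace('-', '')
    norm_encoding = norm_encoding.replace('_', '')
    if use_norm_encoding:
        lookup_name = langname + '.' + norm_encoding
    else:
        lookup_name = langname + '.' + encoding
    return alias_excerpt.get(lookup_name, None)


# Raw encoding: the key is 'sr_rs.utf-8@latin', which is not in the table.
print(lookup('sr_RS.UTF-8@latin', use_norm_encoding=False))
# Normalized encoding: the key is 'sr_rs.utf8@latin', which is found.
print(lookup('sr_RS.UTF-8@latin', use_norm_encoding=True))
```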

The patch in http://bugs.python.org/msg202469
fixes that by using norm_encoding in the lookup. Together with the
added alias

+'sr_rs.utf8@latin':  'sr_RS.UTF-8@latin',

this makes it work for sr_RS.UTF-8@latin. My test program then outputs:

mfabian@ari:~
$ python2 ~/tmp/mike-test.py
ja_JP.UTF-8 - ja_JP.UTF-8
de_DE.SJIS - de_DE.SJIS
de_DE.foobar - de_DE.foobar
sr_RS.UTF-8@latin - sr_RS.UTF-8@latin
sr_rs@latin - sr_RS.UTF-8@latin
sr@latin - sr_RS.UTF-8@latin
sr_yu - sr_RS.UTF-8@latin
sr_yu.SJIS@devanagari - sr_RS.sjis_devanagari
sr@foobar - sr@foobar
sR@foObar - sR@foObar
sR - sr_RS.UTF-8
mfabian@ari:~
$ 

But note that the normalization of the “sr_yu.SJIS@devanagari”
locale is still weird. Of course “sr_yu.SJIS@devanagari” is quite
silly and does not exist anyway, but the code in normalize()
does not seem to work as intended.

--




[issue19534] normalize() in locale.py fails for sr_RS.UTF-8@latin

2013-11-09 Thread Mike FABIAN

Mike FABIAN added the comment:

I think the patch I attach here is a better fix than the
patch in http://bugs.python.org/msg202469 because
it makes the normalize() function behave more logically overall.
With this patch, my test program prints:

mfabian@ari:/local/mfabian/src/cpython (2.7-mike %)
$ ./python ~/tmp/mike-test.py
ja_JP.UTF-8 - ja_JP.UTF-8
de_DE.SJIS - de_DE.SJIS
de_DE.foobar - de_DE.foobar
sr_RS.UTF-8@latin - sr_RS.UTF-8@latin
sr_rs@latin - sr_RS.UTF-8@latin
sr@latin - sr_RS.UTF-8@latin
sr_yu - sr_RS.UTF-8@latin
sr_yu.SJIS@devanagari - sr_RS.SJIS@devanagari
sr@foobar - sr_RS.UTF-8@foobar
sR@foObar - sr_RS.UTF-8@foobar
sR - sr_RS.UTF-8
[18995 refs]
mfabian@ari:/local/mfabian/src/cpython (2.7-mike %)
$ 

The patch also contains a small fix for the “ks” and “sd”
locales in the locale_alias dictionary, which had the “.UTF-8”
in the wrong place:

-'ks_in@devanagari': 'ks...@devanagari.utf-8',
+'ks_in@devanagari': 'ks_IN.UTF-8@devanagari',

-'sd':   'sd...@devanagari.utf-8',
+'sd':   'sd_IN.UTF-8@devanagari',

(This error is inherited from the locale.alias file from X.org,
from which the locale_alias dictionary is generated.)
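
For illustration only, the reordering could be expressed as a small helper
like the one below. The function name encoding_before_modifier and the
made-up input 'xx_YY@modifier.UTF-8' are assumptions for this sketch; the
attached patch edits the dictionary entries directly and does not contain
such a function.

```python
def encoding_before_modifier(name):
    """Rewrite 'lang@modifier.ENCODING' as 'lang.ENCODING@modifier'.

    Hypothetical helper. Names without a modifier, or with no '.'
    after the '@', are returned unchanged.
    """
    head, at, tail = name.partition('@')
    if not at or '.' not in tail:
        return name
    modifier, _, encoding = tail.partition('.')
    return head + '.' + encoding + '@' + modifier


# Made-up name with the encoding in the wrong place gets reordered.
print(encoding_before_modifier('xx_YY@modifier.UTF-8'))
# A name that is already in the right order is left alone.
print(encoding_before_modifier('sd_IN.UTF-8@devanagari'))
```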

--
keywords: +patch
Added file: 
http://bugs.python.org/file32552/0001-Issue-19534-fix-normalize-in-locale.py-to-make-it-wo.patch




[issue19534] normalize() in locale.py fails for sr_RS.UTF-8@latin

2013-11-09 Thread Mike FABIAN

Mike FABIAN added the comment:

The patch

http://bugs.python.org/file32552/0001-Issue-19534-fix-normalize-in-locale.py-to-make-it-wo.patch

is against the current HEAD of the 2.7 branch, but
Python 3.3 has exactly the same problem; the same patch fixes it for
Python 3.3 as well.

--
