[issue5815] locale.getdefaultlocale() missing corner case
Mike FABIAN added the comment:

Serhiy> While normalize can return sd...@devanagari.utf-8, _parse_localename()
Serhiy> should be able correctly parse it.

But if normalize returns sd...@devanagari.utf-8, isn’t that quite useless because it is a locale name which does not actually work in glibc?

Serhiy> Removing sd...@devanagari.utf-8 from alias table is another issue.

Yes. I think it should be fixed in the alias table as well.

--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5815
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5815] locale.getdefaultlocale() missing corner case
Mike FABIAN added the comment:

Serhiy, in your patch you seem to have special treatment for the devanagari modifier:

    +# Devanagari modifier placed before encoding.
    +return code, modifier.split('.')[1]

Probably because of

    'ks_in@devanagari': 'ks...@devanagari.utf-8',
    'sd': 'sd...@devanagari.utf-8',

in the locale_alias dictionary. But I think these two lines are just wrong, this mistake is inherited from the locale.alias from X.org where the python locale_alias comes from.

glibc:

    mfabian@ari:~ $ locale -a | grep ^sd
    sd_IN
    sd_IN.utf8
    sd_IN.utf8@devanagari
    sd_IN@devanagari
    mfabian@ari:~ $ locale -a | grep ^ks
    ks_IN
    ks_IN.utf8
    ks_IN.utf8@devanagari
    ks_IN@devanagari
    mfabian@ari:~ $

The encoding should always be *before* the modifier.

--
nosy: +mfabian
[issue5815] locale.getdefaultlocale() missing corner case
Mike FABIAN added the comment:

Serhiy> The /usr/share/X11/locale/locale.alias file in Ubuntu 12.04 LTS
Serhiy> contains ks...@devanagari.utf-8 and sd...@devanagari.utf-8
Serhiy> entities.

Yes, I know, that’s why I wrote that the Python code inherited this mistake from X.org.

Serhiy> While the encoding is expected to be before the modifier, if
Serhiy> there are systems with ks...@devanagari.utf-8 or
Serhiy> sd...@devanagari.utf-8 locales we should support these weird case.

There are no such systems really; in X.org this is just a mistake. glibc doesn’t write it like this, and it is against the specification here:

http://pubs.opengroup.org/onlinepubs/007908799/xbd/envvar.html#tag_002

    [language[_territory][.codeset][@modifier]]
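To make the format quoted above concrete, here is a minimal sketch of a parser for language[_territory][.codeset][@modifier]. The regular expression, the group names and the function split_locale_name() are illustrative assumptions; they are not taken from locale.py.

```python
import re

# Pattern for language[_territory][.codeset][@modifier].
# Group names are illustrative, not taken from locale.py.
LOCALE_RE = re.compile(
    r"^(?P<language>[^_.@]+)"
    r"(?:_(?P<territory>[^.@]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def split_locale_name(name):
    """Return (language, territory, codeset, modifier); missing parts are None."""
    match = LOCALE_RE.match(name)
    if match is None:
        raise ValueError("not a POSIX-style locale name: %r" % name)
    return match.group("language", "territory", "codeset", "modifier")


print(split_locale_name("sd_IN.UTF-8@devanagari"))
print(split_locale_name("sr_RS@latin"))
```

With this pattern, a name that puts the modifier first has no codeset at all: everything after the "@" is taken as the modifier, which is one way to see why such names do not fit the specification.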
[issue5815] locale.getdefaultlocale() missing corner case
Mike FABIAN added the comment:

In glibc, sd...@devanagari.utf-8 is an invalid locale name, only sd_IN.UTF-8@devanagari is valid:

    mfabian@ari:~ $ LC_ALL=sd_IN.UTF-8@devanagari locale charmap
    UTF-8
    mfabian@ari:~ $ LC_ALL=sd...@devanagari.utf-8 locale charmap
    locale: Cannot set LC_CTYPE to default locale: No such file or directory
    locale: Cannot set LC_MESSAGES to default locale: No such file or directory
    locale: Cannot set LC_ALL to default locale: No such file or directory
    ANSI_X3.4-1968
    mfabian@ari:~ $

So I think this should be fixed in X.org.
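The same check can be sketched from Python. The helper is_usable_locale() below is a hypothetical name; it only asks the C library whether it accepts a name for LC_CTYPE and then restores the previous setting. The result depends on which locales are installed on the machine, so no output is shown.

```python
import locale


def is_usable_locale(name):
    """Return True if the C library accepts `name` for LC_CTYPE."""
    # Calling setlocale() without a value only queries the current setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Always restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, saved)


for candidate in ("C", "sd_IN.UTF-8@devanagari"):
    print(candidate, is_usable_locale(candidate))
```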
[issue19534] normalize() in locale.py fails for sr_RS.UTF-8@latin
New submission from Mike FABIAN:

Originally reported here:

https://bugzilla.redhat.com/show_bug.cgi?id=1024667

I found that Serbian translations in Latin do not work when the locale name is written as sr_RS.UTF-8@latin (one gets the Cyrillic translations instead), but they *do* work when the locale name is written as sr_RS@latin (i.e. omitting the '.UTF-8'):

    $ LANG='sr_RS.UTF-8' python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
    Који језик бисте желели да користите током процеса инсталације?
    mfabian@ari:~ $ LANG='sr_RS.UTF-8@latin' python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
    Који језик бисте желели да користите током процеса инсталације?
    mfabian@ari:~ $ LANG='sr_RS@latin' python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
    Koji jezik biste želeli da koristite tokom procesa instalacije?
    mfabian@ari:~ $

The “gettext” command line tool does not have this problem:

    mfabian@ari:~ $ LANG='sr_RS@latin' gettext anaconda "What language would you like to use during the installation process?"
    Koji jezik biste želeli da koristite tokom procesa instalacije?mfabian@ari:~ $ LANG='sr_RS.UTF-8@latin' gettext anaconda "What language would you like to use during the installation process?"
    Koji jezik biste želeli da koristite tokom procesa instalacije?mfabian@ari:~ $ LANG='sr_RS.UTF-8' gettext anaconda "What language would you like to use during the installation process?"
    Који језик бисте желели да користите током процеса инсталације?mfabian@ari:~ $

--
components: Library (Lib)
messages: 202467
nosy: mfabian
priority: normal
severity: normal
status: open
title: normalize() in locale.py fails for sr_RS.UTF-8@latin
versions: Python 2.7
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19534
___
[issue19534] normalize() in locale.py fails for sr_RS.UTF-8@latin
Mike FABIAN added the comment:

The problem turns out to be caused by the normalization of the locale name; see the output of this test program:

    mfabian@ari:~ $ cat ~/tmp/mike-test.py
    #!/usr/bin/python2

    import sys
    import os
    import locale
    import encodings
    import encodings.aliases

    test_locales = [
        'ja_JP.UTF-8',
        'de_DE.SJIS',
        'de_DE.foobar',
        'sr_RS.UTF-8@latin',
        'sr_rs@latin',
        'sr@latin',
        'sr_yu',
        'sr_yu.SJIS@devanagari',
        'sr@foobar',
        'sR@foObar',
        'sR',
    ]

    for test_locale in test_locales:
        print("%(orig)s - %(norm)s"
              %{'orig': test_locale,
                'norm': locale.normalize(test_locale)}
        )
    mfabian@ari:~ $ python2 ~/tmp/mike-test.py
    ja_JP.UTF-8 - ja_JP.UTF-8
    de_DE.SJIS - de_DE.SJIS
    de_DE.foobar - de_DE.foobar
    sr_RS.UTF-8@latin - sr_RS.utf_8_latin
    sr_rs@latin - sr_RS.UTF-8@latin
    sr@latin - sr_RS.UTF-8@latin
    sr_yu - sr_RS.UTF-8@latin
    sr_yu.SJIS@devanagari - sr_RS.sjis_devanagari
    sr@foobar - sr@foobar
    sR@foObar - sR@foObar
    sR - sr_RS.UTF-8
    mfabian@ari:~ $

I.e. “sr_RS.UTF-8@latin” is normalized to “sr_RS.utf_8_latin”, which is clearly wrong and causes a fallback to sr_RS when using gettext, which gives the Cyrillic translations.

--
Added file: http://bugs.python.org/file32551/mike-test.py
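Why the broken name falls back to sr_RS can be sketched as follows. This is a simplified model of how a gettext-like lookup might expand a normalized locale name into catalog directory candidates; it is not gettext's actual code, and catalog_candidates() is a hypothetical name. The point is only that once the "@" is gone, no candidate can carry the latin modifier.

```python
def catalog_candidates(normalized):
    """List catalog names a gettext-like lookup might try, most specific first."""
    name, _, modifier = normalized.partition("@")
    name, _, codeset = name.partition(".")
    language, _, territory = name.partition("_")

    candidates = []
    # Variants that keep the modifier are tried before those that drop it.
    for use_modifier in (True, False):
        for use_territory in (True, False):
            for use_codeset in (True, False):
                # Skip combinations that ask for a component the name lacks.
                if use_modifier and not modifier:
                    continue
                if use_territory and not territory:
                    continue
                if use_codeset and not codeset:
                    continue
                candidate = language
                if use_territory:
                    candidate += "_" + territory
                if use_codeset:
                    candidate += "." + codeset
                if use_modifier:
                    candidate += "@" + modifier
                candidates.append(candidate)
    return candidates


print(catalog_candidates("sr_RS.UTF-8@latin"))
print(catalog_candidates("sr_RS.utf_8_latin"))
```

In this model the first call offers sr_RS@latin and sr@latin among its candidates, while the second call can only ever reach sr_RS or sr.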
[issue19534] normalize() in locale.py fails for sr_RS.UTF-8@latin
Mike FABIAN added the comment:

A simple fix for that problem could look like this:

    mfabian@ari:~ $ diff -u /usr/lib64/python2.7/locale.py.orig /usr/lib64/python2.7/locale.py
    --- /usr/lib64/python2.7/locale.py.orig 2013-11-09 09:08:24.807331535 +0100
    +++ /usr/lib64/python2.7/locale.py 2013-11-09 09:08:34.526390646 +0100
    @@ -377,7 +377,7 @@
         # First lookup: fullname (possibly with encoding)
         norm_encoding = encoding.replace('-', '')
         norm_encoding = norm_encoding.replace('_', '')
    -    lookup_name = langname + '.' + encoding
    +    lookup_name = langname + '.' + norm_encoding
         code = locale_alias.get(lookup_name, None)
         if code is not None:
             return code
    @@ -1457,6 +1457,7 @@
         'sr_cs@latn': 'sr_RS.UTF-8@latin',
         'sr_me': 'sr_ME.UTF-8',
         'sr_rs': 'sr_RS.UTF-8',
    +    'sr_rs.utf8@latin': 'sr_RS.UTF-8@latin',
         'sr_rs.utf8@latn': 'sr_RS.UTF-8@latin',
         'sr_rs@latin': 'sr_RS.UTF-8@latin',
         'sr_rs@latn': 'sr_RS.UTF-8@latin',
    mfabian@ari:~ $
[issue19534] normalize() in locale.py fails for sr_RS.UTF-8@latin
Mike FABIAN added the comment:

In locale.py, the comment above “locale_alias = {” says:

    # Note that the normalize() function which uses this tables
    # removes '_' and '-' characters from the encoding part of the
    # locale name before doing the lookup. This saves a lot of
    # space in the table.

But in normalize(), this is actually not done:

    # First lookup: fullname (possibly with encoding)
    norm_encoding = encoding.replace('-', '')
    norm_encoding = norm_encoding.replace('_', '')
    lookup_name = langname + '.' + encoding
    code = locale_alias.get(lookup_name, None)

“norm_encoding” holds the encoding with these replacements applied, but then it is not used in the lookup.

The patch in http://bugs.python.org/msg202469 fixes that. Using norm_encoding in the lookup, together with adding the alias

    +'sr_rs.utf8@latin': 'sr_RS.UTF-8@latin',

makes it work for sr_RS.UTF-8@latin. My test program then outputs:

    mfabian@ari:~ $ python2 ~/tmp/mike-test.py
    ja_JP.UTF-8 - ja_JP.UTF-8
    de_DE.SJIS - de_DE.SJIS
    de_DE.foobar - de_DE.foobar
    sr_RS.UTF-8@latin - sr_RS.UTF-8@latin
    sr_rs@latin - sr_RS.UTF-8@latin
    sr@latin - sr_RS.UTF-8@latin
    sr_yu - sr_RS.UTF-8@latin
    sr_yu.SJIS@devanagari - sr_RS.sjis_devanagari
    sr@foobar - sr@foobar
    sR@foObar - sR@foObar
    sR - sr_RS.UTF-8
    mfabian@ari:~ $

But note that the normalization of the “sr_yu.SJIS@devanagari” locale is still weird (of course a “sr_yu.SJIS@devanagari” locale is quite silly and does not exist anyway), but the code in normalize() does not seem to work as intended.
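The effect of using encoding versus norm_encoding in that first lookup can be reproduced in isolation. This is a minimal sketch under stated assumptions: ALIASES is a two-entry stand-in for locale_alias, first_lookup() is a hypothetical helper, and the way the name is lowercased and split at the first "." is assumed rather than quoted from locale.py.

```python
# Tiny stand-in for locale_alias; the real table in locale.py is much larger.
ALIASES = {
    "sr_rs.utf8@latin": "sr_RS.UTF-8@latin",
    "sr_rs@latin": "sr_RS.UTF-8@latin",
}


def first_lookup(localename, use_norm_encoding):
    """Mimic only the first lookup step of normalize() (assumed splitting)."""
    fullname = localename.lower()
    if "." in fullname:
        langname, encoding = fullname.split(".")[:2]
    else:
        langname, encoding = fullname, ""
    # Strip '-' and '_' from the encoding part, as the table comment promises.
    norm_encoding = encoding.replace("-", "").replace("_", "")
    key = langname + "." + (norm_encoding if use_norm_encoding else encoding)
    return ALIASES.get(key)


# Looks up 'sr_rs.utf-8@latin', which is not in the table.
print(first_lookup("sr_RS.UTF-8@latin", use_norm_encoding=False))
# Looks up 'sr_rs.utf8@latin', which is.
print(first_lookup("sr_RS.UTF-8@latin", use_norm_encoding=True))
```

The first call prints None and the second prints sr_RS.UTF-8@latin, which matches the change in the test program's output for that locale.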
[issue19534] normalize() in locale.py fails for sr_RS.UTF-8@latin
Mike FABIAN added the comment:

I think the patch I attach here is a better fix than the patch in http://bugs.python.org/msg202469 because it makes the normalize() function behave more logically overall. With this patch, my test program prints:

    mfabian@ari:/local/mfabian/src/cpython (2.7-mike %) $ ./python ~/tmp/mike-test.py
    ja_JP.UTF-8 - ja_JP.UTF-8
    de_DE.SJIS - de_DE.SJIS
    de_DE.foobar - de_DE.foobar
    sr_RS.UTF-8@latin - sr_RS.UTF-8@latin
    sr_rs@latin - sr_RS.UTF-8@latin
    sr@latin - sr_RS.UTF-8@latin
    sr_yu - sr_RS.UTF-8@latin
    sr_yu.SJIS@devanagari - sr_RS.SJIS@devanagari
    sr@foobar - sr_RS.UTF-8@foobar
    sR@foObar - sr_RS.UTF-8@foobar
    sR - sr_RS.UTF-8
    [18995 refs]
    mfabian@ari:/local/mfabian/src/cpython (2.7-mike %) $

The patch also contains a small fix for the “ks” and “sd” locales in the locale_alias dictionary; they had the “.UTF-8” in the wrong place:

    -'ks_in@devanagari': 'ks...@devanagari.utf-8',
    +'ks_in@devanagari': 'ks_IN.UTF-8@devanagari',
    -'sd': 'sd...@devanagari.utf-8',
    +'sd': 'sd_IN.UTF-8@devanagari',

(This error is inherited from the locale.alias file from X.org, from which the locale_alias dictionary is generated.)

--
keywords: +patch
Added file: http://bugs.python.org/file32552/0001-Issue-19534-fix-normalize-in-locale.py-to-make-it-wo.patch
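Entries of this kind can be found mechanically. The sketch below scans an alias table for target names in which the "@modifier" comes before the ".codeset". The function names and the SAMPLE table are hypothetical; the "xx" entry is made up purely to trigger the check, and the other two entries are the corrected values from the patch.

```python
def modifier_before_codeset(locale_name):
    """Return True if an '@' appears before a '.' in the locale name."""
    at = locale_name.find("@")
    dot = locale_name.find(".")
    return at != -1 and dot != -1 and at < dot


def misplaced_entries(alias_table):
    """Return sorted (alias, target) pairs whose target has the wrong order."""
    return sorted(
        (alias, target)
        for alias, target in alias_table.items()
        if modifier_before_codeset(target)
    )


SAMPLE = {
    "ks_in@devanagari": "ks_IN.UTF-8@devanagari",
    "sd": "sd_IN.UTF-8@devanagari",
    "xx": "xx_XX@variant.UTF-8",  # made-up entry with the modifier first
}

print(misplaced_entries(SAMPLE))
```

Only the made-up "xx" entry is reported. The same function could be pointed at locale.locale_alias to audit the table that ships with a given Python version.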
[issue19534] normalize() in locale.py fails for sr_RS.UTF-8@latin
Mike FABIAN added the comment:

The patch http://bugs.python.org/file32552/0001-Issue-19534-fix-normalize-in-locale.py-to-make-it-wo.patch is against the current HEAD of the 2.7 branch, but Python 3.3 has exactly the same problem. The same patch fixes it for Python 3.3 as well.