Bugs item #1193061, was opened at 2005-04-30 20:37
Message generated for change (Comment added) made by caglar
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1193061&group_id=5470

Category: Unicode
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: S.Çağlar Onur (caglar)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Python and Turkish Locale

Initial Comment:
Following up on this thread:
http://mail.python.org/pipermail/python-dev/2005-April/052968.html

As described in http://www.i18nguy.com/unicode/turkish-i18n.html [How Applications Fail With Turkish Language], Turkish has four forms of the letter "i" in its alphabet (i, ı, İ, I). Without --with-wctype-functions support, Python converts these characters in a locale-independent manner even in the tr_TR.UTF-8 locale, so all conversions map to "i" or "I", which is wrong in a Turkish locale.

So if the Python developers remove the wctype functions from Python, there must be a locale-dependent upper/lower function to handle these characters properly.

----------------------------------------------------------------------

>Comment By: S.Çağlar Onur (caglar)
Date: 2005-05-02 11:45

Message:
Logged In: YES
user_id=858447

No, I'm not. These rules are defined in http://www.unicode.org/Public/UNIDATA/CaseFolding.txt and http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt. Note that there is a comment that says:

# T: special case for uppercase I and dotted uppercase I
# - For non-Turkic languages, this mapping is normally not used.
# - For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters.
# Note that the Turkic mappings do not maintain canonical equivalence without additional processing.
# See the discussions of case mapping in the Unicode Standard for more information.

So without wctype function support, Python can't convert these characters correctly. This _is_ the problem. As a side effect, another huge problem occurs: keyword comparisons must not be locale-dependent, yet they become so.
If Python is compiled with wctype function support, "i".upper() turns into "İ" in a Turkish locale, which is wrong for keyword comparison (e.g. "quit" vs. "QUİT"). So I suggest implementing two new functions, such as localeAwareLower()/localeAwareUpper(), and keeping lower()/upper() locale-independent. And as you wrote, the locale module may be a perfect home for these :)

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2005-05-02 11:00

Message:
Logged In: YES
user_id=38388

I'm not sure I understand: are you saying that the Unicode mappings for upper and lower case are wrong in the standard?

Note that removing the wctype functions will only remove the possibility of using these functions for case mapping of Unicode characters instead of the built-in Unicode character database. This was originally meant as an optimization to avoid having to load the Unicode database; nowadays the database is always included, so the optimization is no longer needed.

Even worse: the wctype functions sometimes behave differently from the mappings in the Unicode database (due to differences in Unicode database versions or implementations).

Now, since the string .lower() and .upper() methods are locale-dependent (due to their reliance on the C functions toupper() and tolower(), not by intent), while the Unicode versions are not, we have a rather annoying situation where switching from strings to Unicode causes semantic differences.

Ideally, both string and Unicode methods should do case mapping in a locale-independent way. Support for locale-dependent differences in case mapping, collation, etc. should be moved to an external module, e.g. the locale module.
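
A minimal sketch of the approach proposed in this thread, assuming current Python 3, where str.upper()/str.lower() use the built-in Unicode database and ignore the process locale. The built-in methods stay locale-independent (safe for keyword comparison), while separate helpers apply the Turkic (tr, az) rules from SpecialCasing.txt. The names locale_aware_upper()/locale_aware_lower() and the explicit lang parameter are hypothetical, a snake_case take on the localeAwareUpper()/localeAwareLower() suggested above; they are not an existing locale module API. Only the unconditional Turkic mappings are handled; the context-sensitive rules involving U+0307 COMBINING DOT ABOVE are ignored.

```python
# Hypothetical helpers (not part of the standard library or the locale module).
# They sketch the idea from this thread: leave str.upper()/str.lower()
# locale-independent and add explicit Turkic-aware variants.
# Only the unconditional tr/az entries of SpecialCasing.txt are applied;
# the contextual rules involving U+0307 COMBINING DOT ABOVE are ignored.

TURKIC_LANGS = {"tr", "az"}

# Upper-casing: dotted small i must become dotted capital I (U+0130).
_TURKIC_UPPER = str.maketrans({"i": "\u0130"})

# Lower-casing: dotless capital I becomes dotless small i (U+0131), and
# dotted capital I (U+0130) becomes plain "i". Without this pre-mapping the
# default str.lower() would turn U+0130 into "i" followed by U+0307.
_TURKIC_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def locale_aware_upper(text: str, lang: str) -> str:
    """Upper-case text, applying Turkic special casing when lang is tr/az."""
    if lang in TURKIC_LANGS:
        # Pre-map the letter whose Turkic mapping differs from the default;
        # str.upper() leaves U+0130 unchanged and handles everything else
        # (including dotless U+0131 -> "I").
        text = text.translate(_TURKIC_UPPER)
    return text.upper()


def locale_aware_lower(text: str, lang: str) -> str:
    """Lower-case text, applying Turkic special casing when lang is tr/az."""
    if lang in TURKIC_LANGS:
        text = text.translate(_TURKIC_LOWER)
    return text.lower()


if __name__ == "__main__":
    # Built-in method: locale-independent, safe for keyword comparison.
    assert "quit".upper() == "QUIT"
    # Turkic rules: "quit" becomes "QU\u0130T", which no longer matches "QUIT".
    assert locale_aware_upper("quit", "tr") == "QU\u0130T"
    assert locale_aware_lower("ISPARTA", "tr") == "\u0131sparta"
```

Passing the language explicitly, rather than reading the process locale, keeps the result free of global state, in line with keeping locale-dependent case mapping out of the core string methods.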

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list