New submission from Peter Landgren <peter.tal...@telia.com>:

If any of the Swedish characters "åäöÅÄÖ" are passed to unicodedata.normalize(form, ustr) with form = "NFD" or "NFKD", the result is "aaoAAO".

"åäöÅÄÖ" are ordinary characters and should be unchanged by normalization. Their connection to "aaoAAO" is purely historical and does not hold in the modern languages. It is a common misinterpretation that the dots and the ring above them are diacritical marks; these letters should behave like the (Danish) "Ø", which is normalized correctly.
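
To check what normalize actually returns, the code points of the result can be listed by name. The following is only a minimal sketch, assuming Python 3 (the report is against Python 2.5, where the input would be a unicode object); the helper name describe_nfd is hypothetical and not part of the library. A decomposed string may merely look like "aaoAAO" if the terminal or font does not render combining marks.

```python
import unicodedata


def describe_nfd(text):
    """Return the Unicode names of the code points in the NFD form of text.

    Hypothetical helper for illustration only.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return [unicodedata.name(ch) for ch in decomposed]


swedish = "\u00e5\u00e4\u00f6"  # the letters å, ä, ö as precomposed code points
nfd = unicodedata.normalize("NFD", swedish)

# Compare the number of code points before and after decomposition.
print(len(swedish), len(nfd))

# List each code point of the decomposed string by name.
for name in describe_nfd(swedish):
    print(name)

# Recompose with NFC and compare against the original string.
print(unicodedata.normalize("NFC", nfd) == swedish)

# Same inspection for the Danish letter Ø mentioned above.
print(describe_nfd("\u00d8"))
```

Printing the names rather than the string itself avoids any dependence on how the console displays combining characters.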
From Wikipedia: Å is often perceived as an A with a ring, interpreting the ring as a diacritical mark. However, in the languages that use it, the ring is not considered a diacritic but part of the letter. The letter Ö in the Swedish and Icelandic alphabets historically arises from the Germanic umlaut, but it is considered a separate letter from O.

See http://en.wikipedia.org/wiki/%C3%85

I think this is probably impossible to solve, because these letters would be mixed up with the umlaut and you don't know which language a specific word belongs to.

----------
components: Library (Lib)
messages: 81536
nosy: PeterL
severity: normal
status: open
title: unicode.normalize gives wrong result for some characters
type: behavior
versions: Python 2.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5200>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com