Bugs item #1519069, was opened at 2006-07-07 22:04
Message generated for change (Comment added) made by pez4brian
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1519069&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: Windows
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Brian Matherly (pez4brian)
Assigned to: Nobody/Anonymous (nobody)
Summary: incorrect locale.strcoll() return in Windows

Initial Comment:
Python 2.4.2 in Windows (English locale):

>>> import locale
>>> locale.setlocale(locale.LC_ALL,'C')
'C'
>>> locale.setlocale(locale.LC_ALL,'')
'English_United States.1252'
>>> locale.strcoll("M","m")
1
>>> locale.strcoll("Ma","mz")
-1

It appears that when a string has one character, "M" is greater than "m", but when it has more than one string, "M" is equal to "m"

----------------------------------------------------------------------

>Comment By: Brian Matherly (pez4brian)
Date: 2006-07-17 21:52

Message:
Logged In: YES
user_id=726294

I think you are right - it's probably a Windows issue - if it is an issue at all. I don't claim to be a lingual expert. But I would prefer a case sensitive comparison. So I wrote a function. It looks like this:

import locale

def strcoll_case_sensitive(string1, string2):
    """
    This function was written because string comparisons in Windows
    seem to be case insensitive if the string is longer than one
    character.
    """
    # First, compare the first character (sliced, so that an empty
    # string does not raise IndexError)
    diff = locale.strcoll(string1[:1], string2[:1])
    if diff == 0:
        # If the first character is the same, compare the rest
        diff = locale.strcoll(string1, string2)
    return diff

Thanks for your help. Feel free to close this bug.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-07-15 02:37

Message:
Logged In: YES
user_id=21627

You should ask these questions in some Win32 programmer newsgroup. I don't know whether this sorting is correct or not, I'm not a native English speaker.

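For illustration, the strcoll_case_sensitive workaround from the 2006-07-17 comment could be used to sort a list roughly as follows. This is only a sketch in Python 3 (the thread itself concerns Python 2.4): the helper name sort_case_sensitive is made up here, the workaround is restated so the block is self-contained, and the "C" locale is chosen only so the demo runs the same way everywhere. Note that the workaround looks at case in the first character only.

```python
import functools
import locale


def strcoll_case_sensitive(string1, string2):
    """Restated from the thread: compare first characters, then whole strings."""
    diff = locale.strcoll(string1[:1], string2[:1])
    if diff == 0:
        # Same first character: fall back to the locale's full comparison.
        diff = locale.strcoll(string1, string2)
    return diff


def sort_case_sensitive(words):
    """Hypothetical helper: sort using the two-argument comparison above."""
    return sorted(words, key=functools.cmp_to_key(strcoll_case_sensitive))


# Assumption for the demo only: use the portable "C" locale. On the
# reporter's system this would be setlocale(locale.LC_ALL, '') instead.
locale.setlocale(locale.LC_ALL, "C")

result = sort_case_sensitive(["mb", "Ma", "md", "Mc"])
print(result)
```

With this comparison, strings are grouped by the case of their first character before the rest of the string is considered, so the alternating capitalization described in the thread should not appear. Which group comes first depends on the active locale.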
----------------------------------------------------------------------

Comment By: Brian Matherly (pez4brian)
Date: 2006-07-14 22:14

Message:
Logged In: YES
user_id=726294

Thanks for your response.

That is simply unacceptable. Who at Microsoft needs to be flogged?

More likely, this shows my lack of understanding of strings and locale in general. Your explanation does explain the results I get, but wouldn't you admit that the results *seem* wrong?

By the definition given, the strings "Ma", "mb", "Mc", "md" would actually sort in that order! So the list of sorted strings would have alternating capitalization! However, the list of strings "M", "m", "M", "m" would sort as "M", "M", "m", "m" - no alternating capitalization - as I would expect.

Would there happen to be some way to sort the strings using the locale, but also using the case earlier in the computation order? Basically, I want the sort to be case sensitive.

Thanks again for your response. If you have any suggestions that might help me achieve what I want, it would be greatly appreciated.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-07-14 10:55

Message:
Logged In: YES
user_id=21627

Why do you think this is a bug? We pass the string as-is to the C library, which passes it nearly as-is to CompareStringW. This function then decides how they collate; in Microsoft's definition of the English_United States locale, these strings do have the order you get.

In case you wonder how the order is computed: essentially, the strings are compared case insensitive, without diacritics. If they then compare equal, the diacritics are considered. If this still compares equal, Case weights are considered. If this still compares equal, Special weights are considered.

(Note: I obtained this indirectly by looking at the LCMapString documentation, assuming that CompareString uses LCMapString with LCMAP_SORTKEY|SORT_STRINGSORT).

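The comparison order described in the message above can be pictured with a small sketch. This is a toy model in Python 3, not what CompareStringW actually does: the names toy_sort_key and toy_strcoll are made up for illustration, the weights are crude simplifications, Special weights are left out, and the assumption that uppercase carries the heavier case weight is taken only from the strcoll("M","m") result of 1 in the initial report.

```python
import unicodedata


def toy_sort_key(s):
    """Build a three-level key: base letters, then diacritics, then case."""
    decomposed = unicodedata.normalize("NFD", s)
    # Level 1: letters only, case folded and combining marks removed.
    level1 = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    ).lower()
    # Level 2: case still ignored, but combining marks are kept.
    level2 = decomposed.lower()
    # Level 3: case weights (assumption: uppercase sorts after lowercase).
    level3 = tuple(c.isupper() for c in decomposed)
    return (level1, level2, level3)


def toy_strcoll(a, b):
    """Return -1, 0 or 1, comparing the levels in order."""
    ka, kb = toy_sort_key(a), toy_sort_key(b)
    return (ka > kb) - (ka < kb)


print(toy_strcoll("M", "m"))    # decided only at the case level
print(toy_strcoll("Ma", "mz"))  # decided at the first level
print(sorted(["md", "Mc", "mb", "Ma"], key=toy_sort_key))
```

Under this model "M" and "m" tie on the first two levels and are separated only by case, while "Ma" and "mz" already differ at the first level, so case never comes into play. That matches both results in the initial report and the "Ma", "mb", "Mc", "md" ordering raised in the follow-up.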
----------------------------------------------------------------------

Comment By: Brian Matherly (pez4brian)
Date: 2006-07-07 22:35

Message:
Logged In: YES
user_id=726294

I see the same problem in python 2.4.3

----------------------------------------------------------------------

Comment By: Brian Matherly (pez4brian)
Date: 2006-07-07 22:08

Message:
Logged In: YES
user_id=726294

Correction:
It appears that when a string has one character, "M" is greater than "m", but when it has more than one character, "M" is equal to "m"

----------------------------------------------------------------------
_______________________________________________
Python-bugs-list mailing list