[ python-Bugs-1519069 ] incorrect locale.strcoll() return in Windows

SourceForge.net Mon, 17 Jul 2006 19:52:50 -0700

Bugs item #1519069, was opened at 2006-07-07 22:04
Message generated for change (Comment added) made by pez4brian
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1519069&group_id=5470


Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Windows
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Brian Matherly (pez4brian)
Assigned to: Nobody/Anonymous (nobody)
Summary: incorrect locale.strcoll() return in Windows

Initial Comment:
Python 2.4.2 in Windows (English locale):

>>> import locale
>>> locale.setlocale(locale.LC_ALL,'C')
'C'
>>> locale.setlocale(locale.LC_ALL,'')
'English_United States.1252'
>>> locale.strcoll("M","m")
1
>>> locale.strcoll("Ma","mz")
-1

It appears that when a string has one character, "M" is
greater than "m", but when it has more than one string,
"M" is equal to "m"

----------------------------------------------------------------------

>Comment By: Brian Matherly (pez4brian)
Date: 2006-07-17 21:52

Message:
Logged In: YES 
user_id=726294

I think you are right - it's probably a Windows issue - if
it is an issue at all. I don't claim to be a lingual expert.
But I would prefer a case sensitive comparison. So I wrote a
function. It looks like this:

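import locale

def strcoll_case_sensitive(string1, string2):
    """Compare two strings with locale.strcoll(), checking the first
    character on its own before the whole string.

    Written because string comparisons in Windows seem to be case
    insensitive if the string is longer than one character.
    """
    # First, compare the first characters; slicing (rather than
    # indexing) keeps empty strings from raising IndexError.
    diff = locale.strcoll(string1[:1], string2[:1])
    if diff == 0:
        # If the first characters collate equal, compare the whole strings.
        diff = locale.strcoll(string1, string2)
    return diff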
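
For anyone who wants to sort a list with a comparison function like the
one above, here is a minimal sketch. It assumes a Python version that has
functools.cmp_to_key (not available in 2.4, where list.sort() accepts a
comparison function directly). The helper name sort_case_sensitive is made
up for illustration, and the comparison function is restated so the
snippet runs on its own. The resulting order depends on the active locale.

```python
import functools
import locale


def strcoll_case_sensitive(string1, string2):
    """Compare first characters alone, then fall back to the whole strings."""
    # Slices instead of indexes so that empty strings do not raise.
    diff = locale.strcoll(string1[:1], string2[:1])
    if diff == 0:
        diff = locale.strcoll(string1, string2)
    return diff


def sort_case_sensitive(strings):
    """Hypothetical helper: sort using the two-step comparison above."""
    return sorted(strings, key=functools.cmp_to_key(strcoll_case_sensitive))
```

Note that only the first character gets the case-sensitive treatment;
later characters are still ordered however locale.strcoll() orders them.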

Thanks for your help. Feel free to close this bug.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-07-15 02:37

Message:
Logged In: YES 
user_id=21627

You should ask these questions in some Win32 programmer
newsgroup. I don't know whether this sorting is correct or
not, I'm not a native English speaker.

----------------------------------------------------------------------

Comment By: Brian Matherly (pez4brian)
Date: 2006-07-14 22:14

Message:
Logged In: YES 
user_id=726294

Thanks for your response. That is simply unacceptable. Who
at Microsoft needs to be flogged? More likely, this shows my
lack of understanding of strings and locale in general.

Your explanation does explain the results I get, but
wouldn't you admit that the results *seem* wrong?

By the definition given, the strings "Ma", "mb", "Mc", "md"
would actually sort in that order! So the list of sorted
strings would have alternating capitalization!

However, the list of strings "M", "m", "M", "m" would sort
as "M", "M", "m", "m" - no alternating capitalization - as I
would expect.

Would there happen to be some way to sort the strings using
the locale, but also using the case earlier in the
computation order? Basically, I want the sort to be case
sensitive.

Thanks again for your response. If you have any suggestions
that might help me achieve what I want, it would be greatly
appreciated.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-07-14 10:55

Message:
Logged In: YES 
user_id=21627

Why do you think this is a bug? We pass the string as-is to
the C library, which passes it nearly as-is to
CompareStringW. This function then decides how they collate;
in Microsoft's definition of the English_United States
locale, these strings do have the order you get.

In case you wonder how the order is computed: essentially,
the strings are first compared case-insensitively and
without diacritics. If they compare equal, the diacritics
are considered. If they still compare equal, Case weights
are considered. If they still compare equal, Special weights
are considered.

(Note: I obtained this indirectly by looking at the
LCMapString documentation, assuming that CompareString uses
LCMapString with LCMAP_SORTKEY|SORT_STRINGSORT).
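
The multi-level ordering described above can be imitated in pure Python.
This is only a rough sketch, not what CompareStringW actually does: the
names multilevel_key and multilevel_compare are invented for illustration,
Special weights are left out, and placing uppercase after lowercase at the
case level is an assumption chosen to match the output in the original
report. It is written for Python 3, where all strings are Unicode.

```python
import unicodedata


def multilevel_key(s):
    """Build a (base letters, diacritics, case) key for string s."""
    base, marks, case = [], [], []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch) and base:
            # Attach a combining mark to the preceding base character.
            marks[-1] += ch
        else:
            base.append(ch.lower())                # level 1: letter, case folded
            marks.append("")                       # level 2: diacritics, filled in above
            case.append(1 if ch.isupper() else 0)  # level 3: case (assumed upper last)
    # Tuples compare level by level, so case only matters when the
    # letters and diacritics of the whole strings are identical.
    return (base, marks, case)


def multilevel_compare(a, b):
    """Return -1, 0 or 1, like a strcoll-style comparison."""
    ka, kb = multilevel_key(a), multilevel_key(b)
    return (ka > kb) - (ka < kb)
```

With this key, "M" and "m" differ only at the case level, while "Ma" and
"mz" are already decided at the first level by "a" versus "z", so case
never gets looked at.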

----------------------------------------------------------------------

Comment By: Brian Matherly (pez4brian)
Date: 2006-07-07 22:35

Message:
Logged In: YES 
user_id=726294

I see the same problem in python 2.4.3

----------------------------------------------------------------------

Comment By: Brian Matherly (pez4brian)
Date: 2006-07-07 22:08

Message:
Logged In: YES 
user_id=726294

Correction:

It appears that when a string has one character, "M" is
greater than "m", but when it has more than one character,
"M" is equal to "m"

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com