On 4/19/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> On 4/19/07, Jason Orendorff <[EMAIL PROTECTED]> wrote:
> > Collation can be done right: provide a function text.sort_key()
> > that converts a str into an opaque thing that has the desired
> > ordering.
>
> If this function is context-free (depending on only the input string),
> it will violate the unicode standard (and, apparently, do the wrong
> thing for some languages -- usually including French).

I meant this to be a function of the string and the locale,[*]
implemented as a thin wrapper around wcsxfrm() on POSIX, LCMapString()
on Win32, Collator.getCollationKey() in Java, CompareInfo.GetSortKey()
in .NET.  Whether these are Unicode-compliant is out of our hands.
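
As a rough illustration of the "thin wrapper" idea on a POSIX-style system, here is a minimal sketch using Python's existing locale.strxfrm(), which depends on the process-wide LC_COLLATE setting. The name sort_key is hypothetical (text.sort_key() does not exist), and the "C" locale is chosen only because it is always available; a real locale name would be platform-dependent.

```python
import locale

def sort_key(s):
    """Hypothetical wrapper: return an opaque key whose ordering
    follows the current LC_COLLATE locale (via the C library)."""
    return locale.strxfrm(s)

# Assumption: the "C" locale, which every platform provides.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["pear", "apple", "fig"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

The keys are only meaningful for comparison against other keys produced under the same locale; they are not meant to be displayed or stored across locale changes.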

We're not Sun or IBM.  I don't think we're going to implement and
maintain this ourselves.  So I see two options: (1) swallow a hard
dependency on a particular implementation, maybe ICU; (2) use whatever
the system happens to provide.  Either one is fine with me.

> I'm not saying relying strictly on unicode properties *can't* be done
> right -- I'm just saying that it would be very difficult and very
> inefficient, so it probably won't happen soon -- which is an argument
> for keeping the half-measures around meanwhile.

This would be true if the only option were to implement it all
ourselves.

-j

[*] I would prefer a context-free function that really takes both the
string and the locale as arguments... but the POSIX API doesn't
support that.  :-P
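
One way to approximate a function of both string and locale on top of the global-state POSIX API is to switch LC_COLLATE temporarily and restore it afterwards. This is only a sketch under assumptions: collation_locale and the two-argument sort_key are hypothetical names, and because setlocale() changes process-wide state, this is not safe when other threads are running.

```python
import locale
from contextlib import contextmanager

@contextmanager
def collation_locale(name):
    """Temporarily set LC_COLLATE to `name`, restoring the old value on exit."""
    saved = locale.setlocale(locale.LC_COLLATE)  # no second argument: query only
    locale.setlocale(locale.LC_COLLATE, name)
    try:
        yield
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)

def sort_key(s, locale_name):
    """Hypothetical two-argument form: key for `s` under `locale_name`."""
    with collation_locale(locale_name):
        return locale.strxfrm(s)

# Assumption: "C" is used because it exists on every platform.
key_a = sort_key("a", "C")
key_b = sort_key("b", "C")
```

Switching locales once per key is slow; sorting a whole list inside a single `with collation_locale(...)` block would be the more practical usage.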