Marc-Andre Lemburg <m...@egenix.com> added the comment:

On 2008-12-20 17:19, Alex Stapleton wrote:
> Alex Stapleton <al...@prol.etari.at> added the comment:
>
> I am trying to get a PEP together for this. Does anyone have any thoughts
> on how to handle comparison between unicode strings in a locale aware
> situation?

Some thoughts:

 * the Unicode implementation *must* stay locale independent

 * we should implement the Unicode collation algorithm (TR#10,
   http://unicode.org/reports/tr10/)

 * which collation to use should be a parameter of a function or
   object initializer, and it should be possible to use multiple
   collations in the same application (without switching the locale)

 * the terms "locale" and "collation" should not be mixed; a (default)
   collation is a property of a locale, and there can also be more than
   one collation per locale

The Unicode collation algorithm defines collation in terms of a key
function for each collation, so that already fits nicely with the key
function parameter of list.sort().

> Should __lt__ and __gt__ be specified as ignoring locale? In which case do
> we need to add a new method for doing locale aware comparisons?

Unicode strings should not get any locale or collation specific methods.
Instead, this feature should be implemented elsewhere and the strings in
question passed to this new function or object.

> Should locale be a property of the string, an argument passed to
> upper/lower/isupper/islower/swapcase/capitalize/sort or global state
> (locale module...)?

No. See above.

> Should doing a locale aware comparison of two strings with different
> locales throw an exception?

No, assigning locales to strings is not going to work and we should
not go down that road. It's better to have locale aware functions for
certain operations, so that you can pass your Unicode strings to these
functions instead of binding additional context information to the
Unicode strings themselves.

> Should locales be represented as objects or just a string like "en_GB"?

I think the easiest way to get the collation algorithm implemented is
by using a scheme similar to the one used for codecs: you pass a
collation name to a central function and get back a collation object
that implements the collation in the form of a key method and a compare
method.
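
To make the codec-style scheme a bit more concrete, here is a minimal
sketch of what such a registry could look like. Everything in it is an
assumption for illustration only: the names Collation, register() and
lookup(), as well as the collation names "codepoint" and
"toy-accent-insensitive", are hypothetical, and the key functions are
toy stand-ins, not an implementation of TR#10.

```python
import unicodedata


class Collation:
    """Hypothetical collation object offering a key and a compare method."""

    def __init__(self, name, key_func):
        self.name = name
        self._key_func = key_func

    def key(self, s):
        # Return a sort key; suitable as the key argument of list.sort().
        return self._key_func(s)

    def compare(self, a, b):
        # Three-way comparison derived from the keys: -1, 0 or 1.
        ka, kb = self.key(a), self.key(b)
        return (ka > kb) - (ka < kb)


# Hypothetical central registry, loosely modelled on codec lookup by name.
_registry = {}


def register(name, key_func):
    _registry[name] = Collation(name, key_func)


def lookup(name):
    try:
        return _registry[name]
    except KeyError:
        raise LookupError("unknown collation: %s" % name)


def _accent_insensitive_key(s):
    # Toy key, NOT the Unicode collation algorithm: drop combining marks,
    # casefold, and use the original string as a tie-breaker.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), s)


register("codepoint", lambda s: s)
register("toy-accent-insensitive", _accent_insensitive_key)

# Two collations used side by side, with no locale switching and no
# extra state attached to the strings themselves.
words = ["zebra", "\u00c9dith", "apple", "eagle"]
by_codepoint = sorted(words, key=lookup("codepoint").key)
by_toy = sorted(words, key=lookup("toy-accent-insensitive").key)
```

The strings stay plain Unicode objects throughout; the choice of
collation lives entirely in the object returned by lookup(), so the
same list can be ordered in several ways within one application.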

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4610>
_______________________________________