Jean-Paul Calderone wrote:
> On Fri, 24 Mar 2006 09:33:19 +1100, John Machin <[EMAIL PROTECTED]> wrote:
> >On 24/03/2006 8:36 AM, Peter Otten wrote:
> >> John Machin wrote:
> >>
> >>>You can replace ALL of this upshifting and accent removal in one blow by
> >>>using the string translate() method with a suitable table.
> >>
> >> Only if you convert to unicode first or if your data maintains 1 byte == 1
> >> character, in particular it is not UTF-8.
> >>
> >
> >I'm sorry, I forgot that there were people who are unaware that
> >variable-length gizmos like UTF-8 and various legacy CJK encodings are
> >for storage & transmission, and are better changed to a
> >one-character-per-storage-unit representation before *ANY* data
> >processing is attempted.
>
> Unfortunately, unicode only appears to solve this problem in a sane manner.
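The quoted advice (decode to Unicode first, then do upshifting and accent removal with a single translate() table) can be sketched in modern Python 3, where `str` is already Unicode. This is an illustration under assumptions, not code from the thread: `build_fold_table` and `fold` are hypothetical helper names, and the table only covers the alphabetic characters of Latin-1.

```python
import unicodedata


def build_fold_table(chars: str) -> dict:
    """Map each character to its upper-case form with accents removed.

    NFD splits a precomposed letter such as 'é' into 'e' plus a combining
    acute accent; keeping only the first code point drops the accent.
    """
    table = {}
    for ch in chars:
        base = unicodedata.normalize("NFD", ch)[0]
        table[ord(ch)] = base.upper()
    return table


# Assumption: only the alphabetic characters of Latin-1 need folding.
LATIN1_ALPHA = "".join(chr(cp) for cp in range(256) if chr(cp).isalpha())
FOLD_TABLE = build_fold_table(LATIN1_ALPHA)


def fold(data: bytes, encoding: str = "utf-8") -> str:
    # Decode first: in UTF-8 an accented letter such as 'é' occupies two
    # bytes, so a byte-for-byte table could never map it as one character.
    return data.decode(encoding).translate(FOLD_TABLE)


raw = "Crème Brûlée".encode("utf-8")
print(len(raw), len(raw.decode("utf-8")))  # 15 12
print(fold(raw))                           # CREME BRULEE
```

The byte and character counts differ because each of the three accented letters takes two bytes in UTF-8, which is exactly why the table has to be applied after decoding.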

What problem do you mean? Loose matching is solved by Unicode in a sane manner: it is described in the Unicode Collation Algorithm.

Serge.

--
http://mail.python.org/mailman/listinfo/python-list
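The Unicode Collation Algorithm itself is not in Python's standard library; a full implementation is provided by ICU's collator (usable from Python through the third-party PyICU package). As a rough stand-in for a primary-strength comparison (ignore accents and case), the sketch below uses only `unicodedata`. It is NOT the UCA: there are no collation weights, contractions or locale tailoring, and `loose_key` and `loose_equal` are hypothetical names.

```python
import unicodedata


def loose_key(text: str) -> str:
    """Approximate a primary-strength key: drop accents, then fold case.

    Not the Unicode Collation Algorithm; letters without a decomposition
    (e.g. 'Ø') are left alone rather than treated as accented variants.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def loose_equal(a: str, b: str) -> bool:
    return loose_key(a) == loose_key(b)


print(loose_equal("Crème Brûlée", "CREME BRULEE"))  # True
print(loose_equal("Straße", "STRASSE"))            # True (casefold: ß -> ss)
print(loose_equal("Ørsted", "Orsted"))             # False: Ø has no decomposition
```

The last case shows where this shortcut falls short of a real collator, which can be tailored to treat such letters as equal at primary strength.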