Clodoaldo wrote:

> When using unicode the case change works:
>
> >>> print u'É'.lower()
> é
>
> But when using the pt_BR.utf-8 locale it doesn't:
>
> >>> locale.setlocale(locale.LC_ALL, 'pt_BR.utf-8')
> 'pt_BR.utf-8'
> >>> locale.getlocale()
> ('pt_BR', 'utf')
> >>> print 'É'.lower()
> É
>
> What am I missing? I'm in Fedora Core 5 and Python 2.4.3.
>
> # cat /etc/sysconfig/i18n
> LANG="en_US.UTF-8"
> SYSFONT="latarcyrheb-sun16"
>
> Regards, Clodoaldo Pinto Neto

str.lower() operates on bytes and therefore doesn't handle encodings with multibyte characters (like utf-8) properly:

>>> u"É".encode("utf8")
'\xc3\x89'
>>> u"É".encode("latin1")
'\xc9'
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "de_DE.utf8")
'de_DE.utf8'
>>> print unicode("\xc3\x89".lower(), "utf8")
É
>>> locale.setlocale(locale.LC_ALL, "de_DE.latin1")
'de_DE.latin1'
>>> print unicode("\xc9".lower(), "latin1")
é

I recommend that you forget about byte strings and use unicode throughout.

Peter
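
A minimal sketch of what "use unicode throughout" can look like in practice: decode the bytes on the way in, change case on the text, and encode again on the way out. It is written so that it also runs on Python 3, where the bytes type plays the role of the Python 2 str discussed above. The helper name lower_encoded is hypothetical, made up for this illustration, and the sketch does not call locale.setlocale() at all.

```python
# -*- coding: utf-8 -*-
# Sketch only: lower_encoded is a hypothetical helper, not part of any library.

def lower_encoded(raw, encoding="utf-8"):
    """Lowercase an encoded byte string by going through unicode text."""
    text = raw.decode(encoding)            # bytes -> unicode text
    return text.lower().encode(encoding)   # lowercase the text, then re-encode

utf8_bytes = u"\u00c9".encode("utf-8")      # capital E acute, two bytes: C3 89
latin1_bytes = u"\u00c9".encode("latin-1")  # capital E acute, one byte: C9

# Lowercasing the raw UTF-8 bytes leaves the two-byte sequence untouched...
assert utf8_bytes.lower() == utf8_bytes

# ...while the round trip through unicode gives small e acute in either encoding.
assert lower_encoded(utf8_bytes) == u"\u00e9".encode("utf-8")
assert lower_encoded(latin1_bytes, "latin-1") == u"\u00e9".encode("latin-1")
```

The encoding is passed explicitly rather than taken from the locale, so the result does not depend on which locale happens to be active.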