On May 24, 6:40 am, Peter Otten <[EMAIL PROTECTED]> wrote:
> Clodoaldo wrote:
> > When using unicode the case change works:
> >
> > >>> print u'É'.lower()
> > é
> >
> > But when using the pt_BR.utf-8 locale it doesn't:
> >
> > >>> locale.setlocale(locale.LC_ALL, 'pt_BR.utf-8')
> > 'pt_BR.utf-8'
> > >>> locale.getlocale()
> > ('pt_BR', 'utf')
> > >>> print 'É'.lower()
> > É
> >
> > What am I missing? I'm in Fedora Core 5 and Python 2.4.3.
> >
> > # cat /etc/sysconfig/i18n
> > LANG="en_US.UTF-8"
> > SYSFONT="latarcyrheb-sun16"
> >
> > Regards, Clodoaldo Pinto Neto
>
> str.lower() operates on bytes and therefore doesn't handle encodings with
> multibyte characters (like utf-8) properly:
>
> >>> u"É".encode("utf8")
> '\xc3\x89'
> >>> u"É".encode("latin1")
> '\xc9'
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, "de_DE.utf8")
> 'de_DE.utf8'
> >>> print unicode("\xc3\x89".lower(), "utf8")
> É
> >>> locale.setlocale(locale.LC_ALL, "de_DE.latin1")
> 'de_DE.latin1'
> >>> print unicode("\xc9".lower(), "latin1")
> é
>
> I recommend that you forget about byte strings and use unicode throughout.
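
A minimal sketch of that advice, assuming Python 3 (where a byte string is `bytes` and its `lower()` only changes ASCII letters, regardless of locale). `lower_utf8` is a hypothetical helper name for illustration, not something from the thread:

```python
# Sketch only: assumes Python 3; lower_utf8 is a hypothetical helper.

text = "\u00c9"                    # 'É' as a Unicode string
utf8_bytes = text.encode("utf-8")  # b'\xc3\x89', two bytes for one character

# Lowercasing the bytes leaves them unchanged: neither byte is an ASCII letter.
lowered_bytes = utf8_bytes.lower()

# Lowercasing the Unicode string works on characters, so É becomes é.
lowered_text = text.lower()


def lower_utf8(data):
    """Decode UTF-8 bytes, lowercase as text, then encode back to UTF-8."""
    return data.decode("utf-8").lower().encode("utf-8")


print(lowered_bytes == utf8_bytes)
print(lowered_text == "\u00e9")
print(lower_utf8(utf8_bytes))
```

Decoding once at the input boundary and encoding only on output keeps the case change independent of whatever locale is set.
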
Now I understand it. Thanks.

Regards, Clodoaldo Pinto Neto
--
http://mail.python.org/mailman/listinfo/python-list