Clodoaldo wrote:

> When using unicode the case change works:
> 
> >>> print u'É'.lower()
> é
> 
> But when using the pt_BR.utf-8 locale it doesn't:
> 
> >>> locale.setlocale(locale.LC_ALL, 'pt_BR.utf-8')
> 'pt_BR.utf-8'
> >>> locale.getlocale()
> ('pt_BR', 'utf')
> >>> print 'É'.lower()
> É
> 
> What am I missing? I'm in Fedora Core 5 and Python 2.4.3.
> 
> # cat /etc/sysconfig/i18n
> LANG="en_US.UTF-8"
> SYSFONT="latarcyrheb-sun16"
> 
> Regards, Clodoaldo Pinto Neto

str.lower() works on individual bytes, so it cannot handle encodings such
as utf-8 in which a single character may span several bytes:

>>> u"É".encode("utf8")
'\xc3\x89'
>>> u"É".encode("latin1")
'\xc9'
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "de_DE.utf8")
'de_DE.utf8'
>>> print unicode("\xc3\x89".lower(), "utf8")
É
>>> locale.setlocale(locale.LC_ALL, "de_DE.latin1")
'de_DE.latin1'
>>> print unicode("\xc9".lower(), "latin1")
é

I recommend that you forget about byte strings and use unicode throughout.
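
A minimal sketch of what that could look like, assuming your input arrives
as utf-8 encoded bytes: decode once at the boundary, change case on the
unicode object, and encode again only when writing out. The helper name
lower_encoded is made up for this example, and the u"" literals assume
Python 2 or Python 3.3 and later.

```python
def lower_encoded(data, encoding="utf-8"):
    """Lower-case an encoded byte string.

    Hypothetical helper for illustration, not a standard library function.
    """
    text = data.decode(encoding)          # bytes -> unicode
    # The case change now sees whole characters, not single bytes.
    return text.lower().encode(encoding)  # unicode -> bytes


raw = u"\u00c9".encode("utf-8")  # the two utf-8 bytes for É
lowered = lower_encoded(raw)     # the two utf-8 bytes for é
```

The same helper should work for latin1 data if you pass encoding="latin1",
and no locale setting is involved either way.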

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list
