Python 3.3, gettext and Unicode problems

Marcel Rodrigues Sun, 30 Dec 2012 16:42:43 -0800

I'm using Python 3.3 (CPython) and am having trouble getting the standard
gettext module to handle Unicode messages.
My problem can be isolated as follows:


I have 3 files in a folder: greeting.py, greeting.po and msgfmt.py.

-- greeting.py --
import gettext

t = gettext.translation("greeting", "locale", ["pt"])
_ = t.lgettext

print("_charset = {0}\n".format(t._charset))
print(_("hello"))
-- EOF --

-- greeting.po --
msgid ""
msgstr ""
"Project-Id-Version: 1.0\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "hello"
msgstr "olá"
-- EOF --

msgfmt.py was downloaded from
http://hg.python.org/cpython/file/9e6ead98762e/Tools/i18n/msgfmt.py, since
this tool apparently isn't included in the python3 package available on
Arch Linux official repositories.

It's probably also worth noting that the file greeting.po is encoded itself
as UTF-8.

>From that folder, I run the following commands:

$ mkdir -p locale/pt/LC_MESSAGES
$ python msgfmt.py -o !$/greeting.mo greeting.po
$ python greeting.py

The output is:
_charset = UTF-8

Traceback (most recent call last):
  File "greeting.py", line 7, in <module>
    print(_("hello"))
  File "/usr/lib/python3.3/gettext.py", line 314, in lgettext
    return tmsg.encode(locale.getpreferredencoding())
UnicodeEncodeError: 'ascii' codec can't encode character '\xe1' in position
2: ordinal not in range(128)

My interpretation of this output is that even though gettext correctly
detects the MO file charset as UTF-8, it tries to encode the translated
message with the system's "preferred encoding", which happens to be ASCII.

Anyone know why this happens? Is this a bug on my code? Maybe I have
misunderstood gettext...

Thanks,

  Marcel

-- 
http://mail.python.org/mailman/listinfo/python-list

Python 3.3, gettext and Unicode problems

Reply via email to