I'm using Python 3.3 (CPython) and am having trouble getting the standard gettext module to handle Unicode messages. My problem can be isolated as follows:
I have 3 files in a folder: greeting.py, greeting.po and msgfmt.py.

-- greeting.py --
import gettext

t = gettext.translation("greeting", "locale", ["pt"])
_ = t.lgettext

print("_charset = {0}\n".format(t._charset))
print(_("hello"))
-- EOF --

-- greeting.po --
msgid ""
msgstr ""
"Project-Id-Version: 1.0\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "hello"
msgstr "olá"
-- EOF --

msgfmt.py was downloaded from http://hg.python.org/cpython/file/9e6ead98762e/Tools/i18n/msgfmt.py, since this tool apparently isn't included in the python3 package available in the official Arch Linux repositories. It's probably also worth noting that the file greeting.po is itself encoded as UTF-8.

From that folder, I run the following commands:

$ mkdir -p locale/pt/LC_MESSAGES
$ python msgfmt.py -o !$/greeting.mo greeting.po
$ python greeting.py

The output is:

_charset = UTF-8

Traceback (most recent call last):
  File "greeting.py", line 7, in <module>
    print(_("hello"))
  File "/usr/lib/python3.3/gettext.py", line 314, in lgettext
    return tmsg.encode(locale.getpreferredencoding())
UnicodeEncodeError: 'ascii' codec can't encode character '\xe1' in position 2: ordinal not in range(128)

My interpretation of this output is that even though gettext correctly detects the MO file charset as UTF-8, it tries to encode the translated message with the system's "preferred encoding", which happens to be ASCII.

Does anyone know why this happens? Is this a bug in my code? Maybe I have misunderstood gettext...

Thanks,
Marcel
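
P.S. To check my interpretation, the failing step in the last traceback frame can apparently be reproduced without gettext at all. The snippet below is only a minimal sketch: the helper name encode_like_lgettext is made up for illustration, and I am assuming (based on the 'ascii' codec in the error message) that locale.getpreferredencoding() resolves to ASCII on my system.

```python
# Stand-in for the translated message read from the MO file ("ol\u00e1" is "olá").
tmsg = "ol\u00e1"


def encode_like_lgettext(message, encoding):
    """Hypothetical helper mimicking the traceback line: tmsg.encode(<encoding>)."""
    return message.encode(encoding)


# Encoding with the catalog's own charset works.
utf8_bytes = encode_like_lgettext(tmsg, "utf-8")

# Encoding with ASCII (my assumed preferred encoding) fails on the accented character.
try:
    encode_like_lgettext(tmsg, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(utf8_bytes, ascii_failed)
```

If this matches what lgettext does internally, the charset declared in the PO header plays no part in the final encode step, which is consistent with the traceback above.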