Re: Python 3.3, gettext and Unicode problems

Terry Reedy Sun, 30 Dec 2012 17:52:01 -0800

On 12/30/2012 7:39 PM, Marcel Rodrigues wrote:

I'm using Python 3.3 (CPython) and am having trouble getting the
standard gettext module to handle Unicode messages.


I have never even looked at the doc before, but I will take a look.

My problem can be isolated as follows:

I have 3 files in a folder: greeting.py, greeting.po and msgfmt.py.

-- greeting.py --
import gettext

t = gettext.translation("greeting", "locale", ["pt"])
_ = t.lgettext


gettext.lgettext(message)

Equivalent to gettext(), but the translation is returned in the preferred system encoding, if no other encoding was explicitly set with bind_textdomain_codeset().

Given that 'preferred system encoding' apparently means 'locale.getpreferredencoding' and that seems not to be what you want, why are you using the 'l' version?


print("_charset = {0}\n".format(t._charset))
print(_("hello"))

A strong suggestion: whenever you want to print a string and the computation of the string (or bytes) involves encoding/decoding, separate the computation and the printing (on two separate lines).


s = _("hello")
print(s)

The reason is that printing also requires encoding for the output device and that process can also generate a UnicodeError that may be hard to distinguish from an error in the computation of s itself.
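
A small sketch of the printing side of this, assuming an output device that only accepts ASCII. The in-memory stream ascii_out is my own stand-in for such a terminal, not anything from the original script:

```python
import io

# Step 1: the computation. Building the string succeeds on its own.
s = "olá"

# Step 2: the printing. An ASCII-only text stream stands in for a
# terminal whose encoding cannot represent 'á'.
ascii_out = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    print(s, file=ascii_out)
    ascii_out.flush()
    print_failed = False
except UnicodeEncodeError:
    # Same exception type as an encoding error inside the computation,
    # but here it comes from the output step alone.
    print_failed = True
```

With the two steps on separate lines, the traceback points at whichever one actually failed.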

-- EOF --

-- greeting.po --
msgid ""
msgstr ""
"Project-Id-Version: 1.0\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "hello"
msgstr "olá"
-- EOF --

msgfmt.py was downloaded from
http://hg.python.org/cpython/file/9e6ead98762e/Tools/i18n/msgfmt.py,
since this tool apparently isn't included in the python3 package
available on Arch Linux official repositories.

It's probably also worth noting that the file greeting.po is encoded
itself as UTF-8.

From that folder, I run the following commands:

$ mkdir -p locale/pt/LC_MESSAGES
$ python msgfmt.py -o !$/greeting.mo greeting.po
$ python greeting.py

The output is:
_charset = UTF-8

Traceback (most recent call last):
   File "greeting.py", line 7, in <module>
     print(_("hello"))
   File "/usr/lib/python3.3/gettext.py", line 314, in lgettext
     return tmsg.encode(locale.getpreferredencoding())
UnicodeEncodeError: 'ascii' codec can't encode character '\xe1' in
position 2: ordinal not in range(128)

In particular, we have seen, in previous posts here, this exact error generated during printing rather than during the string computation and posters have wasted time looking for the error in the string or bytes computation itself.

My interpretation of this output is that even though gettext correctly
detects the MO file charset as UTF-8, it tries to encode the translated
message with the system's "preferred encoding", which happens to be ASCII.


Just as you seem to have requested ;-)
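
To spell that out: going by the traceback, lgettext() encodes the translated str with locale.getpreferredencoding(). Here is a rough sketch of that one step. The name lgettext_like is my own stand-in, not part of gettext, and the encoding is passed explicitly so the result does not depend on the machine's locale:

```python
import locale


def lgettext_like(tmsg, encoding=None):
    """Hypothetical stand-in for the failing line in the traceback."""
    if encoding is None:
        # What the traceback shows lgettext doing in Python 3.3.
        encoding = locale.getpreferredencoding()
    return tmsg.encode(encoding)


translated = "olá"

# With a UTF-8 preferred encoding the call returns bytes, not str.
as_utf8 = lgettext_like(translated, "utf-8")

# With an ASCII preferred encoding the same call cannot encode 'á'.
try:
    lgettext_like(translated, "ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = str(exc)
```

So even when the encoding succeeds, the result is a bytes object, which is probably not what you want to hand to print() in Python 3.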

Anyone know why this happens? Is this a bug on my code? Maybe I have
misunderstood gettext...


You used lgettext (l = locale). As I said, I am new to this.

--
Terry Jan Reedy


--
http://mail.python.org/mailman/listinfo/python-list
