Re: Python 3.3, gettext and Unicode problems

Terry Reedy Sun, 30 Dec 2012 18:30:18 -0800

On 12/30/2012 8:48 PM, Terry Reedy wrote:

On 12/30/2012 7:39 PM, Marcel Rodrigues wrote:

I'm using Python 3.3 (CPython) and am having trouble getting the
standard gettext module to handle Unicode messages.


Addition to previous response.

import gettext

t = gettext.translation("greeting", "locale", ["pt"])


Reading further, I see that this returns a GNUTranslations instance

_ = t.lgettext


So this calls its method:
'''
GNUTranslations.gettext(message)

Look up the message id in the catalog and return the corresponding message string, as a Unicode string. If there is no entry in the catalog for the message id, and a fallback has been set, the look up is forwarded to the fallback’s gettext() method. Otherwise, the message id is returned.


GNUTranslations.lgettext(message)

Equivalent to gettext(), but the translation is returned as a bytestring encoded in the selected output charset, or in the preferred system encoding if no encoding was explicitly set with set_output_charset().

'''

So if you want the unicode translation to be utf-8 encoded, either use .gettext and encode it yourself, or use "t.set_output_charset('utf-8')" to have it done automatically.
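
As a rough sketch of the first option (use .gettext and encode it yourself), the following builds a tiny catalog in memory so that it runs without any files on disk. The build_mo helper and the "hello" -> "Olá" entry are made up for illustration; they are not from your catalog, and a real .mo file would come from msgfmt.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical helper: build a minimal little-endian .mo catalog (UTF-8)."""
    # The empty msgid carries the metadata header, including the charset.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    entries.update(messages)
    keys = sorted(entries)  # .mo files store msgids in sorted order
    ids = [k.encode("utf-8") for k in keys]
    strs = [entries[k].encode("utf-8") for k in keys]

    n = len(keys)
    id_table = 28                  # right after the 7-word header
    str_table = id_table + 8 * n   # one (length, offset) pair per msgid
    data_start = str_table + 8 * n

    index, blob = [], b""
    for s in ids + strs:
        index.append((len(s), data_start + len(blob)))
        blob += s + b"\0"          # every string is NUL-terminated

    out = struct.pack("<7I", 0x950412DE, 0, n, id_table, str_table, 0, 0)
    for length, offset in index:
        out += struct.pack("<2I", length, offset)
    return out + blob


# Assumed sample entry, standing in for the real "greeting" catalog.
t = gettext.GNUTranslations(io.BytesIO(build_mo({"hello": "Olá"})))

text = t.gettext("hello")      # a str (unicode), decoded using the catalog charset
data = text.encode("utf-8")    # explicit encoding, chosen by the caller

# ascii() keeps this demo independent of the console encoding.
print(type(text).__name__, ascii(text))
print(type(data).__name__, data)
```

The point is only that .gettext hands back text, and the choice of output encoding stays in your own code.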


>> print("_charset = {0}\n".format(t._charset))
>> print(_("hello"))

But since you are printing to screen, I suggest using .gettext and let print do the encoding to the screen encoding. If that still raises an encoding error, then the problem is the console emulator. On Windows, for instance, IDLE windows handle the entire BMP charset while the stupid Windows Command Prompt window does not (certainly not by default, and not yet, as far as I know).
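
If the console really cannot show some characters, one possible workaround is to degrade them instead of letting print raise. This is just a sketch: console_safe is a name I made up, and replacing characters with "?" is one choice among several.

```python
import sys


def console_safe(text, encoding=None):
    """Hypothetical helper: replace characters the console cannot encode."""
    # sys.stdout.encoding can be None when output is redirected.
    enc = encoding or sys.stdout.encoding or "ascii"
    # Unencodable characters become "?" rather than raising UnicodeEncodeError.
    return text.encode(enc, errors="replace").decode(enc)


# Simulate a console limited to ASCII.
print(console_safe("Ol\u00e1", "ascii"))
```

With a console that handles the character, the string comes back unchanged.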

The encoding of the translations file on disk determines how the bytes of the translation table should be *decoded* when read, to create unicode strings. It does not determine how those strings should be *encoded* when sent to a particular destination. That may depend on the destination. Multilingual international sites used to encode pages in different limited national encodings, according to the language and destination. Now many encode and send *everything* as utf-8. I think this is the proper policy now. .lgettext seems oriented to the older, pre utf-8, national locale encoding way of doing things.
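
To make the decode/encode distinction concrete, here is a small illustration with plain bytes and str, no gettext involved. The sample bytes and the choice of Latin-1 as the file charset are assumptions for the example.

```python
# Bytes as they might sit in a catalog file saved as Latin-1 (assumed sample).
on_disk = b"Ol\xe1, mundo"
file_charset = "latin-1"

# Decoding is governed by the file's charset and yields a unicode string.
text = on_disk.decode(file_charset)

# Encoding is a separate decision made per destination.
as_utf8 = text.encode("utf-8")    # e.g. a web page sent as utf-8
as_cp850 = text.encode("cp850")   # e.g. an old Western European console code page

print(as_utf8)
print(as_cp850)
```

The same string produces different bytes for each destination, and none of them need to match the bytes in the file.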


--
Terry Jan Reedy


--
http://mail.python.org/mailman/listinfo/python-list
