On 12/30/2012 8:48 PM, Terry Reedy wrote:
On 12/30/2012 7:39 PM, Marcel Rodrigues wrote:
I'm using Python 3.3 (CPython) and am having trouble getting the
standard gettext module to handle Unicode messages.

Addition to previous response.

import gettext

t = gettext.translation("greeting", "locale", ["pt"])

Reading further, I see that this returns a GNUTranslations instance.

_ = t.lgettext

So this calls its method:
'''
GNUTranslations.gettext(message)
Look up the message id in the catalog and return the corresponding message string, as a Unicode string. If there is no entry in the catalog for the message id, and a fallback has been set, the look up is forwarded to the fallback’s gettext() method. Otherwise, the message id is returned.

GNUTranslations.lgettext(message)
Equivalent to gettext(), but the translation is returned as a bytestring encoded in the selected output charset, or in the preferred system encoding if no encoding was explicitly set with set_output_charset().
'''
So if you want the unicode translation to be utf-8 encoded, either use .gettext and encode it yourself, or use "t.set_output_charset('utf-8')" to have it done automatically.
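
To make the first option concrete, here is a minimal sketch of "use .gettext and encode it yourself". It does not touch .lgettext or set_output_charset(), which were removed in Python 3.11. Since no catalog file is shown in this thread, the sketch builds a tiny .mo catalog in memory; the build_mo helper and the "olá" translation are assumptions made up for illustration, not the original poster's files.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical helper: build a minimal little-endian .mo catalog in memory."""
    # The empty msgid holds the metadata header, which declares the charset
    # that GNUTranslations uses to decode the catalog.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    entries.update(messages)
    # .mo files store msgids sorted, as NUL-terminated byte strings.
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in entries.items())
    n = len(items)
    ids = b"".join(k + b"\0" for k, _ in items)
    strs = b"".join(v + b"\0" for _, v in items)
    key_start = 7 * 4 + 16 * n          # 28-byte header plus two index tables
    val_start = key_start + len(ids)
    key_table, val_table = [], []
    koff = voff = 0
    for k, v in items:
        key_table += [len(k), key_start + koff]   # (length, offset) pairs
        val_table += [len(v), val_start + voff]
        koff += len(k) + 1
        voff += len(v) + 1
    # magic, version, count, msgid table offset, msgstr table offset, hash size/offset
    header = struct.pack("<7I", 0x950412DE, 0, n, 28, 28 + 8 * n, 0, 0)
    tables = struct.pack("<%dI" % (4 * n), *key_table, *val_table)
    return header + tables + ids + strs


# Assumed sample catalog: "hello" -> "olá".
t = gettext.GNUTranslations(io.BytesIO(build_mo({"hello": "olá"})))

text = t.gettext("hello")      # a str (Unicode), decoded using the catalog charset
data = text.encode("utf-8")    # encode explicitly only when bytes are needed
untranslated = t.gettext("missing")  # no entry and no fallback: msgid comes back

print(type(text).__name__, repr(data))
```

With a real catalog on disk, the gettext.translation("greeting", "locale", ["pt"]) call from the original post would replace the in-memory construction; the .gettext and .encode steps stay the same.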

>> print("_charset = {0}\n".format(t._charset))
>> print(_("hello"))

But since you are printing to screen, I suggest using .gettext and letting print do the encoding to the screen encoding. If that still raises an encoding error, then the problem is the console emulator. On Windows, for instance, IDLE windows handle the entire BMP charset while the stupid Windows Command Prompt window does not (certainly not by default, and not yet, as far as I know).
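
A rough way to see the console effect without a real console is to print to an in-memory text stream with a chosen encoding. This is only a sketch: the print_to helper is hypothetical, "ascii" stands in for a limited console encoding, and "olá" is an assumed sample string.

```python
import io


def print_to(encoding, text, errors="strict"):
    """Hypothetical helper: print text to a stream that mimics a console
    using the given encoding, and return the bytes that were written."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="\n")
    print(text, file=stream)   # print hands over a str; the stream encodes it
    stream.flush()
    return raw.getvalue()


message = "olá"  # assumed sample translation

# A stream that can represent the character: print just works.
utf8_bytes = print_to("utf-8", message)

# A stream that cannot: the same print call raises UnicodeEncodeError.
try:
    print_to("ascii", message)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# One possible workaround: an error handler that escapes what cannot be encoded.
escaped_bytes = print_to("ascii", message, errors="backslashreplace")
```

The string and the print call are identical in all three cases; only the destination's encoding and error handler change, which is why a failure here points at the console rather than at gettext.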

The encoding of the translations file on disk determines how the bytes of the translation table should be *decoded* when read, to create unicode strings. It does not determine how those strings should be *encoded* when sent to a particular destination. That may depend on the destination. Multilingual international sites used to encode pages in different limited national encodings, according to the language and destination. Now many encode and send *everything* as utf-8. I think this is the proper policy now. .lgettext seems oriented to the older, pre-utf-8, national locale encoding way of doing things.
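
The decode/encode distinction above can be sketched in a few lines. The byte strings below are assumed stand-ins for the same translation stored in two differently encoded catalogs; they are not taken from any real file.

```python
# The same message as it might be stored on disk in two catalogs
# (assumed sample bytes for "olá").
stored_latin1 = b"ol\xe1"        # catalog declared as ISO-8859-1
stored_utf8 = b"ol\xc3\xa1"      # catalog declared as UTF-8

# Decoding uses the charset of the file the bytes came from ...
from_latin1 = stored_latin1.decode("latin-1")
from_utf8 = stored_utf8.decode("utf-8")

# ... and both yield the same Unicode string.
same_text = from_latin1 == from_utf8

# Encoding is a separate decision, made per destination.
for_web = from_latin1.encode("utf-8")          # e.g. a page served as utf-8
for_legacy = from_utf8.encode("latin-1")       # e.g. an older Latin-1 consumer
```

Once the text is a str, where it came from no longer matters; the output encoding is chosen for wherever the text is going.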

--
Terry Jan Reedy


