On Dec 15, 9:12 pm, JKPeck <jkp...@gmail.com> wrote:
> I'm using Python 2.6 on Windows and having trouble with the charset in
> gettext. It seems to be so broken that I must be missing something.
>
> When I run msgfmt.py, as far as I can see it writes no charset
> information into the mo file. The actual po files are in utf-8 in
> this case and have a charset declaration.
>
> Then when _parse in gettext loads the messages, it does no conversion
> to Unicode, because it has no charset information. So the message
> dictionary is actually in utf-8 despite the comment in the code
>     # Note: we unconditionally convert both msgids and msgstrs to
>     # Unicode using the character encoding specified in the charset
>     # parameter of the Content-Type header.
>
> Then ugettext tries to just return the translated message, which is
> not in Unicode, or to convert to Unicode, which fails because the
> unicode call is not specifying any encoding.
>
> The _parse code seems to expect to produce a Unicode translation
> dictionary, and gettext expects to encode Unicode into the current
> code page, but the message dictionary never gets mapped to Unicode in
> the first place.
>
> What I want is simply to use utf-8 po files and get translations in
> Unicode.
>
> TIA for any suggestions.
>
> -Jon Peck

Never mind, I figured this out. The problem is that a line such as
_("") in the scanned source causes all of the meta information,
including the charset declaration, to be lost in the mo file. Once I
changed that code, I got the expected result.
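
For anyone who finds this thread later, here is a rough sketch of why the empty msgid matters. In a mo file the entry whose msgid is the empty string holds the header, and gettext reads the charset from the Content-Type line in that entry. The sketch is written for Python 3 (not the 2.6 setup above) and builds two tiny catalogs in memory. build_mo is a hypothetical helper, not part of any library, and the "clobbered" catalog only assumes that the stray _("") ended up replacing the header with an empty msgstr.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical helper: pack {msgid bytes: msgstr bytes} as little-endian .mo data."""
    keys = sorted(messages)
    n = len(keys)
    # Every string in the file is NUL-terminated.
    ids = b"".join(k + b"\0" for k in keys)
    strs = b"".join(messages[k] + b"\0" for k in keys)
    key_start = 7 * 4 + 16 * n          # 28-byte header plus two 8-byte-per-entry tables
    val_start = key_start + len(ids)
    key_table = b""
    val_table = b""
    koff = voff = 0
    for k in keys:
        v = messages[k]
        # Each table row is (length, absolute file offset).
        key_table += struct.pack("<II", len(k), key_start + koff)
        val_table += struct.pack("<II", len(v), val_start + voff)
        koff += len(k) + 1
        voff += len(v) + 1
    # magic, version, entry count, msgid table offset, msgstr table offset, hash size, hash offset
    header = struct.pack("<7I", 0x950412DE, 0, n, 28, 28 + 8 * n, 0, 0)
    return header + key_table + val_table + ids + strs


HEADER = b"Content-Type: text/plain; charset=utf-8\n"

# Normal catalog: the empty msgid maps to the header text.
with_header = gettext.GNUTranslations(io.BytesIO(build_mo({
    b"": HEADER,
    b"hello": "h\u00e9llo".encode("utf-8"),
})))

# Assumed failure mode: the empty msgid maps to an empty string instead.
clobbered = gettext.GNUTranslations(io.BytesIO(build_mo({
    b"": b"",
    b"hello": b"hallo",
})))

print(with_header.charset(), with_header.info())
print(clobbered.charset(), clobbered.info())
```

The first catalog reports a charset and decodes the translation to Unicode. The second has no charset and no header info at all, which matches the symptom described in the original post.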