#9212: German Umlauts and possible other foreign languages special characters
-------------------------------------------+--------------------------------
Reporter: nekron | Owner: nobody
Status: new | Milestone: post-1.0
Component: Internationalization | Version: SVN
Resolution: | Keywords: Umlauts
Stage: Accepted | Has_patch: 0
Needs_docs: 0 | Needs_tests: 0
Needs_better_patch: 0 |
-------------------------------------------+--------------------------------
Comment (by Karen Tracey <[EMAIL PROTECTED]>):
Yes, it's ugly and backwards-looking but I don't think it's fragile
actually. The old xgettext doesn't reject any bytes as invalid, it simply
takes any byte that has the high bit set and distributes its original 8
bits between two new bytes, with the high bits of the new bytes set as
appropriate for a 2-byte utf-8 sequence. This transformation is
completely reversible and by doing a .decode('utf-8').encode('iso-8859-1')
we get back the originally-provided bytes...which may well be the utf-8
encoding of something not representable in iso-8859-1.
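The round trip described above can be illustrated with a small sketch. It assumes Python 3 bytes objects, and the helper names are hypothetical: `old_xgettext_mangle` only imitates what a pre-0.15 xgettext is described as doing here; it is not xgettext's actual code.

```python
def old_xgettext_mangle(raw: bytes) -> bytes:
    """Imitate the old behaviour: treat the input bytes as iso-8859-1 and
    re-encode them as utf-8, so each high-bit byte becomes a 2-byte sequence."""
    return raw.decode('iso-8859-1').encode('utf-8')


def restore_original(mangled: bytes) -> bytes:
    """Undo the mangling with the decode/encode pair from the comment."""
    return mangled.decode('utf-8').encode('iso-8859-1')


# utf-8 source bytes containing an umlaut and a character (the euro sign)
# that iso-8859-1 itself cannot represent.
original = 'Übersicht €'.encode('utf-8')
mangled = old_xgettext_mangle(original)

assert mangled != original                     # the output really was changed
assert restore_original(mangled) == original   # and the change is reversible
```

The reversal works for any input because every byte value maps to exactly one code point below U+0100, and each of those encodes back to the same single byte.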
I'll attach a patch for review (since I'm not entirely comfortable with
this code so would prefer someone review it before committing) that
attempts to retrieve/interpret the xgettext version and do the decode if
necessary. I did verify via the online cvs history for the GNU gettext
package that 0.15 is the first version of xgettext that does not assume
Python source is encoded in iso-8859-1, so checking for lower than 0.15 is
the right check. I tested and verified that the patch works with:
1 - the Windows xgettext binary pointed at by our docs (0.13.1). Here we
do the decode to restore the original bytes.
2 - the current Windows xgettext binary from cygwin (0.15). Here we do
not do the decode, which is correct since xgettext doesn't reencode its
input.
3 - the xgettext version on my Linux box (0.16.1). Here again we do not
do the extra decode since it's not necessary.
However, I ran into a hiccup with the Windows 0.14.4 binary I found
online (mentioned earlier). This version does mangle its output, but
it traps when you try to run 'xgettext --version' (or 'xgettext -V').
That is you get a Windows popup "xgettext.exe has generated errors and
will be closed by Windows. You will need to restart the program. An
error log is being created". You get this if you try the version flag
from the command line or as part of makemessages.
Once you hit "OK" the makemessages script continues and produces incorrect
output because I coded the default in case of trouble determining version
to be to NOT do the extra decode. (I'm uncomfortable with the idea of
reversing that...if we can't reliably determine that the version is one
that mangles output I'm thinking we should not go ahead and potentially
mangle perfectly good output.)
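The version check and its conservative default might look roughly like the following sketch. The function names and the regular expression are assumptions made for illustration, not the contents of the attached patch; the only fact taken from this comment is that versions below 0.15 need the extra decode.

```python
import re


def parse_xgettext_version(version_output: str):
    """Return (major, minor) found in `xgettext --version` output, or None."""
    match = re.search(r'(\d+)\.(\d+)', version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_extra_decode(version_output: str) -> bool:
    """True only when the version is known to be older than 0.15."""
    version = parse_xgettext_version(version_output)
    if version is None:
        # Version unknown: do NOT decode, so good output is never mangled.
        return False
    return version < (0, 15)
```

With this default, a binary that prints nothing for its version (as 0.14.4 does here) is treated the same as a modern one.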
But that leaves this one version I'm aware of where we do the wrong thing.
The only indication the user gets that something is wrong is that popup,
which is pretty cryptic. Of course this version seems to be rather broken.
Besides trapping when you try to get it to return the version string, it
reports its name as (null). For example, invoking it with no arguments
produces:
{{{
xgettext: no input file given
Try `(null) --help' for more information.
}}}
I did try raising a !CommandError if 'xgettext --version' fails to return
anything. However that produces another cryptic error:
{{{
Error: xgettext --version returns no version information
d:\u\kmt\django\trunk\django\core\management\base.py:234: RuntimeWarning:
tp_compare didn't return -1 or -2 for exception
sys.exit(1)
}}}
that I really don't feel like tracking down. So I'm inclined to not worry
about working correctly with this version, because it's fundamentally
broken. Other opinions?
--
Ticket URL: <http://code.djangoproject.com/ticket/9212#comment:8>
Django <http://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.