#9212: German Umlauts and possible other foreign languages special characters
-------------------------------------------+--------------------------------
          Reporter:  nekron                |         Owner:  nobody  
            Status:  new                   |     Milestone:  post-1.0
         Component:  Internationalization  |       Version:  SVN     
        Resolution:                        |      Keywords:  Umlauts 
             Stage:  Accepted              |     Has_patch:  0       
        Needs_docs:  0                     |   Needs_tests:  0       
Needs_better_patch:  0                     |  
-------------------------------------------+--------------------------------
Comment (by Karen Tracey <[EMAIL PROTECTED]>):

 Yes, it's ugly and backwards-looking, but I don't think it's actually
 fragile.  The old xgettext doesn't reject any bytes as invalid; it simply
 takes any byte that has the high bit set and distributes its original 8
 bits between two new bytes, with the high bits of the new bytes set as
 appropriate for a 2-byte utf-8 sequence.  This transformation is
 completely reversible: by doing a .decode('utf-8').encode('iso-8859-1')
 we get back the originally-provided bytes...which may well be the utf-8
 encoding of something not representable in iso-8859-1.
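 A minimal Python 3 sketch of the round trip described above (Django of this era used Python 2 str methods; this uses bytes). `simulate_old_xgettext` and `undo_old_xgettext` are hypothetical names. The first only models the iso-8859-1 to utf-8 re-encoding attributed to old xgettext and is not xgettext's own code.

```python
def simulate_old_xgettext(source: bytes) -> bytes:
    """Hypothetical model of what pre-0.15 xgettext does to Python source:
    read every byte as iso-8859-1 (so no byte is ever invalid) and write it
    back out as UTF-8."""
    return source.decode("iso-8859-1").encode("utf-8")


def undo_old_xgettext(output: bytes) -> bytes:
    """The reverse step: decode the UTF-8 and re-encode as iso-8859-1,
    which yields the bytes xgettext was originally given."""
    return output.decode("utf-8").encode("iso-8859-1")


# A single high-bit byte (0xFC, 'ü' in iso-8859-1) becomes the 2-byte
# UTF-8 sequence C3 BC, which carries the same 8 bits.
assert simulate_old_xgettext(b"\xfc") == b"\xc3\xbc"

# UTF-8 source text, including a character (the euro sign) that has no
# iso-8859-1 representation at all.
original = "Grüße, 10 €".encode("utf-8")
mangled = simulate_old_xgettext(original)

# Every high-bit byte was expanded into two bytes...
high_bit_bytes = sum(1 for b in original if b >= 0x80)
assert len(mangled) == len(original) + high_bit_bytes

# ...and the round trip restores the original bytes exactly.
assert undo_old_xgettext(mangled) == original
```

 Because every byte is valid iso-8859-1, the forward step can never fail. That is why undoing it is safe even when the original bytes were UTF-8.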

 I'll attach a patch for review (I'm not entirely comfortable with this
 code, so I'd prefer someone review it before committing) that attempts to
 retrieve and interpret the xgettext version and do the decode if
 necessary.  I did verify via the online CVS history for the GNU gettext
 package that 0.15 is the first version of xgettext that does not assume
 Python source is encoded in iso-8859-1, so checking for a version lower
 than 0.15 is the right check.  I tested and verified that the patch works
 with:

 1 - the Windows xgettext binary pointed at by our docs (0.13.1).  Here we
 do the decode to restore the original bytes.

 2 - the current Windows xgettext binary from Cygwin (0.15).  Here we do
 not do the decode, which is correct since xgettext doesn't re-encode its
 input.

 3 - the xgettext version on my Linux box (0.16.1).  Here again we do not
 do the extra decode since it's not necessary.
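 A sketch of the version check described above, assuming the first line of 'xgettext --version' output ends in a dotted version number. `parse_xgettext_version`, `needs_reverse_decode` and `FIRST_NON_MANGLING_VERSION` are hypothetical names, not the attached patch. The sample version lines are illustrative, not captured output.

```python
import re
from typing import Optional, Tuple

# Per the comment above, 0.15 is the first xgettext release that does not
# assume Python source is iso-8859-1.
FIRST_NON_MANGLING_VERSION = (0, 15)


def parse_xgettext_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Return a tuple such as (0, 16, 1) from 'xgettext --version' output,
    or None if no version number can be found.

    Assumption: the first line ends with a dotted version number."""
    lines = version_output.strip().splitlines()
    if not lines:
        return None
    match = re.search(r"(\d+(?:\.\d+)+)\s*$", lines[0])
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def needs_reverse_decode(version: Optional[Tuple[int, ...]]) -> bool:
    """True only for a known version older than 0.15. An unknown version
    (None) means: leave xgettext's output alone."""
    return version is not None and version < FIRST_NON_MANGLING_VERSION


# Illustrative version lines for the three binaries listed above.
for line in ("xgettext (GNU gettext) 0.13.1",
             "xgettext (GNU gettext-tools) 0.15",
             "xgettext (GNU gettext-tools) 0.16.1"):
    version = parse_xgettext_version(line)
    print(version, needs_reverse_decode(version))
# (0, 13, 1) True
# (0, 15) False
# (0, 16, 1) False
```

 Comparing tuples rather than strings avoids the trap where "0.9" would sort after "0.15" as text.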

 However, I ran into a hiccup with the Windows 0.14.4 binary I found
 online (mentioned earlier).  This version does mangle its output, but it
 crashes when you try to run 'xgettext --version' (or 'xgettext -V').  That
 is, you get a Windows popup: "xgettext.exe has generated errors and will
 be closed by Windows.  You will need to restart the program.  An error
 log is being created".  You get this whether you try the version flag
 from the command line or as part of makemessages.

 Once you hit "OK", the makemessages script continues and produces
 incorrect output, because I coded the default, when there's trouble
 determining the version, to NOT do the extra decode.  (I'm uncomfortable
 with the idea of reversing that...if we can't reliably determine that the
 version is one that mangles output, I think we should not go ahead and
 potentially mangle perfectly good output.)
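 A sketch of that conservative default, assuming Python 3.7+ `subprocess.run`. `xgettext_mangles_output` is a hypothetical helper, not the attached patch. It also assumes that a crash like the 0.14.4 one surfaces as a non-zero exit status or empty output, and that the first output line contains a major.minor version.

```python
import re
import subprocess


def xgettext_mangles_output(command: str = "xgettext") -> bool:
    """Decide whether to apply the reverse decode, defaulting to False
    whenever the version cannot be determined reliably.

    Hypothetical helper: the name, the timeout and the regex are
    illustrative assumptions."""
    try:
        result = subprocess.run(
            [command, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, ValueError, subprocess.SubprocessError):
        # Binary missing or not executable, undecodable output, or a hang:
        # we can't tell which version this is, so don't decode.
        return False
    output = result.stdout.strip()
    if result.returncode != 0 or not output:
        # Crashed or printed nothing: again, don't decode.
        return False
    match = re.search(r"(\d+)\.(\d+)", output.splitlines()[0])
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    # Only versions known to predate 0.15 get the extra decode.
    return (major, minor) < (0, 15)
```

 Every failure path returns False, so an unrecognized binary leaves output untouched. As described above, that is exactly why the crashing 0.14.4 build still ends up with mangled output.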

 But that leaves this one version I'm aware of where we do the wrong thing.
 The user gets an indication that something is wrong in the form of that
 popup...but it's pretty cryptic.  Of course, this version seems to be
 rather broken: besides crashing when you try to get it to return the
 version string, it reports its name as (null).  For example, invoking it
 with no arguments produces:

 {{{
 xgettext: no input file given
 Try `(null) --help' for more information.
 }}}

 I did try raising a !CommandError if 'xgettext --version' fails to return
 anything.  However, that produces another cryptic error:

 {{{
 Error: xgettext --version returns no version information
 d:\u\kmt\django\trunk\django\core\management\base.py:234: RuntimeWarning:
 tp_compare didn't return -1 or -2 for exception
   sys.exit(1)
 }}}

 that I really don't feel like tracking down.  So I'm inclined to not worry
 about working correctly with this version, because it's fundamentally
 broken.  Other opinions?

-- 
Ticket URL: <http://code.djangoproject.com/ticket/9212#comment:8>
Django <http://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.