#26645: Errors when running i18n makemessages tests on Windows
--------------------------------------+------------------------------------
     Reporter:  ramiro                |                    Owner:  ramiro
         Type:  Bug                   |                   Status:  assigned
    Component:  Internationalization  |                  Version:  master
     Severity:  Normal                |               Resolution:
     Keywords:  windows makemessages  |             Triage Stage:  Accepted
    Has patch:  0                     |      Needs documentation:  0
  Needs tests:  0                     |  Patch needs improvement:  0
Easy pickings:  0                     |                    UI/UX:  0
--------------------------------------+------------------------------------

Comment (by ramiro):

 AFAICS what happens on Windows is this. Since
 fa08d27fb714534670b431fde0cd04a17d637585 we no longer pass
 `universal_newlines` to `subprocess.Popen()`. As a result, the in-memory
 representation of the text we capture from the standard output of
 `xgettext(1)`, `msgmerge(1)`, etc. contains native line ending sequences
 (`\r\n`) on this platform, rather than plain `\n`.
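 Below is a minimal Python 3 sketch of the difference. It is not Django code.
 The hypothetical `CHILD` one-liner stands in for a gettext tool by writing
 `\r\n`-terminated bytes to its stdout on any platform. The equally
 hypothetical `capture()` reads that output back with and without
 `universal_newlines`.

```python
import subprocess
import sys

# Stand-in for xgettext(1)/msgmerge(1) on Windows (assumption): a child
# process that writes CRLF-terminated text to stdout as raw bytes.
CHILD = r"import sys; sys.stdout.buffer.write(b'one\r\ntwo\r\n')"


def capture(universal_newlines):
    """Run CHILD and return what the parent sees on the child's stdout."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        universal_newlines=universal_newlines,
    )
    out, _ = proc.communicate()
    return out


print(repr(capture(universal_newlines=False)))  # b'one\r\ntwo\r\n' (CRLF kept)
print(repr(capture(universal_newlines=True)))   # 'one\ntwo\n' (CRLF -> LF)
```

 Without `universal_newlines`, decoding the bytes still leaves the `\r`
 characters in the string. That string is what later gets written to disk.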

 (All examples below come from running the
 `i18n.test_extraction.BasicExtractorTests.test_blocktrans_trimmed` test
 case.)

 Blob of text that reaches
 `django.core.management.commands.makemessages.write_pot_file()`:
 {{{
 (Pdb) msgs
 u'# SOME DESCRIPTIVE TITLE.\r\n# Copyright (C) YEAR THE PACKAGE\'S
 COPYRIGHT HOLDER\r\n# This file is distributed under the same license as
 the PACKAGE package.\r\n# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.\r\n#\r\n#,
 fuzzy\r\nmsgid ""\r\nmsgstr ""\r\n
 }}}

 Note the native `\r\n` line ending sequences.

 Then the temporary `.pot` file
 (`test\i18n\test_extraction\commands\locale\django.pot`) is written. We
 open it in text mode, because we can't open it in either binary or
 universal newlines mode while passing the ''encoding'' parameter to
 `io.open()`. Text mode translates every `\n` to `\r\n` on write. Each
 existing `\r\n` therefore becomes `\r\r\n`, so the lines end up separated
 by `\r\r\n`. For example, here is the output of `hd(1)` on Linux, with
 `django.pot` transferred from the Windows system:
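 The write-side translation can be reproduced on any platform. The sketch
 below passes `newline='\r\n'` to `io.open()`, which behaves like the
 Windows default, since `os.linesep` is `'\r\n'` there. `write_text()` and
 `to_lf()` are hypothetical helpers written for illustration only.

```python
import io
import os
import tempfile

# Captured tool output that still carries native Windows line endings.
MSGS = u'msgid ""\r\nmsgstr ""\r\n'


def write_text(content, path):
    # newline="\r\n" makes the text layer replace every "\n" written with
    # "\r\n", as Windows text mode does by default; "\r" is left untouched.
    with io.open(path, "w", encoding="utf-8", newline="\r\n") as fp:
        fp.write(content)
    # Return the raw bytes that actually landed on disk.
    with io.open(path, "rb") as fp:
        return fp.read()


def to_lf(content):
    # Hypothetical helper: collapse CRLF to LF before the text-mode write.
    return content.replace(u"\r\n", u"\n")


fd, path = tempfile.mkstemp(suffix=".pot")
os.close(fd)
try:
    print(write_text(MSGS, path))         # b'msgid ""\r\r\nmsgstr ""\r\r\n'
    print(write_text(to_lf(MSGS), path))  # b'msgid ""\r\nmsgstr ""\r\n'
finally:
    os.remove(path)
```

 Normalizing before the write is only one conceivable way out. It is not
 necessarily the alternate approach mentioned at the end of this comment.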

 {{{
 $ hd -n 240 django.pot
 00000000  23 20 53 4f 4d 45 20 44  45 53 43 52 49 50 54 49  |# SOME DESCRIPTI|
 00000010  56 45 20 54 49 54 4c 45  2e 0d 0d 0a 23 20 43 6f  |VE TITLE....# Co|
 00000020  70 79 72 69 67 68 74 20  28 43 29 20 59 45 41 52  |pyright (C) YEAR|
 00000030  20 54 48 45 20 50 41 43  4b 41 47 45 27 53 20 43  | THE PACKAGE'S C|
 00000040  4f 50 59 52 49 47 48 54  20 48 4f 4c 44 45 52 0d  |OPYRIGHT HOLDER.|
 00000050  0d 0a 23 20 54 68 69 73  20 66 69 6c 65 20 69 73  |..# This file is|
 00000060  20 64 69 73 74 72 69 62  75 74 65 64 20 75 6e 64  | distributed und|
 00000070  65 72 20 74 68 65 20 73  61 6d 65 20 6c 69 63 65  |er the same lice|
 00000080  6e 73 65 20 61 73 20 74  68 65 20 50 41 43 4b 41  |nse as the PACKA|
 00000090  47 45 20 70 61 63 6b 61  67 65 2e 0d 0d 0a 23 20  |GE package....# |
 000000a0  46 49 52 53 54 20 41 55  54 48 4f 52 20 3c 45 4d  |FIRST AUTHOR <EM|
 000000b0  41 49 4c 40 41 44 44 52  45 53 53 3e 2c 20 59 45  |AIL@ADDRESS>, YE|
 000000c0  41 52 2e 0d 0d 0a 23 0d  0d 0a 23 2c 20 66 75 7a  |AR....#...#, fuz|
 000000d0  7a 79 0d 0d 0a 6d 73 67  69 64 20 22 22 0d 0d 0a  |zy...msgid ""...|
 000000e0  6d 73 67 73 74 72 20 22  22 0d 0d 0a 22 50 72 6f  |msgstr ""..."Pro|
 }}}

 From this stage of the message extraction process onwards:
 1. Each update of the on-disk temporary POT file accumulates additional
 `\r` chars. In the test case mentioned above, the lines end up separated
 by `\r\r\r\r\n`.
 2. Further `popen_wrapper` calls are then performed (e.g. to call
 `msgmerge(1)` or `msguniq(1)`). Through them, problem 1 above is carried
 over to the final `.po` file(s), where it somehow results in multiple
 `\r\n` sequences after each line:

 This is the final
 `test\i18n\test_extraction\commands\locale\de\LC_MESSAGES\django.po`:
 {{{
 $ hd -n 312 django.po
 00000000  23 20 53 4f 4d 45 20 44  45 53 43 52 49 50 54 49  |# SOME DESCRIPTI|
 00000010  56 45 20 54 49 54 4c 45  2e 0d 0a 22 50 6c 75 72  |VE TITLE..."Plur|
 00000020  61 6c 2d 46 6f 72 6d 73  3a 20 6e 70 6c 75 72 61  |al-Forms: nplura|
 00000030  6c 73 3d 32 3b 20 70 6c  75 72 61 6c 3d 28 6e 20  |ls=2; plural=(n |
 00000040  21 3d 20 31 29 3b 5c 6e  22 0d 0a 0d 0a 0d 0a 0d  |!= 1);\n".......|
 00000050  0a 23 20 43 6f 70 79 72  69 67 68 74 20 28 43 29  |.# Copyright (C)|
 00000060  20 59 45 41 52 20 54 48  45 20 50 41 43 4b 41 47  | YEAR THE PACKAG|
 00000070  45 27 53 20 43 4f 50 59  52 49 47 48 54 20 48 4f  |E'S COPYRIGHT HO|
 00000080  4c 44 45 52 0d 0a 0d 0a  0d 0a 0d 0a 23 20 54 68  |LDER........# Th|
 00000090  69 73 20 66 69 6c 65 20  69 73 20 64 69 73 74 72  |is file is distr|
 000000a0  69 62 75 74 65 64 20 75  6e 64 65 72 20 74 68 65  |ibuted under the|
 000000b0  20 73 61 6d 65 20 6c 69  63 65 6e 73 65 20 61 73  | same license as|
 000000c0  20 74 68 65 20 50 41 43  4b 41 47 45 20 70 61 63  | the PACKAGE pac|
 000000d0  6b 61 67 65 2e 0d 0a 0d  0a 0d 0a 0d 0a 23 20 46  |kage.........# F|
 000000e0  49 52 53 54 20 41 55 54  48 4f 52 20 3c 45 4d 41  |IRST AUTHOR <EMA|
 000000f0  49 4c 40 41 44 44 52 45  53 53 3e 2c 20 59 45 41  |IL@ADDRESS>, YEA|
 00000100  52 2e 0d 0a 0d 0a 0d 0a  0d 0a 23 20 0d 0a 0d 0a  |R.........# ....|
 00000110  0d 0a 0d 0a 23 2c 20 66  75 7a 7a 79 0d 0a 0d 0a  |....#, fuzzy....|
 00000120  6d 73 67 69 64 20 22 22  0d 0a 0d 0a 6d 73 67 73  |msgid ""....msgs|
 00000130  74 72 20 22 22 0d 0a 0d                           |tr ""...|
 }}}

 3. The newly added newlines confuse two pieces of code:
 `copy_plural_forms()` (note the misplaced `Plural-Forms` line on line 2 of
 the dump above) and the code in charge of setting the `charset` value in
 the PO header to `utf-8`. '''This is what causes the symptom reported by
 this ticket''':
 `django.po:2:47: syntax error -- msgmerge: found 1 fatal error`
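 The following is an assumption rather than a confirmed code path, but it is
 consistent with both dumps. Suppose the `\r\r\r\r\n`-terminated content is
 read back in universal-newlines text mode. Each separator then becomes four
 `\n`. Written out again on Windows, that gives the four `\r\n` pairs per
 line seen in the final `django.po`. The blank lines now start right after
 the first line. Any header edit keyed on "the first empty line" therefore
 lands there, which is where `Plural-Forms` ends up in the dump.
 `read_universal()` and `insert_plural_forms()` are hypothetical. The latter
 is a simplified stand-in, not Django's actual `copy_plural_forms()`.

```python
import io

PLURAL_FORMS = u'"Plural-Forms: nplurals=2; plural=(n != 1);\\n"'

# A cut-down POT header as it might look on disk after several round trips.
ON_DISK = (b'# SOME DESCRIPTIVE TITLE.\r\r\r\r\n'
           b'#, fuzzy\r\r\r\r\n'
           b'msgid ""\r\r\r\r\n'
           b'msgstr ""\r\r\r\r\n'
           b'"Content-Type: text/plain; charset=UTF-8\\n"\r\r\r\r\n'
           b'\r\r\r\r\n')


def read_universal(data):
    # Universal-newlines read (assumption): "\r", "\r\n" and "\n" all end a
    # line, so every "\r\r\r\r\n" is decoded as four "\n".
    return io.TextIOWrapper(io.BytesIO(data), encoding="utf-8",
                            newline=None).read()


def insert_plural_forms(msgs):
    # Hypothetical, simplified: insert the Plural-Forms line at the first
    # empty line, which in a clean file marks the end of the header.
    out, found = [], False
    for line in msgs.split(u"\n"):
        if not found and not line:
            out.append(PLURAL_FORMS)
            found = True
        out.append(line)
    return u"\n".join(out)


clean = read_universal(ON_DISK.replace(b"\r\r\r\r\n", b"\n"))
garbled = read_universal(ON_DISK)
print(insert_plural_forms(clean).split(u"\n").index(PLURAL_FORMS))    # 5
print(insert_plural_forms(garbled).split(u"\n").index(PLURAL_FORMS))  # 1
```

 In the clean header the line lands after the `Content-Type` entry. In the
 garbled one it lands right after the title line, which matches line 2 of
 the final `django.po` above.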

 #25667 was about the inability to handle the error output of `msgfmt(1)`,
 which is actually invoked from `compilemessages`. However, as part of that
 work we removed the `gettext_popen_wrapper` function, which was the one
 passing `universal_newlines` to `subprocess.Popen()`. That removal had this
 unintended effect on `makemessages`.

 One possible path would be to re-introduce `universal_newlines` only for
 the gettext tools used by `makemessages`. Unlike
 `compilemessages`/`msgfmt(1)`, those tools actually emit POT and PO content
 in-band, on their standard output. But I'm reluctant to do that given the
 difficulties we've experienced on this front; see #20271,
 6fb9dee470d57882e378247fd2706d5f9867b5f9 and
 57202a112a966593857725071ecd652a87c157fb for some examples.

 I'm working on an alternate approach. Will update this ticket with the
 link to a PR.

--
Ticket URL: <https://code.djangoproject.com/ticket/26645#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
