#26645: Errors when running i18n makemessages tests on Windows
--------------------------------------+------------------------------------
     Reporter:  ramiro                |                    Owner:  ramiro
         Type:  Bug                   |                   Status:  assigned
    Component:  Internationalization  |                  Version:  master
     Severity:  Normal                |               Resolution:
     Keywords:  windows makemessages  |             Triage Stage:  Accepted
    Has patch:  0                     |      Needs documentation:  0
  Needs tests:  0                     |  Patch needs improvement:  0
Easy pickings:  0                     |                    UI/UX:  0
--------------------------------------+------------------------------------

Comment (by ramiro):

 AFAICS what happens on Windows is that, since
 fa08d27fb714534670b431fde0cd04a17d637585, we no longer pass
 `universal_newlines` to `subprocess.Popen()`, so the in-memory
 representation of the text content we capture from the standard output of
 `xgettext(1)`, `msgmerge(1)`, etc. on this platform contains native line
 ending sequences and not simply `\n`.

 (All examples below come from running the
 `i18n.test_extraction.BasicExtractorTests.test_blocktrans_trimmed` test
 case.)

 Blob of text that reaches
 `django.core.management.commands.makemessages.write_pot_file()`:

 {{{
 (Pdb) msgs
 u'# SOME DESCRIPTIVE TITLE.\r\n# Copyright (C) YEAR THE PACKAGE\'S COPYRIGHT HOLDER\r\n# This file is distributed under the same license as the PACKAGE package.\r\n# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.\r\n#\r\n#, fuzzy\r\nmsgid ""\r\nmsgstr ""\r\n
 }}}

 Note the native `\r\n` line ending sequences.

 Then, the temporary `.pot` file
 (`test\i18n\test_extraction\commands\locale\django.pot`) is written. As we
 open it in text mode (we can't open it in either binary or universal
 newlines modes because we are passing the ''encoding'' parameter to
 `io.open()`), the `\n` characters are replaced with `\r\n` on write. Since
 the text already contains `\r\n`, the lines end up separated by `\r\r\n`,
 e.g.
 (output of `hd(1)` on Linux, `django.pot` transferred from the Windows
 system):

 {{{
 $ hd -n 240 django.pot
 00000000 23 20 53 4f 4d 45 20 44 45 53 43 52 49 50 54 49 |# SOME DESCRIPTI|
 00000010 56 45 20 54 49 54 4c 45 2e 0d 0d 0a 23 20 43 6f |VE TITLE....# Co|
 00000020 70 79 72 69 67 68 74 20 28 43 29 20 59 45 41 52 |pyright (C) YEAR|
 00000030 20 54 48 45 20 50 41 43 4b 41 47 45 27 53 20 43 | THE PACKAGE'S C|
 00000040 4f 50 59 52 49 47 48 54 20 48 4f 4c 44 45 52 0d |OPYRIGHT HOLDER.|
 00000050 0d 0a 23 20 54 68 69 73 20 66 69 6c 65 20 69 73 |..# This file is|
 00000060 20 64 69 73 74 72 69 62 75 74 65 64 20 75 6e 64 | distributed und|
 00000070 65 72 20 74 68 65 20 73 61 6d 65 20 6c 69 63 65 |er the same lice|
 00000080 6e 73 65 20 61 73 20 74 68 65 20 50 41 43 4b 41 |nse as the PACKA|
 00000090 47 45 20 70 61 63 6b 61 67 65 2e 0d 0d 0a 23 20 |GE package....# |
 000000a0 46 49 52 53 54 20 41 55 54 48 4f 52 20 3c 45 4d |FIRST AUTHOR <EM|
 000000b0 41 49 4c 40 41 44 44 52 45 53 53 3e 2c 20 59 45 |AIL@ADDRESS>, YE|
 000000c0 41 52 2e 0d 0d 0a 23 0d 0d 0a 23 2c 20 66 75 7a |AR....#...#, fuz|
 000000d0 7a 79 0d 0d 0a 6d 73 67 69 64 20 22 22 0d 0d 0a |zy...msgid ""...|
 000000e0 6d 73 67 73 74 72 20 22 22 0d 0d 0a 22 50 72 6f |msgstr ""..."Pro|
 }}}

 From this stage of the message extraction process onwards:

 1. With each update of the on-disk ancillary temporary POT file additional
    `\r` chars are accumulated. In the mentioned test case it ends with
    `\r\r\r\r\n` line separators.
 2. As further `popen_wrapper` calls are performed (i.e.
    to call `msgmerge(1)` or `msguniq(1)`), problem 1 above gets carried
    to the final `.po` file(s) and somehow results in multiple `\r\n`
    sequences being inserted. This is the final
    `test\i18n\test_extraction\commands\locale\de\LC_MESSAGES\django.po`:

 {{{
 $ hd -n 312 django.po
 00000000 23 20 53 4f 4d 45 20 44 45 53 43 52 49 50 54 49 |# SOME DESCRIPTI|
 00000010 56 45 20 54 49 54 4c 45 2e 0d 0a 22 50 6c 75 72 |VE TITLE..."Plur|
 00000020 61 6c 2d 46 6f 72 6d 73 3a 20 6e 70 6c 75 72 61 |al-Forms: nplura|
 00000030 6c 73 3d 32 3b 20 70 6c 75 72 61 6c 3d 28 6e 20 |ls=2; plural=(n |
 00000040 21 3d 20 31 29 3b 5c 6e 22 0d 0a 0d 0a 0d 0a 0d |!= 1);\n".......|
 00000050 0a 23 20 43 6f 70 79 72 69 67 68 74 20 28 43 29 |.# Copyright (C)|
 00000060 20 59 45 41 52 20 54 48 45 20 50 41 43 4b 41 47 | YEAR THE PACKAG|
 00000070 45 27 53 20 43 4f 50 59 52 49 47 48 54 20 48 4f |E'S COPYRIGHT HO|
 00000080 4c 44 45 52 0d 0a 0d 0a 0d 0a 0d 0a 23 20 54 68 |LDER........# Th|
 00000090 69 73 20 66 69 6c 65 20 69 73 20 64 69 73 74 72 |is file is distr|
 000000a0 69 62 75 74 65 64 20 75 6e 64 65 72 20 74 68 65 |ibuted under the|
 000000b0 20 73 61 6d 65 20 6c 69 63 65 6e 73 65 20 61 73 | same license as|
 000000c0 20 74 68 65 20 50 41 43 4b 41 47 45 20 70 61 63 | the PACKAGE pac|
 000000d0 6b 61 67 65 2e 0d 0a 0d 0a 0d 0a 0d 0a 23 20 46 |kage.........# F|
 000000e0 49 52 53 54 20 41 55 54 48 4f 52 20 3c 45 4d 41 |IRST AUTHOR <EMA|
 000000f0 49 4c 40 41 44 44 52 45 53 53 3e 2c 20 59 45 41 |IL@ADDRESS>, YEA|
 00000100 52 2e 0d 0a 0d 0a 0d 0a 0d 0a 23 20 0d 0a 0d 0a |R.........# ....|
 00000110 0d 0a 0d 0a 23 2c 20 66 75 7a 7a 79 0d 0a 0d 0a |....#, fuzzy....|
 00000120 6d 73 67 69 64 20 22 22 0d 0a 0d 0a 6d 73 67 73 |msgid ""....msgs|
 00000130 74 72 20 22 22 0d 0a 0d |tr ""...|
 }}}

 3. The newly added newlines confuse `copy_plural_forms()` (see above) and
    the code in charge of setting the `charset` value in the PO header to
    `utf-8`.
    '''This is what causes the symptom reported by this ticket''':
    `django.po:2:47: syntax error -- msgmerge: found 1 fatal error`

 #25667 was about the inability to handle the error output of `msgfmt(1)`,
 which is actually invoked from `compilemessages`, but the removal of the
 `gettext_popen_wrapper` function (which was the one passing
 `universal_newlines` to `subprocess.Popen()`) introduced this unintended
 effect.

 One possible path would be to re-introduce `universal_newlines` only for
 the handling of the gettext tools used by `makemessages` which, unlike
 `compilemessages`/`msgfmt(1)`, do actually spew POT and PO content
 in-band; but I'm afraid to do that given the difficulties we've
 experienced on this front, see #20271,
 6fb9dee470d57882e378247fd2706d5f9867b5f9 and
 57202a112a966593857725071ecd652a87c157fb for some examples.

 I'm working on an alternate approach. Will update this ticket with the
 link to a PR.

--
Ticket URL: <https://code.djangoproject.com/ticket/26645#comment:3>
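
 The `\r\r\n` effect described in this comment can be reproduced on any
 platform with a small sketch. It is an illustration only, not the code in
 `makemessages`: the helper names (`write_text`, `read_bytes`, `demo`) are
 hypothetical, and `newline="\r\n"` is passed to `io.open()` to imitate what
 Windows text mode does by default. The last step, which normalizes the
 captured text to `\n` before writing, is just one way to show that the
 doubling comes from the `\r\n` already present in the text; it is not
 necessarily the alternate approach mentioned above.

```python
import io
import os
import tempfile


def write_text(path, text, newline="\r\n"):
    # newline="\r\n" imitates Windows text mode: every "\n" in `text`
    # is written as "\r\n"; any "\r" already present is left untouched.
    with io.open(path, "w", encoding="utf-8", newline=newline) as fp:
        fp.write(text)


def read_bytes(path):
    # Read the raw bytes so no newline translation hides the result.
    with io.open(path, "rb") as fp:
        return fp.read()


def demo():
    # Text as captured from a gettext tool without universal_newlines:
    # it already carries native "\r\n" line endings (shortened sample).
    captured = u'# SOME DESCRIPTIVE TITLE.\r\nmsgid ""\r\n'

    fd, path = tempfile.mkstemp(suffix=".pot")
    os.close(fd)
    try:
        # Writing the captured text as-is doubles the carriage return.
        write_text(path, captured)
        raw = read_bytes(path)

        # Normalizing to "\n" first yields plain "\r\n" on disk.
        write_text(path, captured.replace(u"\r\n", u"\n"))
        fixed = read_bytes(path)
    finally:
        os.remove(path)
    return raw, fixed


if __name__ == "__main__":
    raw, fixed = demo()
    print(repr(raw))
    print(repr(fixed))
```

 In the first write each line ends in `0d 0d 0a`, matching the `hd(1)` dump
 of `django.pot` above; in the second it ends in a single `0d 0a`.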