Bug#1016695: po4a: Strange behaviour with repeated strings (in halibut backend)
Hello Martin,

On Sat, Aug 06, 2022 at 05:24:48PM +0200, Martin Quinson wrote:
> Hello, thanks for the analysis and the hints for sgt-puzzles.
>
> On Saturday, 06 August 2022 at 06:20 +0200, Helge Kreutzmann wrote:
> > On Sat, Aug 06, 2022 at 12:23:48AM +0200, Martin Quinson wrote:
> > > The short answer is that po4a-gettextize is not intended to be used on a regular basis. It is only intended for the first run, when you want to convert an existing translation to the PO-based workflow. Once that is done, you are supposed to use po4a-updatepo to create an empty PO file. Even better, you should use po4a directly instead of the deprecated atomic commands.
> >
> > Ok, so this would be incorrect usage in sgt-puzzles? It has worked for the past ~13 years. Then it might be helpful to add a note that certain use cases no longer work.
> >
> > Should this bug be cloned to sgt-puzzles to update its infrastructure?
>
> The fact is that I never imagined that someone would use po4a-gettextize on a regular basis to create the empty POT file. I would have added a note in the changelog if I had known. I see the intent now, but this is not a use case that I plan to maintain. I just added a check to po4a-gettextize that makes it fail if you call it without localized files, as you would do to generate the empty POT files.
>
> My rationale here is that gettextization (i.e., resynchronizing a master file and a localized file to generate a valid PO file that can be used afterward in a po4a-based workflow) is a really tedious process. I prefer to dedicate the po4a-gettextize script to this usage and to smooth it as much as possible. This is not really compatible with the way sgt-puzzles uses it.
>
> So, yes, the infrastructure of sgt-puzzles should be updated, as it will fail with the next upcoming release of po4a. Sorry about that. The easiest would be to simply use po4a-updatepo with a nonexistent POT file instead of po4a-gettextize, but the best would be to write a simple po4a.conf file and switch to the integrated po4a program. Invoking `make -f Makefile.doc update-po` now fails with the following error message:
>
> | You must provide both master files and localized files to po4a-gettextize, as
> | it is intended to synchronize master files and previously existing translations.
> | If just want to extract POT files of your master files, please po4a-updatepo.
> | But the most convenient way of using po4a is to write a po4a.conf file and use
> | the integrated po4a(1) program.
>
> Changing po4a-gettextize to po4a-updatepo seems to fix everything.
>
> > > The extra spaces that you see are intended to help the gettextization process, as explained in the po4a-gettextize manpage.
> >
> > At least I don't fully understand this text, even though I translated it. (See below)
> >
> > > I'm not sure how I can help you here. What piece of documentation should be updated?
> >
> > What is missing here is how and when these strings are merged back, i.e. what the translator or package maintainer should do to get to the desired situation (i.e. each string only appearing once).
>
> I tried to rewrite the documentation and apply your comments. Please check the new version and report any remaining issues.
> https://github.com/mquinson/po4a/blob/master/po4a-gettextize#L24

Thanks, this explains it better.

> Thanks for your precious feedback,

And thanks for your speedy and efficient handling.

Greetings

Helge

-- 
Dr. Helge Kreutzmann                     deb...@helgefjell.de
Dipl.-Phys.                              http://www.helgefjell.de/debian.php
64bit GNU powered                        gpg signed mail preferred
Help keep free software "libre": http://www.ffii.de/
Bug#1016695: po4a: Strange behaviour with repeated strings (in halibut backend)
Hello,

On Saturday, 06 August 2022 at 06:20 +0200, Helge Kreutzmann wrote:
> On Sat, Aug 06, 2022 at 12:23:48AM +0200, Martin Quinson wrote:
> > The short answer is that po4a-gettextize is not intended to be used on a regular basis. It is only intended for the first run, when you want to convert an existing translation to the PO-based workflow. Once that is done, you are supposed to use po4a-updatepo to create an empty PO file. Even better, you should use po4a directly instead of the deprecated atomic commands.
>
> Ok, so this would be incorrect usage in sgt-puzzles? It has worked for the past ~13 years. Then it might be helpful to add a note that certain use cases no longer work.
>
> Should this bug be cloned to sgt-puzzles to update its infrastructure?

The fact is that I never imagined that someone would use po4a-gettextize on a regular basis to create the empty POT file. I would have added a note in the changelog if I had known. I see the intent now, but this is not a use case that I plan to maintain. I just added a check to po4a-gettextize that makes it fail if you call it without localized files, as you would do to generate the empty POT files.

My rationale here is that gettextization (i.e., resynchronizing a master file and a localized file to generate a valid PO file that can be used afterward in a po4a-based workflow) is a really tedious process. I prefer to dedicate the po4a-gettextize script to this usage and to smooth it as much as possible. This is not really compatible with the way sgt-puzzles uses it.

So, yes, the infrastructure of sgt-puzzles should be updated, as it will fail with the next upcoming release of po4a. Sorry about that. The easiest would be to simply use po4a-updatepo with a nonexistent POT file instead of po4a-gettextize, but the best would be to write a simple po4a.conf file and switch to the integrated po4a program.
Invoking `make -f Makefile.doc update-po` now fails with the following error message:

| You must provide both master files and localized files to po4a-gettextize, as
| it is intended to synchronize master files and previously existing translations.
| If just want to extract POT files of your master files, please po4a-updatepo.
| But the most convenient way of using po4a is to write a po4a.conf file and use
| the integrated po4a(1) program.

Changing po4a-gettextize to po4a-updatepo seems to fix everything.

> > The extra spaces that you see are intended to help the gettextization process, as explained in the po4a-gettextize manpage.
>
> At least I don't fully understand this text, even though I translated it. (See below)
>
> > I'm not sure how I can help you here. What piece of documentation should be updated?
>
> What is missing here is how and when these strings are merged back, i.e. what the translator or package maintainer should do to get to the desired situation (i.e. each string only appearing once).

I tried to rewrite the documentation and apply your comments. Please check the new version and report any remaining issues.
https://github.com/mquinson/po4a/blob/master/po4a-gettextize#L24

> Could you add an example here? I.e. like I did below with my example?
> In your text above (in the e-mail) you state that you should use po4a-updatepo or po4a, here you mention msgmerge. Probably clarifying this would help as well.

I didn't add any example, but the new text is much more detailed, so maybe that's enough?

Thanks for your precious feedback,
Mt
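For readers unfamiliar with gettextization, the resynchronization Martin describes can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not po4a's code: the `Segment` type, the `DesyncError` exception and the `gettextize` function are hypothetical, and pairing strictly by position is a simplification. It mirrors two behaviours mentioned in this thread: refusing to run without localized input, and treating a mismatched entry type ("title" vs "plain paragraph") as a desynchronization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One extracted entry (hypothetical): its entry type and its text."""
    kind: str  # e.g. "title" or "plain paragraph"
    text: str


class DesyncError(Exception):
    """Raised when the master and localized streams no longer line up."""


def gettextize(master, localized):
    """Pair master and localized segments by position (hypothetical sketch).

    Returns a list of (msgid, msgstr) pairs, i.e. the content of a PO file.
    """
    if not localized:
        # Without an existing translation there is nothing to synchronize;
        # extracting a POT file is a different job (po4a-updatepo or po4a).
        raise ValueError("need both master and localized segments")
    if len(master) != len(localized):
        raise DesyncError(
            f"{len(master)} master segments vs {len(localized)} localized segments"
        )
    pairs = []
    for index, (orig, trans) in enumerate(zip(master, localized)):
        # A type mismatch means the two parsing streams drifted apart.
        if orig.kind != trans.kind:
            raise DesyncError(f"segment {index}: {orig.kind!r} vs {trans.kind!r}")
        pairs.append((orig.text, trans.text))
    return pairs


master = [Segment("title", "Usage"), Segment("plain paragraph", "Click a square.")]
localized = [Segment("title", "Verwendung"), Segment("plain paragraph", "Klicke auf ein Feld.")]
print(gettextize(master, localized))
# [('Usage', 'Verwendung'), ('Click a square.', 'Klicke auf ein Feld.')]
```

Calling `gettextize(master, [])` raises immediately, which is roughly the situation the new check in po4a-gettextize reports when it is used only to produce an empty POT file.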
Bug#1016695: po4a: Strange behaviour with repeated strings (in halibut backend)
Hello Martin,

thanks for your speedy reply. Especially when asking about things related to sgt-puzzles, please keep Ben in CC:.

On Sat, Aug 06, 2022 at 12:23:48AM +0200, Martin Quinson wrote:
> The short answer is that po4a-gettextize is not intended to be used on a regular basis. It is only intended for the first run, when you want to convert an existing translation to the PO-based workflow. Once that is done, you are supposed to use po4a-updatepo to create an empty PO file. Even better, you should use po4a directly instead of the deprecated atomic commands.

Ok, so this would be incorrect usage in sgt-puzzles? It has worked for the past ~13 years. Then it might be helpful to add a note that certain use cases no longer work.

Should this bug be cloned to sgt-puzzles to update its infrastructure?

> The extra spaces that you see are intended to help the gettextization process, as explained in the po4a-gettextize manpage.

At least I don't fully understand this text, even though I translated it. (See below)

> I'm not sure how I can help you here. What piece of documentation should be updated?

• In some cases, po4a adds a space at the end of either the original or the translated strings. This is because every string must be deduplicated during the gettextize process. Imagine that a string appears several times unmodified in the original but is translated in differing ways, or that different paragraphs are translated in exactly the same way. Without deduplication, such a case would break the gettextization algorithm, as it is a simple one-to-one pairing between the msgids of both the master and the localized files. Since one of the PO files would miss an entry (that would be reported as a duplicate, with two references), the pairing would fail.

What is missing here is how and when these strings are merged back, i.e. what the translator or package maintainer should do to get to the desired situation (i.e. each string only appearing once).
Since po4a uses the entry type ("title" or "plain paragraph", etc.) to detect whether the parsing streams got desynchronized, similar issues could occur if two identical entries (same content but differing type) of the master file are translated in exactly the same way in the localized file. po4a would detect a fake desynchronization in such a case.

In most cases, the extra space added by po4a to deduplicate the strings has no impact on the formatting. Strings are marked as fuzzy anyway, and msgmerge will probably match the strings accordingly afterward.

Could you add an example here? I.e. like I did below with my example? In your text above (in the e-mail) you state that you should use po4a-updatepo or po4a, here you mention msgmerge. Probably clarifying this would help as well.

> Thanks for using po4a,

Sure, for translators/translations it's a great piece of software.

> Mt
>
> On Friday, 05 August 2022 at 16:02 +0200, Helge Kreutzmann wrote:
> > Package: po4a
> > Version: 0.67-2
> > Severity: normal
> > Tags: upstream
> > X-Debbugs-Cc: Ben Hutchings
> >
> > I'm the translator of the German translation of the sgt-puzzles documentation. At the moment it is a Debian-only patch for the halibut-based sources.
> >
> > A few days ago Ben (the Debian maintainer) updated the package and asked me to update the German translation. While doing so he noticed a strange change in po4a's behaviour:
> >
> > (Some) strings that are repeated (because the same text appears in multiple places in the documentation, or in many man pages) are inserted several times into de.po, except that an increasing number of spaces is appended, i.e.
> >
> > "dog" would become
> > "dog"
> > "dog "
> > "dog  "
> > "dog   "
> > and so on.
> >
> > While updating the German translation of po4a I remember translating something along these lines, though I did not fully understand its meaning.
> >
> > This behaviour defeats part of the idea of the PO format.
> > Unless the original author indicates otherwise, identical strings in the original text should be translated identically as well.
> >
> > Now for some reason po4a makes identical strings artificially different.
> >
> > In the toy example above, this could become:
> > "Hund"
> > "Rüde "
> > "Gerüstklammer  "
> > "Schlepphaken   "
> > …
> >
> > So now the same string is translated differently *and* the translation also receives (varying) additional trailing spaces. (As a translator, you usually reproduce spaces at the beginning and end.)
> >
> > In this toy example this might be noticed easily, but usually po4a is used for (longer) paragraphs, and translators might not realize they have already translated them and would retranslate them: additional work and, as stated above, potentially inconsistent translations.
> >
> > Thus please revert to the previous behaviour of po4a *or* ensure that identical text is shown only once in the *.po(t) files.
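Since an example was explicitly requested here, the following minimal Python sketch illustrates the deduplication described in the manpage text quoted above. It assumes, as that text says, that gettextization pairs the msgids of the master and localized files one-to-one, with repeated strings collapsing into a single msgid. The names `deduplicate`, `unique_msgids` and `pair` are hypothetical; this is not po4a's implementation.

```python
def deduplicate(strings):
    """Make every string unique by appending spaces to repeats.

    Each further repetition of a string gets one more trailing space,
    matching the "dog", "dog ", "dog  " pattern from the bug report.
    """
    used = set()
    result = []
    for s in strings:
        candidate = s
        while candidate in used:  # keep adding spaces until the string is new
            candidate += " "
        used.add(candidate)
        result.append(candidate)
    return result


def unique_msgids(strings):
    """Collapse repeated strings into one entry each (order kept), as in a PO file."""
    return list(dict.fromkeys(strings))


def pair(master, localized, dedup=True):
    """Pair the msgids of the master and localized files one-to-one."""
    if dedup:
        master, localized = deduplicate(master), deduplicate(localized)
    orig_ids, trans_ids = unique_msgids(master), unique_msgids(localized)
    if len(orig_ids) != len(trans_ids):
        # One side lost entries to merging, so the one-to-one pairing breaks.
        raise ValueError(f"cannot pair {len(orig_ids)} msgids with {len(trans_ids)}")
    return dict(zip(orig_ids, trans_ids))


master = ["Usage", "dog", "dog"]
localized = ["Verwendung", "Hund", "Rüde"]
print(pair(master, localized))  # {'Usage': 'Verwendung', 'dog': 'Hund', 'dog ': 'Rüde'}
print(pair(["cat", "dog"], ["Tier", "Tier"]))  # {'cat': 'Tier', 'dog': 'Tier '}
```

Without deduplication (`dedup=False`), the master side collapses both "dog" entries into one msgid while the translation keeps "Hund" and "Rüde", so three translated strings face two msgids and the pairing fails. With deduplication, the second "dog" becomes "dog " and gets its own entry, which is the trailing-space pattern reported in this bug; in the second call, two different paragraphs translated identically end up as "Tier" and "Tier " on the translation side.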
Bug#1016695: po4a: Strange behaviour with repeated strings (in halibut backend)
Hello,

The short answer is that po4a-gettextize is not intended to be used on a regular basis. It is only intended for the first run, when you want to convert an existing translation to the PO-based workflow. Once that is done, you are supposed to use po4a-updatepo to create an empty PO file. Even better, you should use po4a directly instead of the deprecated atomic commands.

The extra spaces that you see are intended to help the gettextization process, as explained in the po4a-gettextize manpage.

I'm not sure how I can help you here. What piece of documentation should be updated?

Thanks for using po4a,
Mt

On Friday, 05 August 2022 at 16:02 +0200, Helge Kreutzmann wrote:
> Package: po4a
> Version: 0.67-2
> Severity: normal
> Tags: upstream
> X-Debbugs-Cc: Ben Hutchings
>
> I'm the translator of the German translation of the sgt-puzzles documentation. At the moment it is a Debian-only patch for the halibut-based sources.
>
> A few days ago Ben (the Debian maintainer) updated the package and asked me to update the German translation. While doing so he noticed a strange change in po4a's behaviour:
>
> (Some) strings that are repeated (because the same text appears in multiple places in the documentation, or in many man pages) are inserted several times into de.po, except that an increasing number of spaces is appended, i.e.
>
> "dog" would become
> "dog"
> "dog "
> "dog  "
> "dog   "
> and so on.
>
> While updating the German translation of po4a I remember translating something along these lines, though I did not fully understand its meaning.
>
> This behaviour defeats part of the idea of the PO format. Unless the original author indicates otherwise, identical strings in the original text should be translated identically as well.
>
> Now for some reason po4a makes identical strings artificially different.
>
> In the toy example above, this could become:
> "Hund"
> "Rüde "
> "Gerüstklammer  "
> "Schlepphaken   "
> …
>
> So now the same string is translated differently *and* the translation also receives (varying) additional trailing spaces. (As a translator, you usually reproduce spaces at the beginning and end.)
>
> In this toy example this might be noticed easily, but usually po4a is used for (longer) paragraphs, and translators might not realize they have already translated them and would retranslate them: additional work and, as stated above, potentially inconsistent translations.
>
> Thus please revert to the previous behaviour of po4a *or* ensure that identical text is shown only once in the *.po(t) files.
>
> In case you want to investigate yourself, do the following in unstable:
>
>   apt-get source sgt-puzzles
>   cd sgt-puzzles-20191231.79a5378/
>   make -f debian/rules build
>   make -f Makefile.doc update-po
>
>
> -- System Information:
> Debian Release: bookworm/sid
>   APT prefers testing
>   APT policy: (500, 'testing')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 5.18.15 (SMP w/12 CPU threads)
> Kernel taint flags: TAINT_UNSIGNED_MODULE
> Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to de_DE.UTF-8), LANGUAGE not set
> Shell: /bin/sh linked to /usr/bin/dash
> Init: systemd (via /run/systemd/system)
>
> Versions of packages po4a depends on:
> ii  gettext                     0.21-6
> ii  libpod-parser-perl          1.65-1
> ii  libsgmls-perl               1.03ii-37
> ii  libsyntax-keyword-try-perl  0.27-1
> ii  libyaml-tiny-perl           1.73-1
> ii  opensp                      1.5.2-13+b2
> ii  perl                        5.34.0-5
>
> Versions of packages po4a recommends:
> ii  liblocale-gettext-perl     1.07-4+b2
> ii  libterm-readkey-perl       2.38-1+b3
> ii  libtext-wrapi18n-perl      0.06-9
> ii  libunicode-linebreak-perl  0.0.20190101-1+b4
>
> po4a suggests no packages.
>
> -- no debconf information

-- 
For an independent, transparent and rigorous evaluation! I support the Inria Evaluation Committee (Commission d'Évaluation de l'Inria).
Bug#1016695: po4a: Strange behaviour with repeated strings (in halibut backend)
Package: po4a
Version: 0.67-2
Severity: normal
Tags: upstream
X-Debbugs-Cc: Ben Hutchings

I'm the translator of the German translation of the sgt-puzzles documentation. At the moment it is a Debian-only patch for the halibut-based sources.

A few days ago Ben (the Debian maintainer) updated the package and asked me to update the German translation. While doing so he noticed a strange change in po4a's behaviour:

(Some) strings that are repeated (because the same text appears in multiple places in the documentation, or in many man pages) are inserted several times into de.po, except that an increasing number of spaces is appended, i.e.

"dog" would become
"dog"
"dog "
"dog  "
"dog   "
and so on.

While updating the German translation of po4a I remember translating something along these lines, though I did not fully understand its meaning.

This behaviour defeats part of the idea of the PO format. Unless the original author indicates otherwise, identical strings in the original text should be translated identically as well.

Now for some reason po4a makes identical strings artificially different.

In the toy example above, this could become:
"Hund"
"Rüde "
"Gerüstklammer  "
"Schlepphaken   "
…

So now the same string is translated differently *and* the translation also receives (varying) additional trailing spaces. (As a translator, you usually reproduce spaces at the beginning and end.)

In this toy example this might be noticed easily, but usually po4a is used for (longer) paragraphs, and translators might not realize they have already translated them and would retranslate them: additional work and, as stated above, potentially inconsistent translations.

Thus please revert to the previous behaviour of po4a *or* ensure that identical text is shown only once in the *.po(t) files.
In case you want to investigate yourself, do the following in unstable:

  apt-get source sgt-puzzles
  cd sgt-puzzles-20191231.79a5378/
  make -f debian/rules build
  make -f Makefile.doc update-po


-- System Information:
Debian Release: bookworm/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 5.18.15 (SMP w/12 CPU threads)
Kernel taint flags: TAINT_UNSIGNED_MODULE
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to de_DE.UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages po4a depends on:
ii  gettext                     0.21-6
ii  libpod-parser-perl          1.65-1
ii  libsgmls-perl               1.03ii-37
ii  libsyntax-keyword-try-perl  0.27-1
ii  libyaml-tiny-perl           1.73-1
ii  opensp                      1.5.2-13+b2
ii  perl                        5.34.0-5

Versions of packages po4a recommends:
ii  liblocale-gettext-perl     1.07-4+b2
ii  libterm-readkey-perl       2.38-1+b3
ii  libtext-wrapi18n-perl      0.06-9
ii  libunicode-linebreak-perl  0.0.20190101-1+b4

po4a suggests no packages.

-- no debconf information
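A translator worried about retranslating the same paragraph could spot such entries mechanically. The minimal Python sketch below assumes the (msgid, msgstr) pairs have already been read from de.po by other means; `trailing_space_variants` and `inconsistent_bases` are hypothetical helpers, not part of po4a or gettext. It groups msgids that differ only in trailing spaces and flags the groups whose translations disagree, which is the inconsistency described in the report.

```python
from collections import defaultdict


def trailing_space_variants(entries):
    """Group (msgid, msgstr) pairs whose msgids differ only in trailing spaces."""
    groups = defaultdict(list)
    for msgid, msgstr in entries:
        groups[msgid.rstrip(" ")].append((msgid, msgstr))
    # Only groups with more than one member are variants of each other.
    return {base: pairs for base, pairs in groups.items() if len(pairs) > 1}


def inconsistent_bases(groups):
    """Return the base msgids whose variants carry different translations.

    Trailing spaces are ignored on the msgstr side too, so "Hund" and
    "Hund " count as the same translation.
    """
    return sorted(
        base
        for base, pairs in groups.items()
        if len({msgstr.rstrip(" ") for _, msgstr in pairs}) > 1
    )


entries = [
    ("dog", "Hund"),
    ("dog ", "Rüde "),
    ("dog  ", "Gerüstklammer  "),
    ("cat", "Katze"),
]
groups = trailing_space_variants(entries)
print(inconsistent_bases(groups))  # ['dog']
```

Ignoring trailing spaces on both sides is a deliberate choice here: it reports only entries whose wording actually differs, not those that merely inherited the deduplication spaces.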