Bug#1016695: po4a: Strange behaviour with repeated strings (in halibut backend)

2022-08-06 Thread Helge Kreutzmann
Hello Martin,
On Sat, Aug 06, 2022 at 05:24:48PM +0200, Martin Quinson wrote:
> Hello,

thanks for the analysis and the hints for sgt-puzzles.

> On Saturday, 06 August 2022 at 06:20 +0200, Helge Kreutzmann wrote:
> > 
> > On Sat, Aug 06, 2022 at 12:23:48AM +0200, Martin Quinson wrote:
> > > the short answer is that po4a-gettextize is not intended to be used on a
> > > regular basis. It's only intended for the first run when you want to
> > > convert an existing translation to the po-based workflow. Once it's done,
> > > you're supposed to use po4a-updatepo to create an empty PO file. Even
> > > better, you should use po4a directly instead of the deprecated atomic
> > > commands.
> > 
> > Ok, so this would be incorrect usage in sgt-puzzles? It did work for
> > the past ~ 13 years. Then it might be helpful to add a note that
> > certain use cases are not working anymore.
> > 
> > Should this bug be cloned to sgt-puzzles for updating its
> > infrastructure?
> 
> The fact is that I never imagined that someone would use po4a-gettextize on a
> regular basis to create the empty POT file. I would have added a note in the
> changelog if I had known. I see the intent now, but this is not a use case
> that I plan to maintain. I just added a check to po4a-gettextize which makes
> it fail if you call it without localized files, as you would do to generate
> the empty POT files.
> 
> My rationale here is that the gettextization (i.e., the resynchronization of
> a master file and a localized file to generate a valid PO file that can be
> used afterward in a po4a-based workflow) is a really tedious process. I
> prefer to dedicate the po4a-gettextize script to this usage and to make it as
> smooth as possible. This is not really compatible with its use as in
> sgt-puzzles.
> 
> So, yes, the infrastructure of sgt-puzzles should be updated, as it will fail
> with the next release of po4a. Sorry about that. The easiest fix should be to
> simply use po4a-updatepo with a nonexistent POT file instead of
> po4a-gettextize, but the best would be to write a simple po4a.conf file and
> switch to the integrated po4a program. Invoking
> `make -f Makefile.doc update-po` now fails with the following error message:
> 
> | You must provide both master files and localized files to po4a-gettextize,
> | as it is intended to synchronize master files and previously existing
> | translations.
> | If just want to extract POT files of your master files, please
> | po4a-updatepo.
> | But the most convenient way of using po4a is to write a po4a.conf file and
> | use the integrated po4a(1) program.
> 
> Changing po4a-gettextize to po4a-updatepo seems to fix everything.
> 
> > > The extra spaces that you see are intended to help the gettextization
> > > process, as explained in the po4a-gettextize manpage.
> > 
> > At least I don't fully understand this text, even though I translated
> > it. (See below)
> > 
> > > I'm not sure of how I can help you here. What piece of documentation
> > > should be updated?
> > 
> > 
> > What is missing here is how and when these strings are merged back, i.e.
> > what the translator or package maintainer should do to get to the desired
> > situation (i.e. each string only appearing once).
> 
> I tried to rewrite the documentation and apply your comments. Please check the
> new version and report any remaining issue.
> https://github.com/mquinson/po4a/blob/master/po4a-gettextize#L24

Thanks, this explains it better.

> Thanks for your precious feedback,

And thanks for your speedy and efficient handling.

Greetings

   Helge


-- 
  Dr. Helge Kreutzmann deb...@helgefjell.de
   Dipl.-Phys.   http://www.helgefjell.de/debian.php
64bit GNU powered gpg signed mail preferred
   Help keep free software "libre": http://www.ffii.de/




Bug#1016695: po4a: Strange behaviour with repeated strings (in halibut backend)

2022-08-06 Thread Martin Quinson
Hello,

On Saturday, 06 August 2022 at 06:20 +0200, Helge Kreutzmann wrote:
> 
> On Sat, Aug 06, 2022 at 12:23:48AM +0200, Martin Quinson wrote:
> > the short answer is that po4a-gettextize is not intended to be used on a
> > regular basis. It's only intended for the first run when you want to
> > convert an existing translation to the po-based workflow. Once it's done,
> > you're supposed to use po4a-updatepo to create an empty PO file. Even
> > better, you should use po4a directly instead of the deprecated atomic
> > commands.
> 
> Ok, so this would be incorrect usage in sgt-puzzles? It did work for
> the past ~ 13 years. Then it might be helpful to add a note that
> certain use cases are not working anymore.
> 
> Should this bug be cloned to sgt-puzzles for updating its
> infrastructure?

The fact is that I never imagined that someone would use po4a-gettextize on a
regular basis to create the empty POT file. I would have added a note in the
changelog if I had known. I see the intent now, but this is not a use case that
I plan to maintain. I just added a check to po4a-gettextize which makes it fail
if you call it without localized files, as you would do to generate the empty
POT files.

My rationale here is that the gettextization (i.e., the resynchronization of a
master file and a localized file to generate a valid PO file that can be used
afterward in a po4a-based workflow) is a really tedious process. I prefer to
dedicate the po4a-gettextize script to this usage and to make it as smooth as
possible. This is not really compatible with its use as in sgt-puzzles.

So, yes, the infrastructure of sgt-puzzles should be updated, as it will fail
with the next release of po4a. Sorry about that. The easiest fix should be to
simply use po4a-updatepo with a nonexistent POT file instead of
po4a-gettextize, but the best would be to write a simple po4a.conf file and
switch to the integrated po4a program. Invoking
`make -f Makefile.doc update-po` now fails with the following error message:

| You must provide both master files and localized files to po4a-gettextize, as
| it is intended to synchronize master files and previously existing
| translations.
| If just want to extract POT files of your master files, please po4a-updatepo.
| But the most convenient way of using po4a is to write a po4a.conf file and use
| the integrated po4a(1) program.

Changing po4a-gettextize to po4a-updatepo seems to fix everything.
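
For readers adapting a similar build, the po4a.conf route mentioned above
might look roughly like the sketch below. This is only an illustration: the
file names (puzzles.but, po/puzzles.pot, po/de.po, puzzles.de.but) are
hypothetical and not taken from the sgt-puzzles packaging, and the `halibut`
format name is assumed from the backend named in the bug title.

```text
# po4a.conf: minimal sketch, all paths below are hypothetical
[po4a_langs] de
[po4a_paths] po/puzzles.pot $lang:po/$lang.po
[type: halibut] puzzles.but $lang:puzzles.$lang.but
```

With such a file, running `po4a po4a.conf` would be expected to refresh the
POT and PO files and regenerate the translated document in one step. The
smaller change, swapping only the command, would be along the lines of
`po4a-updatepo -f halibut -m puzzles.but -p po/de.po`, again with assumed
paths.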

> > The extra spaces that you see are intended to help the gettextization
> > process, as explained in the po4a-gettextize manpage.
> 
> At least I don't fully understand this text, even though I translated
> it. (See below)
> 
> > I'm not sure of how I can help you here. What piece of documentation should
> > be updated?
> 
> 
> What is missing here is how and when these strings are merged back, i.e. what
> the translator or package maintainer should do to get to the desired
> situation (i.e. each string only appearing once). 

I tried to rewrite the documentation and apply your comments. Please check the
new version and report any remaining issue.
https://github.com/mquinson/po4a/blob/master/po4a-gettextize#L24


> Could you add an example here? I.e. like I did below with my example?
> In your text above (in the e-mail) you state that you should use 
> po4a-updatepo or po4a, here you mention msgmerge. Probably clarifying 
> this would help as well.

I didn't add any example, but the new text is much more detailed so maybe that's
enough?

Thanks for your precious feedback,
Mt


Bug#1016695: po4a: Strange behaviour with repeated strings (in halibut backend)

2022-08-05 Thread Helge Kreutzmann
Hello Martin,
thanks for your speedy reply.

Especially when asking things related to sgt-puzzles, please keep Ben
in CC:.

On Sat, Aug 06, 2022 at 12:23:48AM +0200, Martin Quinson wrote:
> the short answer is that po4a-gettextize is not intended to be used on a
> regular basis. It's only intended for the first run when you want to convert
> an existing translation to the po-based workflow. Once it's done, you're
> supposed to use po4a-updatepo to create an empty PO file. Even better, you
> should use po4a directly instead of the deprecated atomic commands.

Ok, so this would be incorrect usage in sgt-puzzles? It did work for
the past ~ 13 years. Then it might be helpful to add a note that
certain use cases are not working anymore.

Should this bug be cloned to sgt-puzzles for updating its
infrastructure?

> The extra spaces that you see are intended to help the gettextization process,
> as explained in the po4a-gettextize manpage.

At least I don't fully understand this text, even though I translated
it. (See below)

> I'm not sure of how I can help you here. What piece of documentation should be
> updated?

   •   In some cases, po4a adds a space at the end of either the original or
       the translated strings. This is because every string must be
       deduplicated during the gettextize process. Imagine that a string
       appears several times unmodified in the original but is translated in
       differing ways, or that different paragraphs are translated in the
       exact same way.

       Without deduplication, such cases would break the gettextization
       algorithm, as it is a simple one-to-one pairing between the msgids of
       both the master and the localized files. Since one of the PO files
       would miss an entry (that would be reported as a duplicate, with two
       references), the pairing would fail.

What is missing here is how and when these strings are merged back, i.e. what
the translator or package maintainer should do to get to the desired
situation (i.e. each string only appearing once).

       Since po4a uses the entry type ("title" or "plain paragraph", etc.) to
       detect whether the parsing streams got desynchronized, similar issues
       could occur if two identical entries (same content but differing type)
       of the master file are translated in the exact same way in the
       localized file. po4a would detect a fake desynchronization in such a
       case.

       In most cases, the extra space added by po4a to deduplicate the
       strings has no impact on the formatting. Strings are fuzzied anyway,
       and msgmerge will probably match the strings accordingly afterward.
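
The deduplication described in this manpage excerpt can be illustrated with a
short Python sketch. It is a simplified model written for illustration, not
po4a's actual (Perl) implementation: the function names `merge_duplicates`,
`deduplicate` and `pair` are hypothetical, and the strings are the "dog" toy
example from the bug report.

```python
def merge_duplicates(msgids):
    """Model a PO file: identical msgids collapse into a single entry."""
    return list(dict.fromkeys(msgids))  # keeps first-seen order


def deduplicate(msgids):
    """Make every string unique by appending trailing spaces to repeats."""
    seen = set()
    unique = []
    for msgid in msgids:
        while msgid in seen:
            msgid += " "
        seen.add(msgid)
        unique.append(msgid)
    return unique


def pair(master, localized):
    """One-to-one pairing of master and localized entries, in order."""
    if len(master) != len(localized):
        raise ValueError(
            f"desynchronized: {len(master)} master entries "
            f"vs {len(localized)} localized entries"
        )
    return list(zip(master, localized))


# Toy data: the same original string is translated in differing ways.
master = ["dog", "cat", "dog", "dog"]
localized = ["Hund", "Katze", "Rüde", "Hund"]

# Without deduplication the two modelled PO files have different entry
# counts (2 vs 3), so the pairing fails.
try:
    pair(merge_duplicates(master), merge_duplicates(localized))
except ValueError as err:
    print("pairing failed:", err)

# With deduplication both sides keep four entries and can be paired.
for original, translation in pair(deduplicate(master), deduplicate(localized)):
    print(repr(original), "->", repr(translation))
```

The trailing spaces in the deduplicated entries are the same kind of artifact
as the growing runs of spaces reported in this bug. In this model they exist
only so that the one-time pairing succeeds, which is consistent with the
advice not to rerun the gettextization step on every build.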

Could you add an example here? I.e. like I did below with my example?
In your text above (in the e-mail) you state that you should use 
po4a-updatepo or po4a, here you mention msgmerge. Probably clarifying 
this would help as well.

> Thanks for using po4a,

Sure, for translators/translations it's a great piece of software.

> Mt
> 
> On Friday, 05 August 2022 at 16:02 +0200, Helge Kreutzmann wrote:
> > Package: po4a
> > Version: 0.67-2
> > Severity: normal
> > Tags: upstream
> > X-Debbugs-Cc: Ben Hutchings 
> > 
> > 
> > I'm the translator of the German translation for the documentation of
> > sgt-puzzles. It is a Debian-only patch at the moment for the halibut
> > based sources.
> > 
> > A few days ago Ben (the Debian maintainer) updated the package and
> > requested me to update the German translation. While doing so he
> > noticed a strange change in po4a behaviour:
> > 
> > (Some) strings, which are repeated (because the same text appears in
> > multiple places in the documentation or in many man pages) are
> > inserted several times into de.po, except that an increasing number of
> > spaces is added, i.e.
> > 
> > "dog" would become
> > "dog"
> > "dog "
> > "dog  "
> > "dog   "
> > and so on.
> > 
> > While updating the German translation of po4a I remember translating 
> > something along these lines, though I did not fully understand its 
> > meaning.
> > 
> > This behaviour defeats part of the idea of the po format. Unless the
> > original author indicates this, identical strings in the original text
> > should be translated identically as well.
> > 
> > Now for some reason po4a makes identical strings artificially different. 
> > 
> > In the toy example above, this could become:
> > "Hund"
> > "Rüde "
> > "Gerüstklammer  "
> > "Schlepphaken   " 
> > …
> > 
> > So now the same string is translated differently *and* the
> > translation receives also (varying) additional trailing spaces. (As a
> > translator, you usually reproduce space at the beginning and end). 
> > 
> > In this toy example this might be noticed easily, but usually po4a is
> > used for (longer) paragraphs - and translators might not realize they
> > already translated them and would retranslate them - additional work
> > and, as stated above, potentially inconsistent translations.
> > 
> > Thus please revert to the previous behaviour of po4a *or* ensure 

Bug#1016695: po4a: Strange behaviour with repeated strings (in halibut backend)

2022-08-05 Thread Martin Quinson
Hello,

the short answer is that po4a-gettextize is not intended to be used on a regular
basis. It's only intended for the first run when you want to convert an existing
translation to the po-based workflow. Once it's done, you're supposed to use
po4a-updatepo to create an empty PO file. Even better, you should use po4a
directly instead of the deprecated atomic commands.

The extra spaces that you see are intended to help the gettextization process,
as explained in the po4a-gettextize manpage.

I'm not sure of how I can help you here. What piece of documentation should be
updated?

Thanks for using po4a,
Mt

On Friday, 05 August 2022 at 16:02 +0200, Helge Kreutzmann wrote:
> Package: po4a
> Version: 0.67-2
> Severity: normal
> Tags: upstream
> X-Debbugs-Cc: Ben Hutchings 
> 
> 
> I'm the translator of the German translation for the documentation of
> sgt-puzzles. It is a Debian-only patch at the moment for the halibut
> based sources.
> 
> A few days ago Ben (the Debian maintainer) updated the package and
> requested me to update the German translation. While doing so he
> noticed a strange change in po4a behaviour:
> 
> (Some) strings, which are repeated (because the same text appears in
> multiple places in the documentation or in many man pages) are
> inserted several times into de.po, except that an increasing number of
> spaces is added, i.e.
> 
> "dog" would become
> "dog"
> "dog "
> "dog  "
> "dog   "
> and so on.
> 
> While updating the German translation of po4a I remember translating 
> something along these lines, though I did not fully understand its 
> meaning.
> 
> This behaviour defeats part of the idea of the po format. Unless the
> original author indicates this, identical strings in the original text
> should be translated identically as well.
> 
> Now for some reason po4a makes identical strings artificially different. 
> 
> In the toy example above, this could become:
> "Hund"
> "Rüde "
> "Gerüstklammer  "
> "Schlepphaken   " 
> …
> 
> So now the same string is translated differently *and* the
> translation receives also (varying) additional trailing spaces. (As a
> translator, you usually reproduce space at the beginning and end). 
> 
> In this toy example this might be noticed easily, but usually po4a is
> used for (longer) paragraphs - and translators might not realize they
> already translated them and would retranslate them - additional work
> and, as stated above, potentially inconsistent translations.
> 
> Thus please revert to the previous behaviour of po4a *or* ensure that
> identical text is shown only once in the *.po(t) files.
> 
> In case you want to investigate yourself, do the following in
> unstable:
> 
> apt-get source sgt-puzzles
> cd sgt-puzzles-20191231.79a5378/
> make -f debian/rules build
> make -f Makefile.doc update-po
> 
> 
> -- System Information:
> Debian Release: bookworm/sid
>   APT prefers testing
>   APT policy: (500, 'testing')
> Architecture: amd64 (x86_64)
> 
> Kernel: Linux 5.18.15 (SMP w/12 CPU threads)
> Kernel taint flags: TAINT_UNSIGNED_MODULE
> Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8) (ignored:
> LC_ALL set to de_DE.UTF-8), LANGUAGE not set
> Shell: /bin/sh linked to /usr/bin/dash
> Init: systemd (via /run/systemd/system)
> 
> Versions of packages po4a depends on:
> ii  gettext                     0.21-6
> ii  libpod-parser-perl          1.65-1
> ii  libsgmls-perl               1.03ii-37
> ii  libsyntax-keyword-try-perl  0.27-1
> ii  libyaml-tiny-perl           1.73-1
> ii  opensp                      1.5.2-13+b2
> ii  perl                        5.34.0-5
> 
> Versions of packages po4a recommends:
> ii  liblocale-gettext-perl     1.07-4+b2
> ii  libterm-readkey-perl       2.38-1+b3
> ii  libtext-wrapi18n-perl      0.06-9
> ii  libunicode-linebreak-perl  0.0.20190101-1+b4
> 
> po4a suggests no packages.
> 
> -- no debconf information
> 

-- 
For an independent, transparent and rigorous evaluation!
I support the Inria Evaluation Committee.





Bug#1016695: po4a: Strange behaviour with repeated strings (in halibut backend)

2022-08-05 Thread Helge Kreutzmann
Package: po4a
Version: 0.67-2
Severity: normal
Tags: upstream
X-Debbugs-Cc: Ben Hutchings 


I'm the translator of the German translation for the documentation of
sgt-puzzles. It is a Debian-only patch at the moment for the halibut
based sources.

A few days ago Ben (the Debian maintainer) updated the package and
requested me to update the German translation. While doing so he
noticed a strange change in po4a behaviour:

(Some) strings, which are repeated (because the same text appears in
multiple places in the documentation or in many man pages) are
inserted several times into de.po, except that an increasing number of
spaces is added, i.e.

"dog" would become
"dog"
"dog "
"dog  "
"dog   "
and so on.

While updating the German translation of po4a I remember translating 
something along these lines, though I did not fully understand its 
meaning.

This behaviour defeats part of the idea of the po format. Unless the
original author indicates this, identical strings in the original text
should be translated identically as well.

Now for some reason po4a makes identical strings artificially different. 

In the toy example above, this could become:
"Hund"
"Rüde "
"Gerüstklammer  "
"Schlepphaken   " 
…

So now the same string is translated differently *and* the
translation receives also (varying) additional trailing spaces. (As a
translator, you usually reproduce space at the beginning and end). 

In this toy example this might be noticed easily, but usually po4a is
used for (longer) paragraphs - and translators might not realize they
already translated them and would retranslate them - additional work
and, as stated above, potentially inconsistent translations.

Thus please revert to the previous behaviour of po4a *or* ensure that
identical text is shown only once in the *.po(t) files.

In case you want to investigate yourself, do the following in
unstable:

apt-get source sgt-puzzles
cd sgt-puzzles-20191231.79a5378/
make -f debian/rules build
make -f Makefile.doc update-po


-- System Information:
Debian Release: bookworm/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 5.18.15 (SMP w/12 CPU threads)
Kernel taint flags: TAINT_UNSIGNED_MODULE
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to de_DE.UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages po4a depends on:
ii  gettext                     0.21-6
ii  libpod-parser-perl          1.65-1
ii  libsgmls-perl               1.03ii-37
ii  libsyntax-keyword-try-perl  0.27-1
ii  libyaml-tiny-perl           1.73-1
ii  opensp                      1.5.2-13+b2
ii  perl                        5.34.0-5

Versions of packages po4a recommends:
ii  liblocale-gettext-perl     1.07-4+b2
ii  libterm-readkey-perl       2.38-1+b3
ii  libtext-wrapi18n-perl      0.06-9
ii  libunicode-linebreak-perl  0.0.20190101-1+b4

po4a suggests no packages.

-- no debconf information
