automake po / pot file integration: when to merge the PO files?

Bruno Haible Mon, 06 Sep 2010 02:28:07 -0700

Hi,

One issue still needs discussion within the planned po / pot file
integration [1]:
When should the PO files that are distributed be merged with the POT file?

The problem
-----------

PO files (translations) are produced by translators and integrated into the
project either by a maintainer (who receives them by mail from the translators
directly or through the TP robot) or by a translator herself (who commits
them into the version control repository).

When a new release is made, or shortly before a new release is made, the
maintainer circulates a tarball, and the translators are supposed to pick
the PO files from this tarball and improve them by translating new
untranslated messages.

A PO file for a translator is produced by running 'msgmerge', basically
$ msgmerge last-translation.po new-messages-list.pot > new-translation.po

If the PO files are being put in a VCS, then each time an 'msgmerge' is done,
the PO file changes (new line numbers, new messages, dropped messages, etc.).
Maintainers don't like this because
- If they commit the modified PO files regularly, they bloat the history
of their VCS,
- If they don't commit them regularly, the risk of conflicts increases.
Either way, it causes regular hassles.

If the PO files are not being put in a VCS, then
1. the VCS contents are not the complete source,
2. the workflow where translators commit their translations directly is
impossible.

The classical approach
----------------------

In the approach designed in 1995, there is one PO file per language.

Logically, the POT file depends on all source files, each PO file depends on
the POT file, and each MO file depends on its corresponding PO file.

So, it would be "right" to implement Makefile dependencies in such a way that
each time a source file changes and the maintainer does a "make", the POT file
is updated (via an 'xgettext' invocation), then the PO files are updated
(via N 'msgmerge' invocations), then the MO files are updated
(via N 'msgfmt' invocations). But this would update them too often:
- It takes too much time to rebuild _all_ these files after every little
change.
- The maintainer most often does not care about whether the translations are
up-to-date, because even if he runs "make install", he is not going to
start translation work.
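
As an illustration only, the full chain described above could be sketched as
a shell function. This is not what po/Makefile.in.in does; the function name
rebuild_all and the layout (one xx.po per language in the current directory,
source files passed as arguments) are assumptions made for the example.

```shell
#!/bin/sh
# Sketch of the "logically right" chain: sources -> POT -> PO -> MO.
# rebuild_all and the file layout are hypothetical.
rebuild_all() {
    domain=$1; shift              # e.g. "hello"; remaining args are sources
    # 1. Extract the translatable strings into the POT file.
    xgettext -o "$domain.pot" "$@" || return 1
    # 2. and 3. For each language: merge with the new POT, then compile.
    for po in *.po; do
        [ -f "$po" ] || continue  # the glob matched nothing
        lang=${po%.po}
        # --update rewrites xx.po in place; this is the step that
        # produces the VCS churn discussed in this mail.
        msgmerge --update --quiet "$po" "$domain.pot" || return 1
        msgfmt -c -o "$lang.mo" "$po" || return 1
    done
}
```

A call such as "rebuild_all hello src/*.c" then runs one 'xgettext', N
'msgmerge' and N 'msgfmt' invocations on every change, which shows why doing
this on each "make" is expensive.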

So, the approach implemented in po/Makefile.in.in is that "make" does not
update all PO files, only "make dist" (which produces a tarball) does.
There is also a "make update-po" target which updates all PO files but does
not create a tarball. If there is a VCS, the maintainer is supposed to commit
the updated PO files when he makes and releases the tarball.

This was fine for cathedral-style development, and it remained fine until
Automake came along. In bazaar-style development, releases are more frequent,
and committing the updated PO files started to bloat the VCS history. Worse,
as Automake's "make distcheck" became more popular, maintainers started to
create tarballs that were not really meant for use by translators. Yet the PO
files were updated each time, which increased the potential for VCS conflicts.

The minimalistic approach
-------------------------

It would be possible to never update the PO files, and instead produce the .mo
files by running 'msgmerge' on the fly, directly before 'msgfmt':
$ msgmerge xx.po domain.pot | msgfmt -c -o xx.mo -
So:
- The POT file would be updated at "make dist",
- The PO files would only be changed when the translator submits a new one,
- The MO files would be updated at "make dist".
The VCS would only contain the PO files; and there would be no VCS conflicts.
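
A minimal sketch of this on-the-fly compilation for all languages, assuming
one xx.po per language in the current directory and an up-to-date POT file.
The helper name compile_on_the_fly is hypothetical.

```shell
#!/bin/sh
# Sketch: build the MO files without ever modifying the committed PO files.
# compile_on_the_fly is a hypothetical helper name.
compile_on_the_fly() {
    pot=$1                        # e.g. domain.pot, assumed up to date
    for po in *.po; do
        [ -f "$po" ] || continue  # the glob matched nothing
        lang=${po%.po}
        # msgmerge prints the merged catalog on standard output; msgfmt
        # reads it from "-" and writes the file named by -o.
        # Note: the pipeline reports only msgfmt's exit status.
        msgmerge --quiet "$po" "$pot" | msgfmt -c -o "$lang.mo" - || return 1
    done
}
```

Since the merge result only ever exists in the pipe, the xx.po files in the
working tree stay byte-identical to what the translator submitted.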

The drawback with this approach is that translators cannot work with a PO
file that they take from a tarball; they would need to run 'msgmerge' by
themselves (if there is no TP robot that does it for them). This would be
a major hassle for the translators. Or they would need to rely on a web
service to deliver them the merged PO files - then the translators have a
methodology problem.

The inconsistent approach
-------------------------

This is a variation of the minimalistic approach: In the development tree,
never update the PO files. But implement the "make dist" target in such a
way that it puts updated PO files into the tarball.

Translators would be satisfied with this approach.

The drawback is that when a maintainer unpacks a tarball right after producing
it, its contents are different from what he has in his development tree. This
is not only surprising, it can also lead to bugs that appear only with the
release tarball and not earlier.

A radically different approach
------------------------------

It would be possible to store two PO files per language in a development tree:
- xx.po, the last translation received from the translator,
- xx.merged.po, the updated PO file, in sync with the latest POT file.
The VCS would only contain the xx.po files, not the xx.merged.po files. But
both sets of PO files would be present in the development tree and in the
tarballs.
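
How the xx.merged.po files might be derived can be sketched as follows. The
function name update_merged is hypothetical, and the sketch assumes that all
xx.po files and the POT file live in the current directory.

```shell
#!/bin/sh
# Sketch: leave xx.po untouched and derive xx.merged.po next to it.
# update_merged is a hypothetical helper name.
update_merged() {
    pot=$1                        # e.g. domain.pot
    for po in *.po; do
        [ -f "$po" ] || continue  # the glob matched nothing
        # Skip files produced by an earlier run of this function.
        case $po in *.merged.po) continue ;; esac
        lang=${po%.po}
        # -o sends the merge result to a separate file; xx.po is only read.
        msgmerge --quiet -o "$lang.merged.po" "$po" "$pot" || return 1
    done
}
```

With such a scheme, an ignore pattern like *.merged.po in the VCS
configuration would keep the derived files out of the repository.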

Translators would be told through a README file that they need to pick the
xx.merged.po file, translate it, and if they want to test it, store it as
xx.po and do "make install".

The drawback here is increased disk space and tarball sizes. Personally I
don't think it matters much, but some people have strong opinions about it.

What do you think?

Bruno

[1]
http://git.savannah.gnu.org/gitweb/?p=automake.git;a=commitdiff;h=0465ed91200e0585c9e26974dc4551033a67623c
