Hi Jean-Christophe,

Thank you once again for sharing your thoughts and experience.

I am trying to reproduce what you describe below and to clarify it with the other engineers.

However, from what I understand here, the issue you see is not necessarily Pootle itself but the format Pootle delivers, which is .po. As already said, Pootle will be able to deliver the content in XLIFF format in the near future. Would you still see a problem with this?

Regards,
Rafaella

Jean-Christophe Helary wrote:

I have no idea where the UI files come from and how they _must_ be processed before reaching the state of l10n source files.

So, let me give a very simplified view of the Help files preparation for l10n, as seen from a "pure" TMX + TMX-supporting tool point of view. Since I don't know what the internal processes really are, I can only guess, and I may be mistaken.

• The original Help files are English HTML file sets.
• Each localization has a set of files that corresponds to the English HTML sets.
• The English and localized versions are kept in sync.

To create TMX files:

Use a process that aligns each block-level tag in the English set to the corresponding block-level tag in the localized set. That is called paragraph (or block) segmentation, and that is what Sun does for NetBeans: no intermediary file format, no .sdf, no .po, nothing at all between the Help sets and the TMX sets.
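
To make the idea concrete, here is a minimal sketch of such a paragraph-level alignment, assuming the English and localized sets are structurally in sync (same blocks, in the same order). The file names, the block-tag list and the language codes are purely illustrative, and inline tag handling (TMX level 2) is left out:

import re
from xml.sax.saxutils import escape

# Block-level elements to align; an illustrative list, not an exhaustive one.
BLOCK_RE = re.compile(r"<(p|h[1-6]|li|td)\b[^>]*>(.*?)</\1\s*>", re.S | re.I)

def blocks(html_text):
    # Inner content of each block-level element, in document order.
    return [m.group(2).strip() for m in BLOCK_RE.finditer(html_text)]

def align_to_tmx(en_html, loc_html, srclang="en-US", tgtlang="fr-FR"):
    # Pair English and localized blocks one-to-one and emit a simplified TMX.
    en, loc = blocks(en_html), blocks(loc_html)
    if len(en) != len(loc):
        raise ValueError("sets are out of sync, alignment needs manual review")
    tus = []
    for e, t in zip(en, loc):
        tus.append('  <tu>\n'
                   '    <tuv xml:lang="%s"><seg>%s</seg></tuv>\n'
                   '    <tuv xml:lang="%s"><seg>%s</seg></tuv>\n'
                   '  </tu>' % (srclang, escape(e), tgtlang, escape(t)))
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<tmx version="1.4"><header srclang="%s" datatype="html" '
            'segtype="paragraph"/><body>\n%s\n</body></tmx>\n'
            % (srclang, "\n".join(tus)))

if __name__ == "__main__":
    en_html = open("help/en/sample.html", encoding="utf-8").read()
    fr_html = open("help/fr/sample.html", encoding="utf-8").read()
    open("sample_fr.tmx", "w", encoding="utf-8").write(align_to_tmx(en_html, fr_html))

Again, this is only a sketch of the principle: one pass from the synced HTML sets straight to TMX, nothing in between.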

The newly updated English Help files come as sets of files, all HTML.

The process to translate, after the original TMX conversion above (only _ONE_ conversion in the whole process), is the following:

Load the source file sets and the TMX sets in the tool.

The HTML tags are automatically handled by the tool.
The already translated segments are automatically translated by the tool, so the translator only needs to focus on what has been updated, using the whole translation memory as a reference.
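
As a rough illustration of that pre-translation step (nothing more than an exact-match lookup here; a real CAT tool also does fuzzy matching and protects the inline tags, and the segment strings below are invented):

def pretranslate(new_blocks, memory):
    # memory: dict mapping legacy source segments to their translations,
    # e.g. built from the TMX above. Returns (segment, translation or None).
    return [(seg, memory.get(seg)) for seg in new_blocks]

memory = {"Click <b>OK</b> to close the dialog.":
          "Cliquez sur <b>OK</b> pour fermer la boîte de dialogue."}
updated = ["Click <b>OK</b> to close the dialog.",      # unchanged: reused automatically
           "This paragraph is new in this release."]    # new: left for the translator
for seg, hit in pretranslate(updated, memory):
    print("REUSED" if hit else "TO DO ", "|", hit or seg)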

Once the translation is done, the translator delivers the full set, which is integrated into the release after proofreading, etc.

What is required on the source files provider's side? Creating TMX from the HTML paragraph sets.

What is required from the translator? No conversion whatsoever: just work with the files and automatically update the translation with the legacy data.



Now, what do we have currently?

The source files provider creates a differential of the new vs. the old HTML set.
It converts the result to an intermediate format (.sdf).
It converts that result to yet another intermediate format for the translator (either .po or XLIFF).
It matches the diffed strings to the corresponding old localized strings, thus removing the real context of the old strings.
It creates a false TMX based on an already intermediate format, without hiding the internal codes (no TMX level 2: all the tag info is handled as text data...).
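
For readers not familiar with the distinction, this is roughly what it means for a segment containing an inline <b> tag (simplified, and the segment text is invented; real level 2 markup carries a few more attributes):

Tags stored as text data (what the generated TMX does):
  <seg>Click &lt;b&gt;OK&lt;/b&gt; to continue.</seg>

TMX level 2, tags marked up as protected inline codes:
  <seg>Click <bpt i="1">&lt;b&gt;</bpt>OK<ept i="1">&lt;/b&gt;</ept> to continue.</seg>

In the first case the tool (and the translator) sees the codes as translatable text; in the second case the tool can protect them and still reuse the segment.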

The translator is left to work with intermediate files that have been converted twice, losing most of their relation to the original format and increasing the probability of problems with the back conversion.

The translator also has to work with a false TMX that has none of the original context, thus producing false matches that have to be second-guessed, and that displays internal codes as text data.


Do you see where the overhead is?



It is very possible that the UI files do require some sort of intermediate conversion to provide the translators with a manageable set of files, but as far as the Help files are concerned (and as far as I understand the process at hand), there is absolutely no need whatsoever to use an intermediate conversion, to remove the original context, or to force the translator to use error-prone source files.


It is important to find ways to simplify the system so that more people can contribute and so that the source files provider has fewer tasks to handle, but using a .po-based process to translate HTML files clearly goes in the opposite direction. And translators are (sadly, without being conscious of it) suffering from that, which results in less time spent checking one's own translation and a general overhead for checkers and converters.

Don't get me wrong, I am not ranting; I _am_ really trying to convince people here that things could (and should) be drastically simplified. For those who have some time, I encourage you to look at how NetBeans manages its localization process, because we are losing a _huge_ amount of human resources with the current process.

Cheers,

Jean-Christophe Helary (fr team)