Re: [l10n-dev] Imagine :)
Hi Jean-Christophe, thank you once again for sharing your thoughts and experience. I am trying to reproduce and clarify with other engineers what you say here below. However, from what I understand here, the issue you see is not necessarily Pootle but the format Pootle delivers, which is .po. As already said, Pootle will be able to deliver the content in XLIFF format in the near future. Would you still see a problem with this? Regards, Rafaella

Jean-Christophe Helary wrote:

I have no idea where the UI files come from and how they _must_ be processed before reaching the state of l10n source files. So let me give a very simplified view of the Help files preparation for l10n, as seen from a pure TMX + TMX-supporting-tool point of view. Since I don't know what the internal processes really are, I can only guess and I may be mistaken.

• The original Help files are English HTML file sets.
• Each localization has a set of files that corresponds to the English HTML sets.
• The English and localized versions are kept in sync.

To create TMX files: use a process that aligns each block-level tag in the English set to the corresponding block-level tag in the localized set. That is called paragraph (or block) segmentation, and that is what Sun does for NetBeans: no intermediary file format, no .sdf, no .po, nothing between the Help sets and the TMX sets.

The newly updated English Help files come as sets of files, all HTML. The process to translate, after the original TMX conversion above (only _ONE_ conversion in the whole process), is the following: load the source file sets and the TMX sets in the tool. The HTML tags are handled automatically by the tool. The already translated segments are translated automatically by the tool. The translator only needs to focus on what has been updated, using the whole translation memory as reference. Once the translation is done, the translator delivers the full set, which is integrated in the release after proofreading etc.
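[Editorial note: the paragraph-alignment step described above could be sketched roughly as follows. This is a hypothetical illustration only, not the actual NetBeans tooling; the function names and the minimal TMX output are invented for the example, and it assumes the two file sets really are in sync (same number and order of block-level elements).]

```python
# Hypothetical sketch: align the block-level elements of an English HTML
# file with its localized counterpart, 1:1 and in order, and emit the
# pairs as minimal TMX 1.4 translation units. Names are illustrative.
import re
from xml.sax.saxutils import escape

# Matches common block-level elements and captures their text content.
BLOCK = re.compile(r"<(p|h[1-6]|li)\b[^>]*>(.*?)</\1>", re.S | re.I)

def blocks(html):
    """Return the text content of each block-level element, in document order."""
    return [m.group(2).strip() for m in BLOCK.finditer(html)]

def align_to_tmx(en_html, loc_html, lang="fr"):
    """Pair English and localized blocks 1:1 and wrap them as TMX <tu> units."""
    units = []
    for en, loc in zip(blocks(en_html), blocks(loc_html)):
        units.append(
            "<tu>"
            f'<tuv xml:lang="en"><seg>{escape(en)}</seg></tuv>'
            f'<tuv xml:lang="{lang}"><seg>{escape(loc)}</seg></tuv>'
            "</tu>"
        )
    return ('<?xml version="1.0"?><tmx version="1.4"><header/><body>'
            + "".join(units) + "</body></tmx>")
```

A real aligner would of course also have to handle nested blocks and mismatched counts, but the point stands: the alignment goes straight from the HTML sets to TMX, with no intermediate format.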
What is required on the source files provider side? Creating TMX from HTML paragraph sets. What is required from the translator? No conversion whatsoever: just work with the files and automatically update the translation with the legacy data.

Now, what do we have currently? The source files provider:

• creates a differential of the new vs. the old HTML set;
• converts the result to an intermediate format (.sdf);
• converts that result to yet another intermediate format for the translator (either .po or XLIFF);
• matches the diffed strings to the corresponding old localized strings, thus removing the real context of the old strings;
• creates a false TMX based on an already intermediate format, without hiding the internal codes (no TMX level 2; all the tag info is handled as text data...).

The translator is left to use intermediate files that have been converted twice, removing most of the relation to the original format and adding the probability of problems with the back conversion, and has to work with a false TMX that has none of the original context, thus producing false matches that need to be guessed backward and that displays internal codes as text data.

Do you see where the overhead is? It is very possible that the UI files do require some sort of intermediate conversion to provide the translators with a manageable set of files, but as far as the Help files are concerned (and as far as I understand the process at hand) there is absolutely no need whatsoever to use an intermediate conversion, to remove the original context, and to force the translator to use error-prone source files. It is important to find ways to simplify the system so that more people can contribute and so that the source files provider has fewer tasks to handle, but clearly, using a .po-based process to translate HTML files goes in totally the opposite direction.
And translators are (sadly, without being conscious of it) suffering from that, which results in less time spent on checking one's translation and a general overhead for checkers and converters. Don't get me wrong, I am not ranting or anything. I _am_ really trying to convince people here that things could (and should) be drastically simplified, and for people who have some time, I encourage you to see how NetBeans manages its localization process. Because we are losing a _huge_ amount of human resources in the current process. Cheers, Jean-Christophe Helary (fr team) - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [l10n-dev] Imagine :)
On Thursday 12-07-2007 at 10:36 [timezone +0200], Rafaella Braconi wrote: Hi Jean-Christophe, thank you once again for sharing your thoughts and experience. I am trying to reproduce and clarify with other engineers what you say here below. However, from what I understand here, the issue you see is not necessarily Pootle but the format Pootle delivers, which is .po. As already said, Pootle will be able to deliver the content in XLIFF format in the near future. Would you still see a problem with this? Regards, Rafaella

Pootle has had XLIFF functionality since version 1.0. Hopefully we can upgrade the version on the server soon. F
Re: [l10n-dev] Imagine :)
F Wolff wrote: On Thursday 12-07-2007 at 10:36 [timezone +0200], Rafaella Braconi wrote: Hi Jean-Christophe, thank you once again for sharing your thoughts and experience. I am trying to reproduce and clarify with other engineers what you say here below. However, from what I understand here, the issue you see is not necessarily Pootle but the format Pootle delivers, which is .po. As already said, Pootle will be able to deliver the content in XLIFF format in the near future. Would you still see a problem with this? Regards, Rafaella Pootle has had XLIFF functionality since version 1.0. Hopefully we can upgrade the version on the server soon.

That's really great news! Thank you for sharing this with us. Rafaella
Re: [l10n-dev] Imagine :)
On 12 juil. 07, at 20:29, Jean-Christophe Helary wrote: On 12 juil. 07, at 17:36, Rafaella Braconi wrote: However, from what I understand here, the issue you see is not necessarily Pootle but the format Pootle delivers, which is .po. As already said, Pootle will be able to deliver the content in XLIFF format in the near future. Would you still see a problem with this?

Yes, because the problem is not the delivery format; it is the fact that you have 2 conversions from the HTML to the final format, and the conversion processes are not clean. Similarly, the TMX files you produce are not real TMX (at least not the one you sent me). I am not arguing that UI files would benefit from such treatment. I am really focusing on the HTML documentation. To make things even clearer, I am saying that using _any_ intermediary format for documentation is a waste of resources. If translators want to use intermediary formats to translate HTML in their favorite tool (be it PO, XLIFF or anything else), that is their business.

Janice (NetBeans) confirmed to me that NB was considering a Pootle server exclusively for UI files (currently Java properties files), but in the end that would mean overhead anyway, since the current process takes the Java properties as they are for translation in OmegaT. In NB, the HTML documentation is available in packages corresponding to the modules, and the TMX (a real one...) makes it possible to automatically get only the updated segments. There is no need for a complex infrastructure to produce differentials of the files; all this is managed by the translation tool automatically, and _that_ allows the translator to get _much more_ leverage from the context and to benefit from a much greater choice of correspondences.
I suppose the overhead caused by the addition of an intermediary format for the UI files will be balanced by the management functions offered by the new system, but I wish we did not have to go through translating yet another intermediate format, for the simple reason that the existing conversion processes (I've tried only the translate-toolkit stuff, and it was flawed enough to convince me _not_ to use its output) are likely to break the existing TMX. If the management system were evolved enough to output the same Java properties files, I am sure everybody would be happy. But, please, no more conversion than necessary.

To go back to the OOo processes, I have no doubt that a powerful management system available to the community is required. But in the end, why is there a need to produce .sdf files? Why can't we simply have HTML sets, like the NB project, that we'd translate with appropriately formed TMX files in appropriate tools? My understanding from when I worked with the Sun Translation Editor (when we were delivered .xlz files, and before STE was released as OLT) is that we had to use XLIFF _because_ the .sdf format was obscure. But in the end, the discussion we are having now (after many years of running in circles, apparently) revolves not around how to ease the translator's work but around how to ease the management.

If the purpose of all this is to increase the translators' output quality, then it would be _much_ better to consider a similar system that uses the HTML sets directly, because _that_ would allow the translator to spend much more time checking the translation in commonly available tools (a web browser...). How do you do checks on PO/XLIFF/SDF without resorting to hacks? Keeping things simple _is_ the way to go. Jean-Christophe Helary (fr team)
Re: [l10n-dev] Imagine :)
Hi, Jean-Christophe Helary wrote: ... To make things even clearer, I am saying that using _any_ intermediary format for documentation is a waste of resources. ...

Let me add some words to your message: the application help files are XML files. See http://documentation.openoffice.org/online_help/techdetails.html for details. The Help Viewer converts the XML files to HTML when they are displayed. Using XML, it should be even easier to use a straightforward translation process without intermediate files.

Today we have a 1:1 correspondence of paragraphs as the smallest units to be translated. Each paragraph has an ID number to ensure the correct mapping of translated text. This means that no localization with additional or removed parts of text is possible. Not 21st-century technology, in my opinion.

We want to add a link from every help page to a corresponding Wiki page, where every user can add comments (or more). This will need some effort to re-sync the source files in CVS with the user contents from the Wiki. In all languages. Good ideas are welcome. Uwe -- [EMAIL PROTECTED] - Technical Writer StarOffice - Sun Microsystems, Inc. - Hamburg, Germany http://www.sun.com/staroffice http://documentation.openoffice.org/online_help/index.html http://wiki.services.openoffice.org/wiki/Category:OnlineHelp http://blogs.sun.com/oootnt
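[Editorial note: the 1:1 paragraph-ID mapping Uwe describes is essentially an id-to-text table per file. A rough sketch, where the `paragraph` element name and `id` attribute are assumptions for illustration and not taken from the actual help-file DTD linked above:]

```python
# Illustrative sketch only: build the id -> text map that a 1:1
# paragraph-ID correspondence implies. Element and attribute names
# ("paragraph", "id") are assumptions, not the real OOo help dialect.
import xml.etree.ElementTree as ET

def paragraph_map(xml_text):
    """Map each paragraph ID to its flattened text content."""
    root = ET.fromstring(xml_text)
    return {p.get("id"): "".join(p.itertext()).strip()
            for p in root.iter("paragraph") if p.get("id")}
```

With such a map for the English file and one for the localized file, the translated text for any paragraph is found by ID, which is exactly why adding or removing paragraphs on the localized side is impossible in this scheme.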
Re: [l10n-dev] Imagine :)
Jean-Christophe Helary wrote: On 13 juil. 07, at 00:07, Uwe Fischer wrote: Jean-Christophe Helary wrote: ... To make things even clearer, I am saying that using _any_ intermediary format for documentation is a waste of resources. ... let me add some words to your message:

Uwe, thank you so much for your mail! The application help files are XML files. See http://documentation.openoffice.org/online_help/techdetails.html for details. The Help Viewer converts the XML files to HTML when they are displayed. Using XML, it should be even easier to use a straightforward translation process without intermediate files.

That is very good to know. There are already free generic XML filters that produce valid XLIFF: the Okapi framework, for example, developed by Yves Savourel, who is also an editor of the XLIFF 1.0 spec. Okapi is developed in .NET 2.0, but I keep asking Yves to make it compatible with Mono so that it can be used in other environments. As a side note, OmegaT's XLIFF filter has been made specifically to support Okapi's output.

Today we have a 1:1 correspondence of paragraphs as the smallest units to be translated. Each paragraph has an ID number to ensure the correct mapping of translated text. This means that no localization with additional or removed parts of text is possible. Not 21st-century technology, in my opinion.

No, but that means that correct TMX files are a possibility (even now). By the way, I wonder why Rafaella told me that creating TMXs of the state of the strings before the current updates was impossible?

To clarify: the only possibility I have is to provide you with TMX files in which the translation exactly matches the English text as it is now. If the English source has been changed, I have the following situation: new English text - old translation (matching the previous text). In the database I have no possibility to provide you with files containing the old English text and the updated English text.
Rafaella
[l10n-dev] TMX/XLIFF output (Re: [l10n-dev] Imagine :))
On 13 juil. 07, at 04:45, Rafaella Braconi wrote: No, but that means that correct TMX files are a possibility (even now). By the way, I wonder why Rafaella told me that creating TMXs of the state of the strings before the current updates was impossible? To clarify: the only possibility I have is to provide you with TMX files in which the translation exactly matches the English text as it is now. If the English source has been changed, I have the following situation: new English text - old translation (matching the previous text). In the database I have no possibility to provide you with files containing the old English text and the updated English text.

Don't you have a snapshot of the doc _before_ it is modified? I mean, I have the 2.2.1 help files on my machine, so I can use the XML files in, for example, sbasic.jar in the EN folder and align them with the same files in the FR folder to create a valid TMX of the state of the 2.2.1 version. This is what I suggest you keep somewhere, for each language pair (with EN as source): a static set of TMX files, archived by module (sbasic, swriter, etc.) for each language, available from the community web, so that translators just get the TMX they need for their current assignment. Such files don't need to be dynamically generated; they are valid for the most recent stable release, and once the release is updated the files can be output for the translation of the next version.

So, create the TMX _before_ you modify the database, _or_ from static files that exist anyway inside any copy of OOo. And create TMX level 2 files, with all the original XML encapsulated, so as not to confuse CAT tools and translators.

Regarding the output of proper source files, now that we (I...) know that the original is in XML, it should be trivial to provide them either directly as XML sets (specifically _without_ outputting diffs), or as XML diffs, or as XLIFF.
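[Editorial note: "TMX level 2" means the original inline markup is encapsulated in paired `<bpt>`/`<ept>` elements instead of appearing as raw text inside the segment, so CAT tools can protect and match around it. An invented example unit, with made-up content and an assumed `<emph>` inline tag:]

```xml
<!-- Illustrative TMX 1.4 level 2 unit: the inline <emph> tag from the
     source XML is carried in bpt/ept pairs, not left as plain text
     inside the segment. Content and tag names are invented. -->
<tu>
  <tuv xml:lang="en">
    <seg>Click <bpt i="1">&lt;emph&gt;</bpt>Save<ept i="1">&lt;/emph&gt;</ept> to store the file.</seg>
  </tuv>
  <tuv xml:lang="fr">
    <seg>Cliquez sur <bpt i="1">&lt;emph&gt;</bpt>Enregistrer<ept i="1">&lt;/emph&gt;</ept> pour stocker le fichier.</seg>
  </tuv>
</tu>
```

In a "false" level 1 TMX of the kind criticized above, the `&lt;emph&gt;` markup would sit inside the `<seg>` as ordinary text, polluting fuzzy matching and confusing translators.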
You may have some technical requirements that make you produce SDF files, but those only add an extra layer of complexity to the translation process, and I am sure you could have a clean XML output that includes all the SDF-contained meta info, so that the source file _is_ some kind of XML and not a hybrid that treats XML as text (which is the major source of confusion). If you have an XML workflow from the beginning, it should be much safer to keep it XML all the way, hence:

• original = XML (the OOo dialect)
• diffs = XML (currently SDF, so shift to a dialect that uses the SDF info as attributes in XML diff tags, for example)
• source = XML (XLIFF)
• reference = XML (TMX, taken from the original)

TMX is not supported by most PO editors anyway, so a clean TMX would mostly benefit people who use appropriate translation tools (free ones included). Regarding the XLIFF (or PO, depending on the communities, I gather) source output, each community (and even each contributor) could use the output that fits the tools in use. The XLIFF should be 1.0, so as to ensure OLT can be used (OLT sadly does not support more recent versions of XLIFF).

And then you have a clean workflow that satisfies everybody, and the management (Pootle) system can be put on top of all that to provide communities with the best environment possible. And of course, this workflow is also valid for UI strings, since I suppose they can also be converted to XML (if they are not already). What about that? Jean-Christophe Helary (fr team)
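[Editorial note: for readers unfamiliar with the translator-facing XLIFF 1.0 output discussed in this thread, a minimal sketch of what one unit could look like; the file name, IDs, and content are invented:]

```xml
<!-- Minimal, invented XLIFF 1.0 fragment: one paragraph as a
     trans-unit with its source and (pre-filled) target text. -->
<file original="shared/main.xml" source-language="en"
      target-language="fr" datatype="xml">
  <body>
    <trans-unit id="par_id1">
      <source>Click Save to store the file.</source>
      <target>Cliquez sur Enregistrer pour stocker le fichier.</target>
    </trans-unit>
  </body>
</file>
```

Each paragraph ID from the XML source maps directly to a `trans-unit`, so no intermediate SDF or PO stage is needed to carry the correspondence.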