On 28 nov. 07, at 02:19, Rafaella Braconi wrote:

So far I have created tmx files and made them available at:
http://wiki.services.openoffice.org/wiki/Translation_for_2.4#Translation_Memories

But if you would like to get the sdf files for your work (at least as long as French is not available in Pootle), just let me know.

Very good! Thank you, Rafaella.

For other list members' information, the TMX files include the following languages:

 TMX_2007-09-12_de.zip                        12-Sep-2007 13:20  3.2M
 TMX_2007-09-12_es.zip                        12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_fr.zip                        12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_hu.zip                        12-Sep-2007 13:20  3.2M
 TMX_2007-09-12_it.zip                        12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_ja.zip                        12-Sep-2007 13:20  3.3M
 TMX_2007-09-12_ko.zip                        12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_nl.zip                        12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_pl.zip                        12-Sep-2007 13:20  3.2M
 TMX_2007-09-12_pt-BR.zip                     12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_pt.zip                        12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_ru.zip                        12-Sep-2007 13:20  3.2M
 TMX_2007-09-12_sv.zip                        12-Sep-2007 13:20  3.0M
 TMX_2007-09-12_zh-CN.zip                     12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_zh-TW.zip                     12-Sep-2007 13:20  3.1M

I've just checked the fr and ja packages: they include about 44,000 "translation units" (TUs) for the Help and about 25,000 TUs for the UI. In our case, such units are basically Help file paragraphs or UI items.
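For anyone who wants to check the counts for another language, here is a minimal sketch that counts the <tu> elements in a TMX file. It assumes the files are plain TMX 1.4 without XML namespaces; the sample document and the function name are only illustrations, not taken from the actual packages.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up TMX document used only to illustrate the counting.
SAMPLE_TMX = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1"
          segtype="paragraph" o-tmf="sdf" adminlang="en-US"
          srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>File</seg></tuv>
      <tuv xml:lang="fr"><seg>Fichier</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Edit</seg></tuv>
      <tuv xml:lang="fr"><seg>Edition</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def count_translation_units(tmx_source):
    """Count <tu> elements in a TMX document (file path or binary file object)."""
    count = 0
    # iterparse streams the file, so large TMX files need not fit in memory.
    for _event, elem in ET.iterparse(tmx_source, events=("end",)):
        if elem.tag == "tu":
            count += 1
            elem.clear()  # free the children of the unit we just counted
    return count


if __name__ == "__main__":
    print(count_translation_units(io.BytesIO(SAMPLE_TMX)))
```

To run it on a real package, unzip the file and pass the path of the extracted TMX file instead of the in-memory sample.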

They seem to be taken not directly from the XML files that constitute the Help files but from the post-processed sdf files.

All this means:

1) The TMXs contain all the XML code escaped with "\" as per the sdf file: they are not "proper" TMX level 2 files.

2) Since they conform to the sdf contents, they can be used directly to translate it (either in OpenLanguageTools or OmegaT).

3) _But_ since the original XML code is also contained in the translation unit itself (instead of being encapsulated in TMX tags), there is a chance that the matches will be influenced by the XML code instead of reflecting the translatable contents. Not only will that lower the frequency of relevant matches, but it will also add to the burden of the translator, who has to edit the escaped XML code by hand to get a proper match (which would be automatic with proper encapsulation).

4) People who use sentence segmentation in their tools should disable it and work with "paragraph segmentation" on, so as to get the best possible matches from the TMXs.
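To illustrate point 3, here is a rough sketch of how inline markup can drag a match score down. It makes several assumptions: the escaped form shown (\< ... \> and \") is my reading of the sdf escaping, the two segments are made up, and Python's difflib.SequenceMatcher is only a stand-in for whatever fuzzy matching a given CAT tool really uses.

```python
import re
from difflib import SequenceMatcher

# Assumed sdf-style escaping: tags look like \<name attr=\"value\"\>
ESCAPED_TAG = re.compile(r'\\<.*?\\>')


def strip_escaped_tags(segment):
    """Remove escaped XML tags, keeping only the translatable text."""
    return ESCAPED_TAG.sub("", segment)


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 (stand-in for a CAT tool's fuzzy score)."""
    return SequenceMatcher(None, a, b).ratio()


# Two made-up segments: same text, different help IDs in the markup.
stored = r'\<ahelp hid=\"HID_A\"\>Opens the dialog.\</ahelp\>'
new = r'\<ahelp hid=\"HID_B_LONGER_ID\"\>Opens the dialog.\</ahelp\>'

if __name__ == "__main__":
    # With the markup left in the segment, the match is only partial.
    print("with markup:   ", round(similarity(stored, new), 2))
    # With the markup set aside, the translatable text matches exactly.
    print("without markup:", similarity(strip_escaped_tags(stored),
                                        strip_escaped_tags(new)))
```

The translatable text is identical in both segments, yet the score computed on the raw strings is below 100% because the markup differs. With the tags set aside, the match is exact.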

Ideally, it would be preferable to translate directly from the XML (not from sdf) and to have the XML code properly encapsulated within the TMX, to provide translators with the best matches possible and the easiest way to recycle the XML code. If XLIFF is considered as a preferred format in the future (to replace sdf), I think it would be important to take proper encapsulation of the original XML into account.
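As a rough idea of what "proper encapsulation" could look like, here is a sketch that turns an escaped segment into a TMX level 2 style <seg>, using the standard TMX inline elements <bpt>, <ept>, <ph> and <it>. The escaped input form, the function names and the sample segment are assumptions for illustration only; this is not how the existing TMX files were produced, and it does not handle every case (for example, an opening tag that is never closed is still emitted as <bpt>).

```python
import re
import xml.etree.ElementTree as ET

# Assumed sdf-style escaped tag: \<name ...\>, \</name\> or \<name .../\>
ESCAPED_TAG = re.compile(r'\\<(/?)([A-Za-z][\w:.-]*)(.*?)(/?)\\>', re.S)


def _unescape(markup):
    """Turn \\<, \\> and \\" back into plain XML characters."""
    return markup.replace('\\<', '<').replace('\\>', '>').replace('\\"', '"')


def _append_text(seg, last, text):
    """Attach plain text after the last inline element (or at the start of seg)."""
    if not text:
        return
    if last is None:
        seg.text = (seg.text or "") + text
    else:
        last.tail = (last.tail or "") + text


def encapsulate(segment):
    """Build a <seg> element whose inline markup is wrapped in TMX inline tags."""
    seg = ET.Element("seg")
    open_tags = []   # stack of (tag name, pairing index) for open <bpt> elements
    next_index = 1
    last = None      # last inline element added, so text can go in its tail
    pos = 0
    for match in ESCAPED_TAG.finditer(segment):
        _append_text(seg, last, segment[pos:match.start()])
        closing, name, _attrs, self_closing = match.groups()
        if self_closing:
            last = ET.SubElement(seg, "ph")            # standalone tag
        elif closing:
            if open_tags and open_tags[-1][0] == name:
                _name, index = open_tags.pop()
                last = ET.SubElement(seg, "ept", i=str(index))
            else:
                # Closing tag without a matching opener in this segment.
                last = ET.SubElement(seg, "it", pos="end")
        else:
            open_tags.append((name, next_index))
            last = ET.SubElement(seg, "bpt", i=str(next_index))
            next_index += 1
        last.text = _unescape(match.group(0))
        pos = match.end()
    _append_text(seg, last, segment[pos:])
    return seg


if __name__ == "__main__":
    sample = r'\<ahelp hid=\".\"\>Opens the dialog.\</ahelp\>'
    print(ET.tostring(encapsulate(sample), encoding="unicode"))
```

With the markup wrapped this way, a TMX-aware tool can match on "Opens the dialog." alone and let the translator reinsert the tags as placeholders instead of retyping escaped XML.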

I don't have much time right now, but if people are interested I could make a demonstration to show how much easier it would be for translators to have a "proper" localization format with "proper" TMX files.


Anyway, the mere fact that real TMX files are available is a big plus compared to the times when none were available. Thank you very much, Rafaella, for your efforts. I hope we will be able to make good use of the data, as well as to propose better workflows in the future.

Jean-Christophe Helary
