[l10n-dev] TMX files (List of UI strings/translation)

Jean-Christophe Helary Tue, 27 Nov 2007 17:25:04 -0800


On 28 nov. 07, at 02:19, Rafaella Braconi wrote:

So far I have created TMX files and made them available at:
http://wiki.services.openoffice.org/wiki/Translation_for_2.4#Translation_Memories
But if you would like to get sdf files for your work (at least as long as French is not available in Pootle), just let me know.


Very good! Thank you, Rafaella.

For other list members' information, the TMX files include the following languages:


 TMX_2007-09-12_de.zip                        12-Sep-2007 13:20  3.2M
 TMX_2007-09-12_es.zip                        12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_fr.zip                        12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_hu.zip                        12-Sep-2007 13:20  3.2M
 TMX_2007-09-12_it.zip                        12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_ja.zip                        12-Sep-2007 13:20  3.3M
 TMX_2007-09-12_ko.zip                        12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_nl.zip                        12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_pl.zip                        12-Sep-2007 13:20  3.2M
 TMX_2007-09-12_pt-BR.zip                     12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_pt.zip                        12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_ru.zip                        12-Sep-2007 13:20  3.2M
 TMX_2007-09-12_sv.zip                        12-Sep-2007 13:20  3.0M
 TMX_2007-09-12_zh-CN.zip                     12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_zh-TW.zip                     12-Sep-2007 13:20  3.1M

I've just checked the fr and ja packages: they include about 44,000 "translation units" (TUs) for the Help and about 25,000 TUs for the UI. In our case, such units are basically Help file paragraphs or UI items.
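For anyone who wants to check these counts on the other language packages, here is a minimal Python sketch. It assumes each zip contains one or more `.tmx` members using un-namespaced TMX 1.4 markup; the member names, and whether Help and UI units are split into separate files, are assumptions. It simply counts `<tu>` elements with the standard library's incremental parser.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict


def count_tus(stream: BinaryIO) -> int:
    """Count <tu> elements in a TMX stream, clearing each one once counted."""
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "tu":
            count += 1
            elem.clear()  # drop the unit's children to keep memory use low
    return count


def count_tus_in_zip(zip_file) -> Dict[str, int]:
    """Map each .tmx member of a zip archive (path or file object) to its TU count."""
    counts = {}
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".tmx"):
                with zf.open(name) as member:
                    counts[name] = count_tus(member)
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_tus.py TMX_2007-09-12_fr.zip TMX_2007-09-12_ja.zip
    for path in sys.argv[1:]:
        for member, n in count_tus_in_zip(path).items():
            print(f"{path}: {member}: {n} TUs")
```

Counting per member rather than per archive keeps the Help/UI split visible if the packages happen to store them as separate files.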

These units seem to have been taken not directly from the XML files that constitute the Help files, but from the post-processed sdf files.


All this means:

1) The TMXs contain all the XML code escaped with "\" as in the sdf file: they are not "proper" TMX level 2 files.
2) Since they conform to the sdf contents, they can be used directly to translate it (in either OpenLanguageTools or OmegaT).
3) _But_ since the original XML code is also contained in the translation unit itself (instead of being encapsulated in TMX tags), there is a chance that the matches will be influenced by the XML code instead of reflecting the translatable contents. Not only will that lower the frequency of relevant matches, it will also add to the translator's burden, since getting a proper match requires editing the escaped XML code (which would be automatic with proper encapsulation).
4) People who use sentence segmentation in their tools should disable it and work with "paragraph segmentation" on, so as to get the best possible matches from the TMXs.
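To make point 3 concrete, here is a minimal Python sketch of what "proper encapsulation" means. It assumes sdf-escaped markup looks like `\<ahelp hid=\".uno:Save\"\>...\</ahelp\>` (a backslash before `<`, `>` and `"`), assumes tags are well nested within a segment, and uses hypothetical helper names and sample strings. It moves the native code into TMX level 2 `<bpt>`/`<ept>`/`<ph>` inline elements, and compares fuzzy-match scores on the raw string versus the translatable text only, with `difflib` standing in for a CAT tool's matcher.

```python
import difflib
import re
from xml.sax.saxutils import escape

# Assumed sdf escaping: tags appear as \<name attrs\>, \</name\> or \<name/\>,
# with attribute quotes written as \".
SDF_TAG = re.compile(r'\\<(/?)([A-Za-z][\w-]*)(.*?)(/?)\\>')


def unescape_sdf(text: str) -> str:
    """Undo the (assumed) sdf backslash escaping of quotes and angle brackets."""
    return text.replace('\\"', '"').replace('\\<', '<').replace('\\>', '>')


def sdf_to_tmx_seg(text: str) -> str:
    """Return TMX level 2 <seg> content: plain text plus encapsulated native code."""
    out, stack, next_i, pos = [], [], 1, 0
    for m in SDF_TAG.finditer(text):
        # Text between tags is translatable; XML-escape it for the TMX file.
        out.append(escape(unescape_sdf(text[pos:m.start()])))
        native = escape(unescape_sdf(m.group(0)))
        closing, name, self_closing = m.group(1), m.group(2), m.group(4)
        if self_closing:
            out.append(f'<ph>{native}</ph>')  # standalone code
        elif not closing:
            stack.append((name, next_i))
            out.append(f'<bpt i="{next_i}">{native}</bpt>')  # opening code
            next_i += 1
        else:
            if not stack or stack[-1][0] != name:
                raise ValueError(f'unbalanced tag: {m.group(0)}')
            _, i = stack.pop()
            out.append(f'<ept i="{i}">{native}</ept>')  # matching closing code
        pos = m.end()
    if stack:
        raise ValueError('unclosed tag(s): ' + ', '.join(n for n, _ in stack))
    out.append(escape(unescape_sdf(text[pos:])))
    return ''.join(out)


def translatable_text(text: str) -> str:
    """Drop the markup entirely, keeping only what the translator reads."""
    return unescape_sdf(SDF_TAG.sub('', text))


def similarity(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, a, b).ratio()


if __name__ == '__main__':
    # Illustrative segments: same sentence, different (hypothetical) help IDs.
    a = r'\<ahelp hid=\".uno:Save\"\>Saves the current document.\</ahelp\>'
    b = r'\<ahelp hid=\".uno:SaveAs\"\>Saves the current document.\</ahelp\>'
    print(sdf_to_tmx_seg(a))
    # <bpt i="1">&lt;ahelp hid=".uno:Save"&gt;</bpt>Saves the current document.<ept i="1">&lt;/ahelp&gt;</ept>
    print('raw markup:', similarity(a, b))  # below 1.0
    print('text only :', similarity(translatable_text(a), translatable_text(b)))  # 1.0
```

With the code encapsulated, a matcher can score only the text between the inline elements, so two paragraphs that differ only in a help ID become a 100% match instead of a fuzzy one that the translator has to fix by hand.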

Ideally, we would translate directly from the XML (not from sdf) and have the XML code properly encapsulated within the TMX, to provide translators with the best possible matches and the easiest way to recycle the XML code. If XLIFF is considered as a preferred format in the future (to replace sdf), I think it would be important to take proper encapsulation of the original XML into account.

I don't have much time right now, but if people are interested I could prepare a demonstration showing how much easier it would be for translators to have a "proper" localization format with "proper" TMX files.

Anyway, the mere fact that real TMX files are now available is a big plus compared to the days when none were available. Thank you very much, Rafaella, for your efforts. I hope we will be able to make good use of the data, as well as to propose better workflows in the future.


Jean-Christophe Helary
