People who want to make use of the TMX files available here:
http://ooo.services.openoffice.org/pub/OpenOffice.org/cws/upload/localization/tmx21/
have noticed that their structure does not match the structure of the
PO files that are output by the oo2po utility.
The reason is that the TMX are created directly from the existing SDF
files (or something close enough) while the PO files add an extra
layer of escape characters (\) to the SDF contents:
For example:
* String in the SDF (and in the TMX):
\<ahelp hid=\".\" visibility=\"hidden\"\>something\</ahelp\>
* The same string in the PO will become:
\\<ahelp hid=\\\".\\\" visibility=\\\"hidden\\\"\\>something\\</ahelp\\>
So, when people use the TMX to match the contents of the PO file, they
have to add all the extra \ by hand, a process that is very error
prone. Besides, only a few PO editors (OmegaT only?) can make real use
of TMX files...
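
To make the mismatch concrete, here is a minimal Python sketch of that extra
escaping layer. It assumes the PO side simply applies standard PO string
escaping (a backslash in front of every backslash and every double quote) on
top of the SDF text. The function name and the sample string are illustrative
only, and the actual oo2po output may differ in details.

```python
def add_po_escaping(sdf_text):
    """Apply one extra layer of PO-style escaping to an SDF string.

    Assumption: only backslashes and double quotes get escaped.
    """
    # Backslashes must be doubled first, otherwise the backslashes
    # added in front of the quotes would be doubled as well.
    return sdf_text.replace("\\", "\\\\").replace('"', '\\"')


# Illustrative help string, written the way it would appear in the SDF/TMX.
sdf = r'\<ahelp hid=\".\"\>something\</ahelp\>'
po = add_po_escaping(sdf)

print("SDF/TMX:", sdf)
print("PO     :", po)
```

A TMX segment holds the first form while the PO file holds the second, so an
exact match between the two is impossible without redoing this step by hand.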
The solution is to work directly from the SDF files.
But since their structure is a little complex, it is easier to
extract the translatable contents first, translate them in a
TMX-supporting tool (OmegaT, OpenLanguageTools, etc.) and merge the
translation back into the SDF to deliver an error-free file.
People who still work with PO files in PO _editors_ don't really know
what it is like to translate with _real_ translation tools, and I hope
to convince them that the PO based processes that are used for OOo's
localization are quickly becoming obsolete.
If you want to have fun translating while maintaining a professional
level to your work, it is time to consider using tools that are
created for translation... :)
A few days before this round started, I mentioned that Alex Buloichik
had created a utility that extracts and merges the translatable
contents of an SDF file. I tested it on the French file set (in a trial
and error process, apologies to Sophie for the weird intermediary
files...) and now it is ready to be used by the OOo localization
community.
The tool is hosted here:
http://alex73.zaval.org/snapshots/OpenOffice/sdf2txt.jar
The source code is included in the Jar, and the license is GPL.
This tool is mostly for team coordinators: they use it to split the
SDF into its module parts and to extract the translatable contents to
simple key=value text files.
The syntax is as follows:
To store the SDF messages in text files:
java -jar sdf2txt.jar --extract source-sdf-file-name source-lang output-dir
To create an SDF file with the translated messages:
java -jar sdf2txt.jar --merge source-sdf-file-name source-lang input-dir target-sdf-file-name language
Examples:
java -jar sdf2txt.jar --extract en-US.sdf en-US data/
java -jar sdf2txt.jar --merge en-US.sdf en-US data/ be-BY.sdf be-BY
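
For coordinators who script the round trip, the two calls can be wrapped as
below. This is only a sketch: it uses nothing beyond the --extract and --merge
arguments shown above, and the helper names and the JAR constant are my own
invention. It assumes java is on the PATH and that sdf2txt.jar sits in the
current directory.

```python
import subprocess

JAR = "sdf2txt.jar"  # assumed location of the downloaded jar


def extract_cmd(source_sdf, source_lang, output_dir):
    """Build the command line that extracts SDF messages to text files."""
    return ["java", "-jar", JAR, "--extract", source_sdf, source_lang, output_dir]


def merge_cmd(source_sdf, source_lang, input_dir, target_sdf, target_lang):
    """Build the command line that merges translated text back into an SDF."""
    return ["java", "-jar", JAR, "--merge",
            source_sdf, source_lang, input_dir, target_sdf, target_lang]


def run(cmd):
    """Run a command and raise an exception if the tool reports a failure."""
    subprocess.run(cmd, check=True)


# Show the commands without executing them; pass them to run() to execute.
print(" ".join(extract_cmd("en-US.sdf", "en-US", "data/")))
print(" ".join(merge_cmd("en-US.sdf", "en-US", "data/", "be-BY.sdf", "be-BY")))
```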
The output creates a folder architecture with the module names and the
word count appended to each name. A summary is also output in a
count.log file (file name, line number, word count).
For example, the file that contains the strings of the original
[res_DataLabel_tmpl.hrc] will be found in the following folder
structure:
[extraction folder]/chart2-84/source-84/controller-84/dialogs-84/res_DataLabel_tmpl.hrc.utf8.ini
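
If you need to map an extracted file back to its original location, the word
counts can be stripped from the folder names. The sketch below assumes that
the count is always a trailing "-<digits>" on each folder and that
".utf8.ini" is appended to the original file name, as in the example above.
The function name is hypothetical, and a module whose real name ends in
"-<digits>" would be shortened by mistake.

```python
import re
from pathlib import PurePosixPath

SUFFIX = ".utf8.ini"  # assumed extension added by the extraction


def original_path(extracted):
    """Return the source path for a file inside the extraction folder."""
    path = PurePosixPath(extracted)
    # Drop the trailing "-<word count>" from every folder name.
    folders = [re.sub(r"-\d+$", "", part) for part in path.parts[:-1]]
    name = path.name
    if name.endswith(SUFFIX):
        name = name[:-len(SUFFIX)]
    return str(PurePosixPath(*folders, name))


print(original_path(
    "chart2-84/source-84/controller-84/dialogs-84/res_DataLabel_tmpl.hrc.utf8.ini"
))
```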
The structure of the file to translate is:
checkbox.RESOURCE_DATALABEL( xpos, ypos ).CB_CATEGORY=Show ~category
checkbox.RESOURCE_DATALABEL( xpos, ypos ).CB_SYMBOL=Show ~legend key
checkbox.RESOURCE_DATALABEL( xpos, ypos ).CB_VALUE_AS_NUMBER=Show value as ~number
checkbox.RESOURCE_DATALABEL( xpos, ypos ).CB_VALUE_AS_PERCENTAGE=Show value as ~percentage
fixedtext.RESOURCE_DATALABEL( xpos, ypos ).FT_LABEL_PLACEMENT=Place~ment
pushbutton.RESOURCE_DATALABEL( xpos, ypos ).PB_NUMBERFORMAT=Number ~format...
pushbutton.RESOURCE_DATALABEL( xpos, ypos ).PB_PERCENT_NUMBERFORMAT=Percentage f~ormat...
stringlist.WORKAROUND.1=Best fit
stringlist.WORKAROUND.10=Top right
stringlist.WORKAROUND.11=Inside
stringlist.WORKAROUND.12=Outside
stringlist.WORKAROUND.13=Near origin
stringlist.WORKAROUND.2=Center
stringlist.WORKAROUND.3=Above
stringlist.WORKAROUND.4=Top left
stringlist.WORKAROUND.5=Left
stringlist.WORKAROUND.6=Bottom left
stringlist.WORKAROUND.7=Below
stringlist.WORKAROUND.8=Bottom right
stringlist.WORKAROUND.9=Right
instead of the escapedly ugly PO format.
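
Because each entry is a plain key=value line, these files are also easy to
check with a few lines of script. The sketch below is not part of sdf2txt. It
assumes one entry per line and that keys never contain "=", so each line is
split on the first "=" only. The function name is made up for the
illustration.

```python
def parse_entries(text):
    """Parse key=value lines into a dict, keeping the values untouched."""
    entries = {}
    for line in text.splitlines():
        if not line.strip() or "=" not in line:
            continue  # skip blank or malformed lines
        # Split on the first "=" only, so values may contain "=" themselves.
        key, value = line.split("=", 1)
        entries[key] = value
    return entries


sample = (
    "checkbox.RESOURCE_DATALABEL( xpos, ypos ).CB_CATEGORY=Show ~category\n"
    "stringlist.WORKAROUND.1=Best fit\n"
)
entries = parse_entries(sample)
for key, value in entries.items():
    print(key, "->", value)
```

Comparing the key sets of the source and translated files this way is a cheap
test that nothing was lost before merging.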
Once the extraction is complete, the translation coordinator divides
the extracted package between the translators.
The translators can use OmegaT to translate the files.
- put the files in /source/
- put the TMX files in /tm/
- put the glossary files (Sun Gloss) in /glossary/
- reload the project
Make sure that the files are handled as UTF-8 in the File Filters
option.
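
For those who prepare many translator packages, the copying steps above can
be scripted. This is a rough sketch that assumes an OmegaT project folder
already exists and uses the source, tm and glossary folder names listed
above. The function name and its parameters are hypothetical. Note that it
copies files flat, so files with the same name from different modules would
overwrite each other.

```python
import shutil
from pathlib import Path


def fill_project(project_dir, ini_files=(), tmx_files=(), glossary_files=()):
    """Copy the files to translate, TMX files and glossaries into a project."""
    project = Path(project_dir)
    plan = {"source": ini_files, "tm": tmx_files, "glossary": glossary_files}
    copied = []
    for folder, files in plan.items():
        target = project / folder
        target.mkdir(parents=True, exist_ok=True)
        for item in files:
            # copy2 returns the path of the newly created copy
            copied.append(Path(shutil.copy2(item, target)))
    return copied
```

After running it, reload the project in OmegaT as described above.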
This time, the TMX will perfectly match the contents of the
translatable files and there won't be any need to manually add \.
For your information, I did the French UI with sdf2txt and OmegaT and
the results were the following:
-about half of the 400 segments were already in the TMX