Re: [l10n-dev] TM, glossaries and OmegaT
Hi,

I ran a couple of tests to see whether OmegaT could be used to translate the OLH for OOo 2.3, and I forgot to post the results here as a follow-up to this discussion (I had sent them only to Sun and to the Italian community). I'm doing it now in case anyone is interested.

Apart from the test with OmegaT, I ran the same test with poEdit using the same files and procedures. At the end of this e-mail you'll find a report of what I did: converting the original SDF file to PO, translating the PO files with OmegaT and poEdit, and then back-converting the files to SDF.

I'm not attaching the files and directories I used since they would be too heavy; you can download them from the following address:
http://tinyurl.com/2s4zwu

Ale.

####
## OmegaT ##
####

## Converting SDF into PO ##

[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ ls
HC2_93824_89_2007-06-05_39.sdf
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ oo2po --source-language=en-US --language=it --input=HC2_93824_89_2007-06-05_39.sdf --output=po
/usr/lib/python2.5/site-packages/translate/storage/po.py:31: DeprecationWarning: The sre module is deprecated, please import re.
  import sre
oo2po: warning: Output directory does not exist. Attempting to create
processing 35 files...
/usr/lib/python2.5/site-packages/translate/storage/po.py:230: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if target == self.target:
[###] 100%

## Creating OmegaT Project ##

I created a standard OmegaT project using the TMX you provided and a glossary converted from a SunGloss exported glossary. I set en_US as the source language and it as the target.
I translated the following files:
- po/helpcontent2/source/text/scalc.po
- po/helpcontent2/source/text/scalc/guide.po
- po/helpcontent2/source/text/shared/04.po

## Converting PO into SDF ##

[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ ls
backconversion  HC2_93824_89_2007-06-05_39.sdf  OLH-OmT-Project
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ ls backconversion/
HC2_93824_89_2007-06-05_39.sdf  po
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ cd backconversion/
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ ls
HC2_93824_89_2007-06-05_39.sdf  po
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ po2oo -t HC2_93824_89_2007-06-05_39.sdf -l it po it_IT.sdf
/usr/lib/python2.5/site-packages/translate/storage/po.py:31: DeprecationWarning: The sre module is deprecated, please import re.
  import sre
processing 35 files...
Error at po/helpcontent2/source/text/shared/01.po::0211.xhp#par_id2366100.help.text: escapes in original ('\n', '\emph\Replace', 'with\/emph\') don't match escapes in translation ('\emph\Replace', 'with\/emph\')
Error at po/helpcontent2/source/text/shared/01.po::0211.xhp#par_id9262672.help.text: escapes in original ('\n', '\emph\Search', 'for\/emph\') don't match escapes in translation ('\emph\Search', 'for\/emph\')
[###] 100%
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ /opt/gsicheck_1.8.2/gsicheck it_IT.sdf
NO OUTPUT

####
## poEdit ##
####

## Converting SDF to PO ##

[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/poEdit$ ls
HC2_93824_89_2007-06-05_39.sdf
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/poEdit$ oo2po --source-language=en-US --language=it --input=HC2_93824_89_2007-06-05_39.sdf --output=po
/usr/lib/python2.5/site-packages/translate/storage/po.py:31: DeprecationWarning: The sre module is deprecated, please import re.
  import sre
oo2po: warning: Output directory does not exist. Attempting to create
processing 35 files...
/usr/lib/python2.5/site-packages/translate/storage/po.py:230: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if target == self.target:
[###] 100%

## Translating with poEdit ##

With Catalogs Manager I created a project with all the OLH files. I translated the following files (same as with OmegaT):
- po/helpcontent2/source/text/scalc.po
- po/helpcontent2/source/text/scalc/guide.po
- po/helpcontent2/source/text/shared/04.po

## Converting PO into SDF ##
Re: [l10n-dev] TM, glossaries and OmegaT
On 16 June 07, at 04:48, Alessandro Cattelan wrote:

> Hi,
>
> I ran a couple of tests to see whether OmegaT could be used to translate the OLH for OOo 2.3, and I forgot to post the results here as a follow-up to this discussion (I had sent them only to Sun and to the Italian community). I'm doing it now in case anyone is interested.
>
> Apart from the test with OmegaT, I ran the same test with poEdit using the same files and procedures. At the end of this e-mail you'll find a report of what I did: converting the original SDF file to PO, translating the PO files with OmegaT and poEdit, and then back-converting the files to SDF.
>
> I'm not attaching the files and directories I used since they would be too heavy; you can download them from the following address:
> http://tinyurl.com/2s4zwu
>
> Ale.

Ale,

I started my own OmegaT process yesterday and roughly documented it on the fr-l10n list.

Basically, what I did was the following:

1) get the .sdf and convert it to one big .pot, to make sure the automatically inserted translations are not present:
oo2po -P --language=fr --nonrecursiveinput HC2_93824_89_2007-06-05_33.sdf HC.pot

2) convert the .sdf to .po, convert that to .tmx, then clean the TMX to remove the units where the original msgid and msgstr were identical (the cleaning is not strictly necessary):
po2tmx --language=fr HC2_93824_89_2007-06-05_33.po HC.tmx

3) export the EN-FR glossary from SunGloss and keep source term / target term / target comment, all separated by tabs.

I loaded all this into a dedicated OmegaT project (.pot in /source/, .tmx in /tm/, glossary renamed with a .utf8 extension in /glossary/).

What I get is the familiar OmegaT session illustrated here:
http://www.eskimo.com/~helary/_files/session_omegat.png

For those unfamiliar with it, the top left part is the editor window (the bold part with a green background is the source segment, to be translated right below it, between the "segment" and "end segment" markers).

The bottom left part is the translation memory matching window. Since I use the original .sdf contents I always have at least one 100% match, which I either use as is or edit (after copying it into the edit field with Ctrl+R). I can select other matches (Ctrl+number) to replace the segment (Ctrl+R) or to insert at the cursor (Ctrl+I).

The right part is the glossary. Items cannot be inserted automatically; they are only for reference.

There are menus that are not displayed on the screenshot. From there one can create the target files (at any time during the translation) to check them, modify the project segmentation at any time (it is regex based), etc.

The only worry I have is that the target file will have problems with the back-conversion to .sdf, but your testing seems to prove that those can be fixed relatively easily...

JC

> ## Converting PO into SDF ##
>
> [EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ ls
> backconversion  HC2_93824_89_2007-06-05_39.sdf  OLH-OmT-Project
> [EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ ls backconversion/
> HC2_93824_89_2007-06-05_39.sdf  po
> [EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ cd backconversion/
> [EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ ls
> HC2_93824_89_2007-06-05_39.sdf  po
> [EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ po2oo -t HC2_93824_89_2007-06-05_39.sdf -l it po it_IT.sdf
> /usr/lib/python2.5/site-packages/translate/storage/po.py:31: DeprecationWarning: The sre module is deprecated, please import re.
>   import sre
> processing 35 files...
> Error at po/helpcontent2/source/text/shared/01.po::0211.xhp#par_id2366100.help.text: escapes in original ('\n', '\emph\Replace', 'with\/emph\') don't match escapes in translation ('\emph\Replace', 'with\/emph\')
> Error at po/helpcontent2/source/text/shared/01.po::0211.xhp#par_id9262672.help.text: escapes in original ('\n', '\emph\Search', 'for\/emph\') don't match escapes in translation ('\emph\Search', 'for\/emph\')
> [###] 100%
> [EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ /opt/gsicheck_1.8.2/gsicheck it_IT.sdf
> NO OUTPUT

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
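The two po2oo errors reported in this thread both say that a '\n' escape present in the original string is missing from the translation. As a rough illustration of that kind of check, here is a minimal Python sketch. It is not the Translate Toolkit's own implementation: the function names and the sample strings are made up for the example, and an "escape" is simply assumed to be a backslash followed by one character.

```python
import re
from collections import Counter

# Assumption for this sketch: an escape is a backslash plus the next character.
ESCAPE = re.compile(r"\\.", re.DOTALL)


def escape_counts(text):
    """Count every backslash escape found in text."""
    return Counter(ESCAPE.findall(text))


def missing_escapes(source, translation):
    """Return the escapes that occur more often in source than in translation."""
    difference = escape_counts(source) - escape_counts(translation)
    return sorted(difference.elements())


if __name__ == "__main__":
    # Made-up strings: the source ends a line with a literal backslash-n,
    # the translation dropped it.
    source = r"Search for\nReplace with"
    translation = "Cerca Sostituisci con"
    print(missing_escapes(source, translation))
```

Running something like this over the translated PO files before calling po2oo would point at the segments to fix by hand, which is presumably quicker than reading the converter's log afterwards.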
Re: [l10n-dev] TM, glossaries and OmegaT
Jean-Christophe Helary wrote:

> On 11 June 07, at 22:07, Rafaella Braconi wrote:
>
>> Hi Alexandro,
>>
>> for Russian, Italian and Khmer we are referring to the Pootle server hosted on the Sun virtuallab. Please see details at:
>> http://wiki.services.openoffice.org/wiki/New_Translation_Process_%28Pootle_server%29
>>
>> The 3 languages above are the only ones which will be using Pootle to translate the 2.3 version, since we are in the initial/pilot phase. If everything goes well, or at least we sort out the issues and are able to fix them, all other languages and teams that want to be added to this tool are more than welcome to join. That would be for the 2.4 release.
>
> If non-pootle users still want to use TMs it is possible to use OmegaT too.
>
> Sun can probably provide us with TMX files of previous translations in the relevant language pairs, the SunGloss contents can be exported as a glossary file, and the source PO can be pretty much translated as is.
>
> The correct procedure would be:
>
> 0) create a project in OmegaT
> 1) correctly format the PO file with msgcat
> 2) put that file in /source/
> 3) make sure that your TMX has srclang set to your source language the way it was defined in the project settings
> 4) put the TMX file in /tm/
> 5) make sure the exported SunGloss is a tab-separated list in at most 3 columns (1st col = source language, 2nd col = target language, 3rd col = comments)
> 6) put the glossary in /glossary/
> 7) open the project and translate
> 8) when the translation is completed, msgcat the file in /target/ and deliver
>
> For info, OmegaT is a GPLed Java Computer Aided Translation tool developed for _translators_. It is specifically _not_ for geeks, which means that it is relatively straightforward to use.
>
> I am pretty sure the filters can be tweaked to directly support the .sdf format but I leave that to others.
>
> I know some already use it here. NetBeans localizers also use it intensively. And real translators too :)
>
> Jean-Christophe Helary (OOo-fr)

I should have thought about that a couple of weeks ago, before going around looking for info on PO editors and all the rest. I think I missed the point at which OmegaT started supporting PO files... :o(

OmegaT is certainly a great tool and it has proven extremely useful for the Italian community in translating the OOoAuthors.org documentation. We are now working on the OLH translation with poEdit and most of the translators are complaining about the change: for a translator, OmegaT is just way better than any PO editor.

I've tried doing what Jean-Christophe suggested above and have seen that it could work. The only issue is that some TM matches will make no sense, because OmegaT takes the tags into consideration when computing the match percentage. For instance, for the following segment:

\\link href=\\\text/shared/01/online_update.xhp\Check for Updates\\/link\\

OmegaT displayed this 60% match:

1) \\link href=\\\text/shared/01/0211.xhp\Navigator for Master Documents\\/link\\
\\link href=\\\text/shared/01/0211.xhp\Navigatore per documenti master\\/link\\
60% 070108-it-for-mini-tm.tmx

As you can see, the only word the two have in common is the preposition "for". It doesn't make much sense, but it is certainly better than poEdit, which displays no TM matches at all, and most of the other matches are indeed useful. I guess it depends on the quality of the TM we are using.

I feel that in this case working with a PO editor has one advantage: the extracted strings Sun has provided us with include both new and changed strings. The changed strings contain the previous translation and therefore work as a sort of TM.
Here's an example:

msgid \\bookmark_value\\toolbars; Form Navigation bar\\/bookmark_valuebookmark_value\\Navigation bar;forms\\/bookmark_valuebookmark_value\\sorting; data in forms\\/bookmark_valuebookmark_value\\data; sorting in forms\\/bookmark_valuebookmark_value\\forms;sorting data\\/bookmark_value\\
msgstr \\bookmark_value\\Barra dei simboli;barra di navigazione\\/bookmark_valuebookmark_value\\Barra di navigazione;formulari\\/bookmark_valuebookmark_value\\Ordinamento;dati in formulari\\/bookmark_valuebookmark_value\\Dati;ordinamento nei formulari\\/bookmark_valuebookmark_value\\Formulario;ordinamento dei dati\\/bookmark_value\\

If you look at the two carefully and speak a little Italian, you can see that the translation does not correspond to the original string, since the original was changed during the development of OOo, but it is indeed very close. "Form Navigation bar" is translated as "Barra di navigazione", whereas it should be "Barra di navigazione dei formulari".

However, when translating the same segment with OmegaT, I get this 96% match from the TM, which in fact contains the same text as the msgstr string in the PO file:

1) \\bookmark_value\\toolbars; Navigation bar\\/bookmark_valuebookmark_value\\Navigation
Re: [l10n-dev] TM, glossaries and OmegaT
Ale,

>> If non-pootle users still want to use TMs it is possible to use OmegaT too.
>
> I should have thought about that a couple of weeks ago, before going around looking for info on PO editors and all the rest. I think I missed the point at which OmegaT started supporting PO files... :o(

Sincerely, I really wondered why you were not considering it when you started asking your questions about PO files here and there :)

> We are now working on the OLH translation with poEdit and most of the translators are complaining about the change: for a translator, OmegaT is just way better than any PO editor.

Especially if you work intensively with TMs.

> I've tried doing what Jean-Christophe suggested above and have seen that it could work. The only issue is that some TM matches will make no sense, because OmegaT takes the tags into consideration when computing the match percentage. For instance, for the following segment:

Ok, the problem is that PO files are not supposed to contain XML strings :) Hence the suggestion that a little tweaking of the existing filters would provide better matches with the original .sdf files.

But I've worked with OOo sdf-po converted files in the past and had no problem getting over this issue. I have not yet checked the current files' contents, but if they are more about text than links then you'd rather use OmegaT with your TMX files.

> Given all this, I would say that OmegaT could be the solution here. At least for the Italian community, which is used to this tool and appreciates its features. I'm going to give it a try. One of the things we'll have to pay attention to is whether the translated files are correct and can be imported painlessly into Sun's database as an .sdf file. I'll send Sun a few translated files to test this and will report back as soon as the check is done.

This is my worry also. So you should give it a try first with a short file.

> One thing is not clear, though: why should I need to run msgcat? Can't I just work with a bunch of separate PO files and directories in a tree structure (basically what I get when I run oo2po on the .sdf file)?

msgcat was suggested by the original PO file developer. See if that works without it.

JC
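To make the point about tags and match percentages concrete, here is a small Python sketch. It does not reproduce OmegaT's matching algorithm: it uses the standard library's difflib as a stand-in similarity measure, and the two segments are written with plain angle-bracket markup, which is an assumption about how the link segments discussed in this thread look once unescaped.

```python
import re
from difflib import SequenceMatcher

# Assumption for this sketch: inline markup looks like <tag ...> and </tag>.
TAG = re.compile(r"<[^>]+>")


def similarity(a, b):
    """Character-based similarity between 0 and 1 (difflib, not OmegaT)."""
    return SequenceMatcher(None, a, b).ratio()


def strip_tags(text):
    """Remove inline markup so that only translatable text is compared."""
    return TAG.sub("", text)


segment = '<link href="text/shared/01/online_update.xhp">Check for Updates</link>'
tm_entry = '<link href="text/shared/01/0211.xhp">Navigator for Master Documents</link>'

with_tags = similarity(segment, tm_entry)
without_tags = similarity(strip_tags(segment), strip_tags(tm_entry))

if __name__ == "__main__":
    print("with tags:    %.2f" % with_tags)
    print("without tags: %.2f" % without_tags)
```

With the markup included, the shared link boilerplate inflates the score even though the visible text has almost nothing in common; with the markup stripped, the score drops. A filter that hid the tags from the matcher would presumably behave more like the second figure.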
[l10n-dev] TM, glossaries and OmegaT
On 11 June 07, at 22:07, Rafaella Braconi wrote:

> Hi Alexandro,
>
> for Russian, Italian and Khmer we are referring to the Pootle server hosted on the Sun virtuallab. Please see details at:
> http://wiki.services.openoffice.org/wiki/New_Translation_Process_%28Pootle_server%29
>
> The 3 languages above are the only ones which will be using Pootle to translate the 2.3 version, since we are in the initial/pilot phase. If everything goes well, or at least we sort out the issues and are able to fix them, all other languages and teams that want to be added to this tool are more than welcome to join. That would be for the 2.4 release.

If non-pootle users still want to use TMs it is possible to use OmegaT too.

Sun can probably provide us with TMX files of previous translations in the relevant language pairs, the SunGloss contents can be exported as a glossary file, and the source PO can be pretty much translated as is.

The correct procedure would be:

0) create a project in OmegaT
1) correctly format the PO file with msgcat
2) put that file in /source/
3) make sure that your TMX has srclang set to your source language the way it was defined in the project settings
4) put the TMX file in /tm/
5) make sure the exported SunGloss is a tab-separated list in at most 3 columns (1st col = source language, 2nd col = target language, 3rd col = comments)
6) put the glossary in /glossary/
7) open the project and translate
8) when the translation is completed, msgcat the file in /target/ and deliver

For info, OmegaT is a GPLed Java Computer Aided Translation tool developed for _translators_. It is specifically _not_ for geeks, which means that it is relatively straightforward to use.

I am pretty sure the filters can be tweaked to directly support the .sdf format but I leave that to others.

I know some already use it here. NetBeans localizers also use it intensively. And real translators too :)

Jean-Christophe Helary (OOo-fr)

http://sourceforge.net/projects/omegat/

CVS version:
cvs -z3 -d:pserver:[EMAIL PROTECTED]:/cvsroot/omegat co -P omegat
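For step 5 of the procedure (a tab-separated glossary with at most 3 columns), a quick sanity check before dropping the file into /glossary/ could look like the Python sketch below. The function name and the sample entries are invented for the illustration, and the rules checked are only the ones stated in the procedure; nothing here comes from OmegaT or SunGloss themselves.

```python
def glossary_problems(lines):
    """Return (line number, reason) pairs for lines that break the layout:
    source term <TAB> target term [<TAB> comment], at most 3 columns."""
    problems = []
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\n")
        if not row.strip():
            continue  # blank lines are ignored in this sketch
        columns = row.split("\t")
        if len(columns) < 2:
            problems.append((number, "no tab separator found"))
        elif len(columns) > 3:
            problems.append((number, "more than 3 columns"))
        elif not columns[0].strip() or not columns[1].strip():
            problems.append((number, "empty source or target term"))
    return problems


if __name__ == "__main__":
    # Invented sample entries, only to show the expected layout.
    sample = [
        "toolbar\tbarra degli strumenti\tUI term\n",
        "form;formulario\n",
        "a\tb\tc\td\n",
    ]
    for number, reason in glossary_problems(sample):
        print("line %d: %s" % (number, reason))
```

To check a real export, the file would be opened as UTF-8 and its lines passed to the same function.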