Re: [l10n-dev] TM, glossaries and OmegaT

2007-06-15 Thread Alessandro Cattelan

Hi,
I ran a couple of tests to see whether OmegaT could be used to translate
the OLH for OOo 2.3 and I forgot to post the results here as a follow-up
to this discussion (I've sent them to Sun and to the Italian community
only). I'm doing it now just in case someone is interested in all this.

Apart from the test with OmegaT, I ran the same test with poEdit using
the same files and procedures.

At the end of this e-mail you'll find a report of what I've done, from
converting the original SDF file to PO, translating PO files with OmegaT
and poEdit and then back-converting the files to SDF. I'm not attaching
all the files and directories used since it would be too heavy - you can
download them from the following address:
http://tinyurl.com/2s4zwu

Ale.





####
## OmegaT ##
####




----------------------------
## Converting SDF into PO ##
----------------------------

[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ ls
HC2_93824_89_2007-06-05_39.sdf
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ oo2po
--source-language=en-US --language=it
--input=HC2_93824_89_2007-06-05_39.sdf --output=po
/usr/lib/python2.5/site-packages/translate/storage/po.py:31:
DeprecationWarning: The sre module is deprecated, please import re.
  import sre

oo2po: warning: Output directory does not exist. Attempting to create
processing 35 files...
/usr/lib/python2.5/site-packages/translate/storage/po.py:230:
UnicodeWarning: Unicode equal comparison failed to convert both
arguments to Unicode - interpreting them as being unequal
  if target == self.target:
[###] 100%


-----------------------------
## Creating OmegaT Project ##
-----------------------------

I've created a standard OmegaT project using the TMX you provided and a
glossary converted from a SunGloss exported glossary. I set en_US as the
source language and it as the target.

I translated the following files:
- po/helpcontent2/source/text/scalc.po
- po/helpcontent2/source/text/scalc/guide.po
- po/helpcontent2/source/text/shared/04.po


----------------------------
## Converting PO into SDF ##
----------------------------


[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ ls
backconversion  HC2_93824_89_2007-06-05_39.sdf  OLH-OmT-Project
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ ls backconversion/
HC2_93824_89_2007-06-05_39.sdf  po
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ cd backconversion/
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ ls
HC2_93824_89_2007-06-05_39.sdf  po
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ po2oo -t
HC2_93824_89_2007-06-05_39.sdf -l it po it_IT.sdf
/usr/lib/python2.5/site-packages/translate/storage/po.py:31:
DeprecationWarning: The sre module is deprecated, please import re.
  import sre
processing 35 files...
Error at
po/helpcontent2/source/text/shared/01.po::0211.xhp#par_id2366100.help.text:
escapes in original ('\n', '\emph\Replace', 'with\/emph\') don't
match escapes in translation ('\emph\Replace', 'with\/emph\')
Error at
po/helpcontent2/source/text/shared/01.po::0211.xhp#par_id9262672.help.text:
escapes in original ('\n', '\emph\Search', 'for\/emph\') don't match
escapes in translation ('\emph\Search', 'for\/emph\')
[###] 100%
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$
/opt/gsicheck_1.8.2/gsicheck it_IT.sdf
NO OUTPUT
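
The two errors reported by po2oo above say that a \n escape present in the original string is missing from the translation. The following is a minimal Python sketch of that kind of comparison. It is only an illustration, not the check that po2oo actually performs; the function names and the sample strings are hypothetical, and it assumes that only \n, \r and \t matter.

```python
import re

# Matches a literal backslash followed by n, r or t, as it appears
# in the text of a PO entry (assumption: only these escapes matter).
ESCAPE = re.compile(r"\\[nrt]")


def escapes(text):
    """Return the sorted list of escape sequences found in text."""
    return sorted(ESCAPE.findall(text))


def escapes_match(original, translation):
    """True when both strings carry the same escape sequences."""
    return escapes(original) == escapes(translation)


# Hypothetical strings, not taken from the actual help files.
original = r"Replace\nwith"
good = r"Sostituisci\ncon"
bad = "Sostituisci con"

print(escapes_match(original, good))
print(escapes_match(original, bad))
```

Running such a check on the translated PO files before the back-conversion would point to the segments where a line break was lost during translation.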




####
## poEdit ##
####



--------------------------
## Converting SDF to PO ##
--------------------------

[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/poEdit$ ls
HC2_93824_89_2007-06-05_39.sdf
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/poEdit$ oo2po
--source-language=en-US --language=it
--input=HC2_93824_89_2007-06-05_39.sdf
--output=po
/usr/lib/python2.5/site-packages/translate/storage/po.py:31:
DeprecationWarning: The sre module is deprecated, please import re.
  import sre

oo2po: warning: Output directory does not exist. Attempting to create
processing 35 files...
/usr/lib/python2.5/site-packages/translate/storage/po.py:230:
UnicodeWarning: Unicode equal comparison failed to convert both
arguments to Unicode - interpreting them as being unequal
  if target == self.target:
[###] 100%

-----------------------------
## Translating with poEdit ##
-----------------------------

With Catalogs Manager I created a project with all the OLH files.
I translated the following files (same as with OmegaT):
- po/helpcontent2/source/text/scalc.po
- po/helpcontent2/source/text/scalc/guide.po
- po/helpcontent2/source/text/shared/04.po


----------------------------
## Converting PO into SDF ##
----------------------------


Re: [l10n-dev] TM, glossaries and OmegaT

2007-06-15 Thread Jean-Christophe Helary


On 16 June 07, at 04:48, Alessandro Cattelan wrote:



Hi,
I ran a couple of tests to see whether OmegaT could be used to translate
the OLH for OOo 2.3 and I forgot to post the results here as a follow-up
to this discussion (I've sent them to Sun and to the Italian community
only). I'm doing it now just in case someone is interested in all this.

Apart from the test with OmegaT, I ran the same test with poEdit using
the same files and procedures.

At the end of this e-mail you'll find a report of what I've done, from
converting the original SDF file to PO, translating PO files with OmegaT
and poEdit and then back-converting the files to SDF. I'm not attaching
all the files and directories used since it would be too heavy - you can
download them from the following address:
http://tinyurl.com/2s4zwu

Ale.


Ale,

I started my own OmegaT process yesterday and I roughly documented  
it on the fr-l10n list.


Basically what I did was the following:

1) get the .sdf and convert it to one big .pot to make sure the
automatically inserted translations are not present.

oo2po -P --language=fr --nonrecursiveinput HC2_93824_89_2007-06-05_33.sdf HC.pot

2) convert the .sdf to .po, convert that to .tmx, then clean the TMX to
remove the parts where the original msgid and msgstr are identical (this
cleaning step is not strictly necessary)

po2tmx --language=fr HC2_93824_89_2007-06-05_33.po HC.tmx

3) export the EN-FR glossary from SunGloss and keep source term/target
term/target comment, all separated by tabs.

I loaded all this in a dedicated OmegaT project (.pot in /source/, .tmx
in /tm/, glossary renamed with .utf8 in /glossary/)
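
The optional TMX cleaning in step 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample TMX content and the function name are hypothetical, and the sketch simply drops every translation unit whose segments all carry the same text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, not taken from the real HC.tmx file.
SAMPLE = """<tmx version="1.1">
  <header srclang="en-US"/>
  <body>
    <tu>
      <tuv lang="en-US"><seg>Navigator</seg></tuv>
      <tuv lang="fr"><seg>Navigateur</seg></tuv>
    </tu>
    <tu>
      <tuv lang="en-US"><seg>OK</seg></tuv>
      <tuv lang="fr"><seg>OK</seg></tuv>
    </tu>
  </body>
</tmx>"""


def drop_identical_units(tmx_text):
    """Remove <tu> elements whose source and target text are identical.

    Returns the cleaned XML as a string and the number of removed units.
    """
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    removed = 0
    for unit in list(body.findall("tu")):
        # itertext() also collects text found inside inline elements.
        texts = ["".join(seg.itertext()).strip() for seg in unit.iter("seg")]
        if len(texts) >= 2 and len(set(texts)) == 1:
            body.remove(unit)
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


cleaned, removed = drop_identical_units(SAMPLE)
print(removed)
```

Units where the English text was copied unchanged into the target add nothing to the matches, which is why removing them is harmless but optional.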


And what I get is the familiar OmegaT session illustrated here:

http://www.eskimo.com/~helary/_files/session_omegat.png



For those unfamiliar with it, the top left part is the editor window
(the part with the bold green background is the source segment, to be
translated right below it, between the "segment" and "end segment"
markers).

The bottom left part is the translation memory matching window. Since
I use the original .sdf contents I always have at least a 100% match
that I either use as is or edit (after rewriting it in the edit field
with Ctrl+R). I can also select other matches (Ctrl+number) for rewrite
(Ctrl+R) or insertion at point (Ctrl+I).


The right part is the glossary part. Items cannot be inserted  
automatically, they are only for reference.
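
The glossary file itself is plain text with tab-separated columns (source term, target term, optional comment). A minimal Python sketch for producing such lines is given here; the function names and the sample terms are hypothetical, and the assumption that stray tabs inside a field must be collapsed is mine, not something stated by the OmegaT documentation.

```python
def glossary_line(source, target, comment=""):
    """Build one glossary line: source, target and optional comment."""
    # Collapse tabs and newlines inside a field so that the only tabs
    # left in the line are the column separators.
    fields = [" ".join(field.split()) for field in (source, target, comment)]
    # Drop the trailing separator when there is no comment.
    return "\t".join(fields).rstrip("\t")


def write_glossary(path, entries):
    """Write (source, target[, comment]) tuples as a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for entry in entries:
            handle.write(glossary_line(*entry) + "\n")


# Hypothetical entries, not taken from SunGloss.
print(glossary_line("toolbar", "barre d'outils"))
print(glossary_line("form", "formulaire", "Base\tonly"))
```

A call such as write_glossary("glossary.utf8", entries) would then produce a file that can be dropped into the /glossary/ folder of the project.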


There are menus that are not displayed on the screenshot, from where
one can create the target files (at any time during the translation)
to check them. It is also possible to modify the project segmentation
at any time (regex based), etc.

The only worry I have is that the target file will have problems with
the back-conversion to .sdf, but your testing seems to prove that
those can be relatively easily fixed...


JC




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [l10n-dev] TM, glossaries and OmegaT

2007-06-13 Thread Alessandro Cattelan

Jean-Christophe Helary wrote:
 
 On 11 June 07, at 22:07, Rafaella Braconi wrote:
 
 Hi Alexandro,

 for Russian Italian and Khmer we are referring to the Pootle server
 hosted on the Sun virtuallab. Please see details at:
 http://wiki.services.openoffice.org/wiki/New_Translation_Process_%28Pootle_server%29


 The 3 above languages are the only ones which will be using Pootle to
 translate the 2.3 version since we are in the initial/pilot phase.
 If everything goes well or at least we sort out the issues and we are
 able to fix them, all other languages and teams that want to be added
 to this tool are more than welcome to join. That would be for the 2.4
 release.
 
 If non-pootle users still want to use TMs it is possible to use OmegaT too.
 
 Sun can probably provide us with TMX files of previous translations in
 the relevant language pairs, the SunGloss contents can be exported as a
 glossary file and the source PO can be pretty much translated as is.
 
 The correct procedure would be:
 0) create a project in OmegaT
 1) correctly format the PO file with msgcat
 2) put that file in /source/
 3) make sure that your TMX has srclang set to your source language the
 way it was defined in the project settings
 4) put the TMX file in /tm/
 5) make sure the exported SunGloss is a tab separated list in at most 3
 columns (1st col= source language, 2nd col=target lang, 3rd col= comments)
 6) put the glossary in /glossary/
 7) open the project and translate
 8) when the translation is completed, msgcat the file in /target/ and
 deliver
 
 For info, OmegaT is a GPLed Java Computer Aided Translation tool
 developed for _translators_. It is specifically _not_ for geeks. Which
 means that it is relatively straightforward to use.
 
 I am pretty sure the filters can be tweaked to directly support the .sdf
 format but I leave that to others.
 
 I know some already use it here. NetBeans localizers also use it
 intensively. And real translators too :)
 
 Jean-Christophe Helary (OOo-fr)


I should have thought about that a couple of weeks ago, before going
around looking for info on PO editors and all the rest. I think I've
missed the point at which OmegaT started supporting PO files... :o(

OmegaT is certainly a great tool and it has proven extremely useful for
the Italian community in translating OOoAuthors.org documentation.

We are now working on the OLH translation with poEdit and most of the
translators are complaining about the change: for a translator OmegaT is
just way better than any PO editor.

I've tried doing what Jean-Christophe suggested above and seen that it
could work. The only issue is that some TM matches will make no sense
because OmegaT takes the tags into consideration while computing the
match percentage. For instance, for the following segment:

\\link href=\\\text/shared/01/online_update.xhp\Check for
Updates\\/link\\

OmegaT displayed this 60% match:

1) \\link href=\\\text/shared/01/0211.xhp\Navigator for
Master Documents\\/link\\
\\link href=\\\text/shared/01/0211.xhp\Navigatore per
documenti master\\/link\\
 60% 070108-it-for-mini-tm.tmx 

As you can see the only common word between the two is the preposition
FOR. It doesn't make much sense but it is certainly better than poEdit
which displays no TM matches at all, and most of the other matches are
indeed useful. I guess it depends on the quality of the TM we are using.
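
The effect of tags on the match percentage can be illustrated with a short Python sketch. It uses difflib from the standard library, which is not the algorithm OmegaT uses, so the percentages will differ from the ones shown above. The two strings are assumed reconstructions of the segments quoted above, written as ordinary XML since the markup did not survive in this message.

```python
import re
from difflib import SequenceMatcher

# Very rough tag pattern, good enough for this illustration.
TAG = re.compile(r"<[^>]+>")


def similarity(a, b):
    """Character-based similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def strip_tags(text):
    """Remove everything that looks like an XML tag."""
    return TAG.sub("", text)


# Assumed reconstructions of the segment and of the TM candidate.
segment = (
    '<link href="text/shared/01/online_update.xhp">'
    "Check for Updates</link>"
)
candidate = (
    '<link href="text/shared/01/0211.xhp">'
    "Navigator for Master Documents</link>"
)

with_tags = similarity(segment, candidate)
without_tags = similarity(strip_tags(segment), strip_tags(candidate))

print(f"with tags: {with_tags:.0%}, without tags: {without_tags:.0%}")
```

The shared link markup counts as matching text, so the score with tags comes out higher than the score for the translatable text alone, even though the two sentences have little in common.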

I feel that in this case working with a PO editor has one advantage: the
extracted strings Sun has provided us contain strings considered new and
changed. The changed strings contain the previous translation and
therefore work as a sort of TM. Here's an example:

msgid \\bookmark_value\\toolbars; Form Navigation
bar\\/bookmark_valuebookmark_value\\Navigation
bar;forms\\/bookmark_valuebookmark_value\\sorting; data in
forms\\/bookmark_valuebookmark_value\\data; sorting in
forms\\/bookmark_valuebookmark_value\\forms;sorting
data\\/bookmark_value\\
msgstr \\bookmark_value\\Barra dei simboli;barra di
navigazione\\/bookmark_valuebookmark_value\\Barra di
navigazione;formulari\\/bookmark_valuebookmark_value\\Ordinamento;dati
in formulari\\/bookmark_valuebookmark_value\\Dati;ordinamento
nei
formulari\\/bookmark_valuebookmark_value\\Formulario;ordinamento
dei dati\\/bookmark_value\\


If you look at the two carefully and speak a little Italian you can see
that the translation does not correspond to the original string, as that
was changed during the development of OOo, but it is indeed very close.
"Form Navigation bar" is translated as "Barra di navigazione" whereas it
should be "Barra di navigazione dei formulari".

However, when translating the same segment with OmegaT, I get this 96%
match from the TM, which in fact contains the same text as the msgstr
string in the PO file:

1) \\bookmark_value\\toolbars; Navigation
bar\\/bookmark_valuebookmark_value\\Navigation

Re: [l10n-dev] TM, glossaries and OmegaT

2007-06-13 Thread Jean-Christophe Helary

Ale,

If non-pootle users still want to use TMs it is possible to use OmegaT
too.



I should have thought about that a couple of weeks ago, before going
around looking for info on PO editors and all the rest. I think I've
missed the point in which OmegaT started supporting PO files... :o(


Sincerely, I really wondered why you were not considering it when
you started asking your questions about PO files here and there :)



We are now working on the OLH translation with poEdit and most of the
translators are complaining about the change: for a translator OmegaT is
just way better than any PO editor.


Especially if you work intensively with TMs.


I've tried doing what Jean-Christophe suggested above and seen that it
could work. The only issue is that some TM matches will make no sense
because OmegaT takes the tags into consideration while computing the
match percentage. For instance, for the following segment:


Ok, the problem is that PO files are not supposed to contain XML  
strings :)


Hence the suggestion that a little tweaking of the existing filters  
would provide better matches with the original .sdf files.


But I've worked with OOo sdf-po converted files in the past and had  
no problem getting over this issue.


I have not yet checked the current files' contents but if it is more  
about text than links then you'd rather use OmegaT with your TMX files.




Given all this, I would say that OmegaT could be the solution here. At
least for the Italian community, which is used to this tool and
appreciates its features. I'm going to give it a try. One of the things
we'll have to pay attention to is whether the translated files are
correct and can be imported painlessly into Sun's database as an .sdf
file. I'll send Sun a few translated files to test this and will report
back as soon as the check is done.


This is my worry also. So you should give it a try first with a short  
file.


One thing is not clear, though: why should I need to run msgcat? Can't I
just work with a bunch of separate po files and directories in a tree
structure (basically what I get when I run oo2po on the .sdf file)?


msgcat was suggested by the original PO file developer. See if that  
works without it.


JC




[l10n-dev] TM, glossaries and OmegaT

2007-06-12 Thread Jean-Christophe Helary


On 11 June 07, at 22:07, Rafaella Braconi wrote:


Hi Alexandro,

for Russian, Italian and Khmer we are referring to the Pootle server
hosted on the Sun virtuallab. Please see details at:
http://wiki.services.openoffice.org/wiki/New_Translation_Process_%28Pootle_server%29

The 3 above languages are the only ones which will be using Pootle
to translate the 2.3 version since we are in the initial/pilot phase.
If everything goes well or at least we sort out the issues and we
are able to fix them, all other languages and teams that want to be
added to this tool are more than welcome to join. That would be for
the 2.4 release.


If non-pootle users still want to use TMs it is possible to use  
OmegaT too.


Sun can probably provide us with TMX files of previous translations  
in the relevant language pairs, the SunGloss contents can be exported  
as a glossary file and the source PO can be pretty much translated as  
is.


The correct procedure would be:
0) create a project in OmegaT
1) correctly format the PO file with msgcat
2) put that file in /source/
3) make sure that your TMX has srclang set to your source language
the way it was defined in the project settings
4) put the TMX file in /tm/
5) make sure the exported SunGloss is a tab separated list in at most
3 columns (1st col= source language, 2nd col=target lang, 3rd col=
comments)
6) put the glossary in /glossary/
7) open the project and translate
8) when the translation is completed, msgcat the file in /target/ and
deliver
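
Step 3 of the procedure above can be checked with a few lines of Python before the TMX is copied into /tm/. This is a minimal sketch under stated assumptions: the function names and the sample header are hypothetical, and treating "en_US" and "en-US" as the same code regardless of case is my assumption about how the two settings should be compared.

```python
import xml.etree.ElementTree as ET

# Hypothetical TMX content, reduced to the parts the check needs.
SAMPLE = """<tmx version="1.1">
  <header srclang="EN-US" datatype="plaintext"/>
  <body/>
</tmx>"""


def tmx_srclang(tmx_text):
    """Return the srclang attribute of the TMX header, or None."""
    header = ET.fromstring(tmx_text).find("header")
    return None if header is None else header.get("srclang")


def same_language(code_a, code_b):
    """Compare two language codes, ignoring case and '_' versus '-'."""
    def normalise(code):
        return code.replace("_", "-").lower()
    return normalise(code_a) == normalise(code_b)


project_source = "en_US"  # as set in the OmegaT project (assumed value)
declared = tmx_srclang(SAMPLE)

print(declared, same_language(declared, project_source))
```

If the comparison fails, the srclang value in the TMX header is the thing to correct before opening the project.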


For info, OmegaT is a GPLed Java Computer Aided Translation tool
developed for _translators_. It is specifically _not_ for geeks.
Which means that it is relatively straightforward to use.


I am pretty sure the filters can be tweaked to directly support  
the .sdf format but I leave that to others.


I know some already use it here. NetBeans localizers also use it  
intensively. And real translators too :)


Jean-Christophe Helary (OOo-fr)

http://sourceforge.net/projects/omegat/

CVS version:
cvs -z3 -d:pserver:[EMAIL PROTECTED]:/cvsroot/omegat co -P omegat

