from:"Jean-Christophe Helary"

Re: [l10n-dev] OpenCTI use?

2010-07-20 Jean-Christophe Helary

Hello Reiko,

 Could you or anyone let me know how you want to
 use the TBX glossary format ?
 
 I believe OmegaT supports the CSV format in its
 glossary feature.

That is correct. Test versions of OmegaT support normal TSV files (.utf8 or 
.txt), CSV files (.csv) and TBX files (.tbx).

Supported fields are source term/target term/comment but I am not sure how they 
are labelled in TBX.

 Could anyone let me know why TBX is desirable, rather than CSV, in your 
 translation ?

My guess is that TBX allows the glossary maintainer to have finer control over 
the glossary contents by using all the allowed categories and labels, and to 
let the tools do their parsing according to their ability.
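To make the comparison concrete, here is a minimal Python sketch (the function name and sample terms are hypothetical) that turns the three fields OmegaT reads from a tab-separated glossary (source term, target term, comment) into a skeletal TBX file. The element layout follows my understanding of the TBX core structure (ISO 30042); putting the comment in a note element is an assumption rather than a documented OmegaT mapping, and the output has not been validated against a TBX schema.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tsv_glossary_to_tbx(tsv_text, src_lang, tgt_lang):
    """Turn "source<TAB>target[<TAB>comment]" lines into a skeletal TBX document.

    Hypothetical helper: element names follow the TBX core structure
    (martif / termEntry / langSet / tig / term); storing the comment in a
    <note> element is an assumption, not a documented OmegaT mapping.
    """
    martif = ET.Element("martif", {"type": "TBX", "xml:lang": src_lang})
    # TBX files carry a header; this minimal one only describes the source.
    file_desc = ET.SubElement(ET.SubElement(martif, "martifHeader"), "fileDesc")
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Converted from a TSV glossary"
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")

    # QUOTE_NONE keeps quotation marks inside terms as ordinary characters.
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete lines
        source, target = row[0].strip(), row[1].strip()
        comment = row[2].strip() if len(row) > 2 else ""

        entry = ET.SubElement(body, "termEntry")
        if comment:
            ET.SubElement(entry, "note").text = comment
        # One langSet per language, each holding a single term.
        for lang, term in ((src_lang, source), (tgt_lang, target)):
            lang_set = ET.SubElement(entry, "langSet", {"xml:lang": lang})
            ET.SubElement(ET.SubElement(lang_set, "tig"), "term").text = term

    return ET.tostring(martif, encoding="unicode")


print(tsv_glossary_to_tbx("file\tfichier\tmenu item\nsave\tenregistrer\n", "en", "fr"))
```

The point of the extra structure is that a maintainer can add categories and labels per entry or per language, while a tool that only understands source/target/comment can still read the terms.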


Jean-Christophe Helary

fun: http://mac4translators.blogspot.com
work: http://www.doublet.jp (ja/en  fr)
tweets: http://twitter.com/brandelune


-
To unsubscribe, e-mail: dev-unsubscr...@l10n.openoffice.org
For additional commands, e-mail: dev-h...@l10n.openoffice.org

Re: [l10n-dev] OpenCTI use?

2010-07-18 Jean-Christophe Helary


On 19 Jul. 10, at 06:23, Elsa Blume wrote:

 Hi Sophie,
 
 Getting back to you on this topic!
 I am also gathering info about the import/export option in TBX format to help you 
 update the TD.
 Could you please tell me which tools/editors the Community wants to 
 use to work with the TBX format?

OmegaT and Virtaal both support TBX.

Jean-Christophe Helary

Re: [l10n-dev] OpenCTI use?

2010-06-17 Jean-Christophe Helary

Rafaella,

On 17 Jun. 10, at 21:49, Rafaella Braconi wrote:

 Hi Sophie,
 
 Is it planned to migrate OpenOffice.org glossaries from SunGloss to OpenCTI?
 No. There are no plans to migrate OOo glossaries from SunGloss to OpenCTI.

On January 27, Reiko sent a mail in Japanese to a number of lists with the 
following title:

 Date: 27 January 2010 10:48:45 UTC+09:00
 To: undisclosed-recipients: ;
 Subject: [ja-translate] SunGloss migration

I translated that mail for the French lists.

It was indicated in the mail that SunGloss was going to be read-only from 
January 31st and that the glossaries would be available from the Terminology 
tab in OpenCTI.

What is the status of the migration Reiko mentioned ?


Jean-Christophe Helary

[l10n-dev] Re: [ja-translate] No coordinator for translation to Japanese and SUN or Oracle's position

2010-05-19 Jean-Christophe Helary

Dear Maho,

What is the current state of the team ? Can you give specifics ?

I seem to remember that there was more than one translator.

Jean-Christophe

On 19 May 10, at 16:34, Maho NAKATA wrote:

 Dear Saito Reiko-san and Rafaella,
 
 I would like to ask you about 3.3 translation to Japanese.
 
 Background.
 As I said, Kubota-san has resigned and there is no deputy or successor.
 I don't know how we will translate the 3.3 strings into Japanese.
 As the JA project lead, I cannot accept going ahead without leadership or without
 someone who takes the responsibility.
 Otherwise, our project will crash again.
 I'll look for the next coordinator, but I'm not sure we will find one in time
 for the 3.3 translation.
 
 Question.
 So here is the question: what will/can SUN or Oracle offer for the 3.3 JA 
 translation?
 
 Thanks   
 -- Nakata Maho

Jean-Christophe Helary

Re: [l10n-dev] Oracle Open Office supports 17 languages :)

2010-04-21 Jean-Christophe Helary


On 21 Apr. 10, at 16:38, Kazunari Hirano wrote:

 On this page, try to change Store Country from United States to another 
 country.
 You cannot find Japan!  Unbelievable!

And among the other countries, those that are on the list but do _not_ have a 
shop are:

Austria
Belgium
Denmark
Finland
France
Germany
Netherlands
Norway
South Africa
Spain
Sweden
Switzerland

Plus, if you go to the shop sites, they are not fully localized...

 Ivo san, can you urge Oracle to improve the Oracle Open Office site
 and help Japanese users who want to buy the Japanese Oracle Open Office in yen
 :)  please.

And urge it to have shops in main European countries (like France and Germany) 
that are at the forefront of OOo adoption in the world...


Jean-Christophe Helary

Re: [l10n-dev] Open Language Tools XLIFF Editor version 1.3.1 has been released

2010-03-15 Jean-Christophe Helary


On 16 Mar. 10, at 03:05, André Schnabel wrote:

 For all who already tested the release candidate: you do not need to download 
 the editor again. The release version is exactly the same as the RC.

Hasn't the manual been slightly updated ?



Jean-Christophe Helary

Re: [l10n-dev] Major L10n achievements

2009-10-22 Jean-Christophe Helary



On 23 Oct. 2009, at 01:41, André Schnabel wrote:


Rafaella Braconi wrote:


If, between November 2008 and October 2009, you or your team
reached any major milestone or anything else that you feel is
important to mention, please send a short email indicating the language
or teams involved and a short description of what you have achieved.


There have been no big-bang achievements for the Germanophone team
within the last year, at least if you just look at the results.


I'd like to mention a little big bang related to the German team.

André's efforts on the Open Language Tools XLIFF editor now allow the
translation community at large (beyond OpenOffice.org or any FLOSS
localization team) to have a free-software XLIFF editor.


Before that, translators had to work with closed-source XLIFF editors
or use relatively non-trivial conversion paths (Okapi/Rainbow+OmegaT).


Now, if their client wants them to work on XLIFF files, they can do so  
quite easily in one step: open it in OLT and work.


That is a very good example of how FLOSS communities can provide
professionals with amazing tools, and I hope that will allow
professional translators to participate more in FLOSS localization
projects. I think that is an important challenge for FLOSS l10n
communities.


Thank you André for your recent work on OLT, and thank you to Tim
Foster and all the others at Sun who originally brought us this tool
(STE, for the old-timers here).





Jean-Christophe Helary (JA/EN  FR)

Re: [l10n-dev] No source to translate ?

2009-08-23 Jean-Christophe Helary



On Aug 24, 2009, at 4:52 AM, Sophie wrote:

And yes, I will also have to make my terminology project appear in  
Pootle.

Been pushing that forward for some time now 


I don't know what your operating system is, but there are several
tools that allow you to translate, like OmegaT and OTE, and I use PoEdit
and its TM. I/we find them more convenient than Pootle, even if there
are still some enhancements to bring to these tools.


OmegaT, OTE and PoEdit are available on OSX/Windows/Linux (I've
suggested a few lines to André for the OTE manual so that OSX users
can easily figure out how to start it).





Jean-Christophe Helary


Re: [l10n-dev] How's Developer's Guide Pootle going?

2009-08-11 Jean-Christophe Helary



On Aug 12, 2009, at 10:51 AM, Aijin Kim wrote:


Our members are interested in the format of the files for shortcut.
Will they be PO, XLIFF or something else ?

The format will be XLIFF. I believe you'll be familiar with the  
format soon. :)


Is the XLIFF based on the original XML contents or is it based on
SDF ? If it were based on the original XML, the tags would be very
easy to work with in XLIFF-supporting CAT tools. Converting the data to SDF
turns everything into text strings and breaks all the tag support found in
modern applications.
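A small Python illustration of the difference, using two hypothetical XLIFF 1.2 source elements for the same help sentence: in one, the original XML markup survives as an inline g element; in the other, the markup has been flattened into escaped text, roughly what happens when XML goes through a plain-text intermediate format (the exact SDF escaping is not reproduced here). Only in the first case does a CAT tool see a tag it can protect; in the second, the translator has to retype the markup by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical XLIFF 1.2 <source> elements for the same help sentence.
# Extracted from the original XML: the markup survives as an inline <g> element.
FROM_XML = '<source>Press <g id="1" ctype="bold">Enter</g> to confirm.</source>'
# Passed through a flat text format first: the markup is now escaped characters.
FROM_FLAT = '<source>Press &lt;emph&gt;Enter&lt;/emph&gt; to confirm.</source>'


def inline_tags(source_xml):
    """Inline elements a CAT tool can recognise and protect as tags."""
    return [child.tag for child in ET.fromstring(source_xml)]


def visible_text(source_xml):
    """Text the translator sees and edits directly in the segment."""
    return "".join(ET.fromstring(source_xml).itertext())


for name, xml_text in (("from XML", FROM_XML), ("flattened", FROM_FLAT)):
    print(name, inline_tags(xml_text), repr(visible_text(xml_text)))
```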


André's latest work on OLT is making XLIFF editing very easy, even
though the software does not seem to be super stable (I had 2 freezes
last night working with a real-world XLIFF file).


It is also possible to work with OmegaT and Rainbow (both GPL/Java);
that process is much more robust.








Jean-Christophe Helary


Re: [l10n-dev] Strange tags in .po file

2009-08-10 Jean-Christophe Helary



On Aug 10, 2009, at 8:26 PM, Eike Rathke wrote:


I somewhat doubt that CRs would be used in dialogs. Usually line feed
characters (0x10, #16;) are used instead. I'm not familiar with that
extension though, should be clarified with the code owner 'mav'.


But the fact is that #13; is a CR. It may originate from a string  
pasted from a Mac file into the code, or from something totally  
different...
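For reference, carriage return is character 13 (0x0D) and line feed is character 10 (0x0A); Windows uses the pair CR LF, classic Mac OS used CR alone, and Unix and Mac OS X use LF. A small Python sketch (the function name and labels are illustrative) can report which convention a string uses, which may help trace where a stray CR came from.

```python
CR = "\r"  # carriage return: character 13 (0x0D)
LF = "\n"  # line feed: character 10 (0x0A)


def line_ending_style(text):
    """Report which line-ending convention a string uses."""
    crlf = text.count(CR + LF)
    lone_cr = text.count(CR) - crlf  # CRs not followed by LF
    lone_lf = text.count(LF) - crlf  # LFs not preceded by CR
    counts = {
        "CRLF (Windows)": crlf,
        "CR (classic Mac OS)": lone_cr,
        "LF (Unix, Mac OS X)": lone_lf,
    }
    found = [name for name, n in counts.items() if n]
    if not found:
        return "none"
    return found[0] if len(found) == 1 else "mixed"


print(ord(CR), ord(LF))  # 13 10
print(line_ending_style("exists.\r\rDo you want"))  # CR (classic Mac OS)
```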






Jean-Christophe Helary


Re: [l10n-dev] Strange tags in .po file

2009-08-08 Jean-Christophe Helary



On Aug 8, 2009, at 11:37 PM, Sophie wrote:


Hi all,

In swext/mediawiki/src/registry/data/org/openoffice/Office/Custom.po,
there are strange tags like: #13;#13
A wiki article with the title '$ARG1' already exists.#13;#13;Do
you want to replace the current article with your article?#13;#13;


Are these normal tags?


They seem to be carriage returns (CR), the line-ending character for
classic Mac OS files.
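Assuming these are decimal character references (the ampersand of the usual "&#13;" form may simply have been lost in the archive), a short Python sketch shows what the sample string contains once decoded: each "#13;" is character 13, a carriage return, so "#13;#13;" is two CRs in a row, presumably meant as an empty line between the two sentences. The function name is illustrative.

```python
import re

# Decimal character references such as "#13;". The optional "&" also covers the
# usual "&#13;" form, in case the ampersand was lost when the mail was archived.
CHAR_REF = re.compile(r"&?#(\d+);")


def decode_char_refs(text):
    """Replace decimal character references with the characters they encode."""
    return CHAR_REF.sub(lambda m: chr(int(m.group(1))), text)


sample = ("A wiki article with the title '$ARG1' already exists.#13;#13;"
          "Do you want to replace the current article with your article?#13;#13;")
decoded = decode_char_refs(sample)
print(repr(decoded))  # each "#13;" has become "\r"
```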





Jean-Christophe Helary


Re: [l10n-dev] Proposal: create pootle-translation-method mailing list

2008-03-13 Jean-Christophe Helary



On 13 Mar. 08, at 15:15, Pavel Janík wrote:


Repeat after me: [...]


I am not sure this is the proper way to address fellow list members.

Why can't you accept that your proposal was not worded well enough to  
gather enough support ?






Jean-Christophe Helary


Re: [l10n-dev] POOTLE: Content update on March 13th

2008-03-12 Jean-Christophe Helary



On 12 Mar. 08, at 21:26, André Schnabel wrote:

The workflow is much the same as the one we used to have with SDF files and
OTE. So we got no real benefits, changed tools (which means we
need to learn new tools) and lost a lot of time experimenting.


Same here. I can see that some teams feel their workflow is improved  
with pootle/PO files and translate-toolkit magik but it is not the  
case for a number of others.


Jean-Christophe Helary


Re: [l10n-dev] Proposal: create pootle-translation-method mailing list

2008-03-09 Jean-Christophe Helary



On 9 Mar. 08, at 16:15, Rail Aliev wrote:

Thus I'd like to propose setting up a special [EMAIL PROTECTED]
mailing list.


The purpose of which would be ?

dev@l10n.openoffice.org
[EMAIL PROTECTED]
[EMAIL PROTECTED]

What would be the distinctive use of each one of those lists ?





Jean-Christophe Helary


Re: [l10n-dev] Proposal: create pootle-translation-method mailing list

2008-03-09 Jean-Christophe Helary


dev@l10n.openoffice.org


General purpose L10N list. Here we discuss everything about L10N.


[EMAIL PROTECTED]
[EMAIL PROTECTED]


These lists can be merged into one list where we can discuss
translation-specific things.


Could you make a list of the recent threads and classify them in  
either category so that the purpose of each list is clearer ? Because  
to me, everything about localization includes translation.






Jean-Christophe Helary


Re: [l10n-dev] Proposal: create pootle-translation-method mailing list

2008-03-07 Jean-Christophe Helary



On 7 Mar. 08, at 17:42, Pavel Janík wrote:

I think that would only create confusion since some issues are  
inter-related.


Do you have an example of such an issue?


Discussing Pootle in conjunction with OmegaT.


If you mean a _technical_ list where only pootle-dev questions
are discussed, then why not, but that is not clear from your proposal.
Plus, I don't think that is OT on this _DEV_ list.


Personally, I think what we need instead is a list for translators/l10n
managers where they can discuss practical issues and a dev-only list
where the technical issues (possibly Pootle-related) are discussed.


Jean-Christophe Helary


Re: [l10n-dev] Proposal: create pootle-translation-method mailing list

2008-03-07 Jean-Christophe Helary



On 7 Mar. 08, at 18:28, Pavel Janík wrote:
Or you mean how to use OmegaT in connection with Sun's Pootle  
instance to translate OOo?


Of course that is what I mean. We discuss OOo localization here don't  
we ?


But yes, we should at least start to think about splitting general  
l10n and translation related stuff...


But the thing is that we don't have any translation discussion here.
We have discussions about processes, either processes on SUN's side or
processes on the teams' side.


I don't think there is a clear line between the two.





Jean-Christophe Helary


[l10n-dev] For teams that use OmegaT

2008-03-05 Jean-Christophe Helary

The Italian team's l10n leader asked me a number of questions off-list
regarding the workflow. I replied with the French and Japanese groups
in Cc since those are the groups I participate in.


The mail is here:
http://ja.openoffice.org/servlets/ReadMsg?list=translatemsgNo=3419

Also, the OmegaT Project has just released a test version of OmegaT
1.8 that comes with spellchecking and a number of other important new
features. It is called a test version because the manual is not up to date and
because there are a few areas that need some ironing out, but I've been
using it since it was first branched in CVS and I've had no data loss
problems at all.


I wrote something about the whole thing here:
http://mac4translators.blogspot.com/2008/03/omegat-173-18-19.html

1.8 is a major improvement because it at last comes with spellchecking  
(hunspell, with the dictionaries that OOo uses). I encourage all the  
teams that use OmegaT to work with the test version.




Jean-Christophe Helary


Re: [l10n-dev] Re: problem strings in OmegaT

2008-03-03 Jean-Christophe Helary


Aijin,


Thanks for your comments. Yes, I agree that it'd be the best way to
switch the style from msgid_comment to msgctxt. I also confirmed that
msgctxt works ok in OmegaT.


I did not have the time to check. What do you mean by "msgctxt works
ok in OmegaT" ? Is it displayed separately as a comment or is it
simply ignored ?


Jean-Christophe Helary


Re: [l10n-dev] Re: problem strings in OmegaT

2008-03-03 Jean-Christophe Helary


Aijin,

Ok, very good. It would have been a very good surprise to see msgctxt  
appear somewhere though ;)


Also, OmegaT 1.8 with a spellchecker (hunspell, works with OOo
dictionaries) was released as a test version yesterday:


http://mac4translators.blogspot.com/2008/03/omegat-173-18-19.html

JC

On 3 Mar. 08, at 17:56, Aijin Kim wrote:


Hi JC,

In OmegaT 1.7.3, msgctxt seems to be simply ignored. There is no
display for msgctxt. What I meant was that OmegaT works ok with
'msgid' and 'msgstr' fields regardless of the msgctxt field.


Regards,
Aijin


Re: [l10n-dev] Re: problem strings in OmegaT

2008-02-29 Jean-Christophe Helary



On 29 Feb. 08, at 17:02, Aijin Kim wrote:


Hi JC,

Thanks a lot for your kind explanation.
So you mean that you manually delete the msgid_comment part from
each target string?
If so, it would be better if the source string did not include the
msgid_comment line, to avoid additional work, right?


That is correct.

Now, I'm wondering whether we should use the msgctxt style. Ain has confirmed
that poedit supports it. I'm not sure about OmegaT.
If OmegaT also supports msgctxt, it'd be good to change the format
of the po files from the next update.


OmegaT will ignore its contents. It only sees msgid.
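The difference between the two oo2po duplicate styles can be shown with a minimal Python sketch. Both entries are hypothetical, and the KDE-style "_: ..." first line is my understanding of how the msgid_comment style marks the identifier; with msgctxt the identifier moves to its own field. A reader that, like OmegaT here, looks only at msgid therefore sees the identifier as translatable text in the first case and not at all in the second.

```python
# Two hypothetical entries for the same UI string, one per oo2po duplicate style.
MSGID_COMMENT_STYLE = r'''
msgid ""
"_: hypothetical.src#ID_OK\n"
"OK"
msgstr ""
'''

MSGCTXT_STYLE = r'''
msgctxt "hypothetical.src#ID_OK"
msgid "OK"
msgstr ""
'''


def msgid_of(entry):
    """Return the msgid of one PO entry, joining its continuation lines.

    Deliberately minimal: only the \\n and \\" escapes are decoded.
    """
    parts, in_msgid = [], False
    for line in entry.strip().splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            in_msgid = True
            line = line[len("msgid "):]
        elif not line.startswith('"'):
            in_msgid = False  # msgctxt, msgstr or a comment: not part of msgid
            continue
        if in_msgid:
            parts.append(line[1:-1])  # drop the surrounding quotes
    return "".join(parts).replace("\\n", "\n").replace('\\"', '"')


def strip_msgid_comment(msgid):
    """Drop a leading KDE-style "_: ...\\n" disambiguation line, if present."""
    if msgid.startswith("_: ") and "\n" in msgid:
        return msgid.split("\n", 1)[1]
    return msgid


# What a tool that only reads msgid would offer for translation:
print(repr(msgid_of(MSGID_COMMENT_STYLE)))  # identifier line + "OK"
print(repr(msgid_of(MSGCTXT_STYLE)))        # just "OK"
```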

JC



Re: [l10n-dev] Re: problem strings in OmegaT

2008-02-28 Jean-Christophe Helary



On 29 Feb. 08, at 12:50, Aijin Kim wrote:


Hi Dick,
I don't have much experience with OmegaT. I think the best way is
to ask for the community's help on this. :)


Hi all,
Some strings in the PO files have a line that was added to make each
msgid string unique, using the --duplicates=msgid_comment option
when running oo2po.
http://translate.sourceforge.net/wiki/toolkit/duplicates_duplicatestyle


How can OmegaT or poedit handle the added line? Is there any way to
hide the line from the source string, or can we simply ignore the line
when translating?
I can see that Pootle handles the line and hides it from the source string for
online translation. But I'm not sure about offline editors.


Since the string is a msgid it is handled as a source string and is  
displayed as translatable.




Jean-Christophe Helary


Re: [l10n-dev] Re: problem strings in OmegaT

2008-02-28 Jean-Christophe Helary


Hi all,
Some strings in the PO files have a line that was added to make each
msgid string unique, using the --duplicates=msgid_comment option
when running oo2po.
http://translate.sourceforge.net/wiki/toolkit/duplicates_duplicatestyle

How can OmegaT or poedit handle the added line?


Since the string is a msgid it is handled as a source string and is
displayed as translatable.



... that's one reason we switched to the msgctxt comment style, where the
identifier is stored in a separate field. See also
http://vagula.blogspot.com/2008/02/attention-to-community-translators.html


Ain,

Sorry, I don't understand your comment. What Aijin asked is how
OmegaT (and POedit, which I don't use) handles a tweaked msgid.


My reply was that it handles it as a normal msgid. I did not see a
reference to msgctxt.



Jean-Christophe Helary


Re: [l10n-dev] Re: problem strings in OmegaT

2008-02-28 Jean-Christophe Helary



On 29 Feb. 08, at 15:20, Aijin Kim wrote:


Hi JC,

I guess what Ain mentioned was that the 'msgctxt' option in oo2po
saves the comment line in another field rather than adding it to the msgid
field. Then there won't be any change to the msgid string.


So for current po files, do you simply ignore the comment line in  
your translation?


As far as OmegaT is concerned, yes. But OmegaT is even weirder than  
that :)


Basically, OmegaT was designed for translating documents,
monolingual documents, not for working with intermediate localization
formats.


Basically it works this way:
• It first parses the file, keeps the structure (skeleton) part in
memory and puts all the translatable strings on the display.
• The translator goes through the segments one by one and types the
translation, also referring to the available translation memories
and glossaries.
• When the translator wants to see the result, the translated files
are built by using the skeleton in memory and filling it in with the
translated strings. Anything that has not been translated is left with
the source values.
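The three steps above can be modelled in a few lines of Python. This is a toy illustration of the skeleton idea, not OmegaT's code: a "segment" is simply a non-empty line and translations are looked up by source text, both simplifying assumptions.

```python
def extract(document):
    """Split a monolingual document into a skeleton and its translatable segments."""
    skeleton, segments = [], []
    for line in document.splitlines():
        if line.strip():
            skeleton.append(len(segments))  # placeholder: index of the segment
            segments.append(line)
        else:
            skeleton.append(line)           # structure is kept verbatim
    return skeleton, segments


def build(skeleton, segments, translations):
    """Fill the skeleton back in; untranslated segments keep their source text."""
    out = []
    for item in skeleton:
        if isinstance(item, int):
            source = segments[item]
            out.append(translations.get(source, source))
        else:
            out.append(item)
    return "\n".join(out)


skeleton, segments = extract("Hello\n\nGoodbye")
print(build(skeleton, segments, {"Hello": "Bonjour"}))  # "Goodbye" stays in the source language
```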


The problem with PO, XLIFF, etc. is that the skeleton of the
file already has placeholders for both source and target, which means that
OmegaT should read what it sees in the source, consider what is already in
the target and put the translation in the target if necessary. PO includes
a sort of TM function in itself by adding fuzzy strings and by
keeping the whole legacy translation in the file. In OmegaT this TM part
is handled totally separately because monolingual documents are not
supposed to come with such embedded data, at least not in the current
CAT world.


In the case of PO files, OmegaT needs an empty msgstr so that it can
treat the file like a normal monolingual document by considering
only the contents of msgid; even if the msgstr is not empty,
it just ignores its contents (future developments aim at
putting those contents automatically in the TM):


The process is then: parse what is in msgid, display for translation,  
and _rewrite_ the whole file with msgid=msgstr for places that have  
not been translated yet...
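A minimal Python sketch of that rewrite, assuming simple one-line msgid/msgstr pairs (no plurals, msgctxt or multi-line strings) and translations keyed by the msgid as written in the file. It illustrates the behaviour described above rather than reproducing OmegaT's implementation: whatever msgstr held before is discarded.

```python
import re

# A one-line msgid followed by a one-line msgstr (escaped quotes allowed).
ENTRY = re.compile(r'msgid "(?P<id>(?:[^"\\]|\\.)*)"\nmsgstr "(?:[^"\\]|\\.)*"')


def rewrite_po(po_text, translations):
    """Regenerate every msgstr from its msgid, ignoring the old msgstr value."""
    def fill(match):
        source = match.group("id")  # still in PO-escaped form
        if source in translations:
            # Escape backslashes and quotes so the result stays valid PO.
            target = translations[source].replace("\\", "\\\\").replace('"', '\\"')
        else:
            target = source  # not translated yet: msgstr becomes a copy of msgid
        return 'msgid "%s"\nmsgstr "%s"' % (source, target)

    return ENTRY.sub(fill, po_text)


po = 'msgid "Open"\nmsgstr ""\n\nmsgid "Close"\nmsgstr "Fermer (old)"\n'
print(rewrite_po(po, {"Open": "Ouvrir"}))
```

Note how the legacy "Fermer (old)" disappears: that is exactly why legacy translations are better handed to OmegaT as a TMX than left inside msgstr.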


This is why OmegaT is perfect for HTML, ODF and anything that
is monolingual and works on a _document_ basis (cf. the NetBeans l10n
process), but not so good for intermediate or pre-processed formats
(like the OOo and other PO-based l10n processes).


Eventually, the dev team will work on the issue of intermediate  
formats. But right now OmegaT will work best with proper msgid and  
empty msgstr, with all the legacy contents put in TMX or glossaries.  
That is what OmegaT is good at handling :)


JC




Thanks,
Aijin

Jean-Christophe Helary
K.K. DOUBLET

Re: [l10n-dev] Translatable contents extraction ?

2008-02-25 Jean-Christophe Helary


Thank you very much Friedel.

Is there a simple tool that can extract the translation data and
later merge the translated data ?



pofilter and pomerge will help you do this. In fact, if you send your
translations right back to Pootle, you can just upload the translated
subsets (as long as you don't choose overwrite when you upload).
The default behaviour should be merge, which is what you want.



Reiko, I think we have a solution now :)

We can do an extraction on the PO files, translate the text in OmegaT
with the TMX Rafaella provided us with and check it within OmegaT,
then merge the translated files back into the original package :)
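A rough, stdlib-only Python imitation of that round trip, for simple one-line entries only. It is not how pofilter --test=untranslated or pomerge are implemented; it just shows the idea of translating an extracted subset and merging it back by msgid. The function names are hypothetical.

```python
import re

# Simple one-line PO entries only (a deliberate simplification).
ENTRY = re.compile(r'msgid "((?:[^"\\]|\\.)*)"\nmsgstr "((?:[^"\\]|\\.)*)"')


def extract_untranslated(po_text):
    """Return a PO snippet holding only the entries whose msgstr is empty."""
    pending = [m.group(0) for m in ENTRY.finditer(po_text)
               if m.group(1) and not m.group(2)]  # skip the header (empty msgid)
    return "\n\n".join(pending) + "\n"


def merge_back(original, translated_subset):
    """Copy non-empty msgstr values from the subset into the original, by msgid."""
    done = {mid: mstr for mid, mstr in ENTRY.findall(translated_subset) if mstr}

    def fill(m):
        mid, mstr = m.group(1), m.group(2)
        return 'msgid "%s"\nmsgstr "%s"' % (mid, done.get(mid, mstr))

    return ENTRY.sub(fill, original)


original = 'msgid "Open"\nmsgstr "Ouvrir"\n\nmsgid "Close"\nmsgstr ""\n'
subset = extract_untranslated(original)           # only the "Close" entry
translated = subset.replace('msgstr ""', 'msgstr "Fermer"')
print(merge_back(original, translated))
```

Only the extracted subset goes through the translation tool; the already translated entries never leave the original file.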


JC

On 25 Feb. 08, at 16:41, F Wolff wrote:


On Monday 2008-02-25, Jean-Christophe Helary wrote:

Is it possible to only have the PO parts that need translation/
updating and not the whole set ?

All the already translated parts are irrelevant to the translation
itself (except when used as translation memories).

Is there a simple tool that can extract the translation data and
later merge the translated data ?



pofilter and pomerge will help you do this. In fact, if you send your
translations right back to Pootle, you can just upload the translated
subsets (as long as you don't choose overwrite when you upload).
The default behaviour should be merge, which is what you want.

http://translate.sourceforge.net/wiki/toolkit/pofilter
http://translate.sourceforge.net/wiki/toolkit/pomerge

You can download a ZIP file of all the PO files in the project/directory
where you want to do this. You are interested in pofilter
--test=untranslated, but the page above will give more information on
the command line use.

Friedel


Jean-Christophe Helary

[l10n-dev] Re: [ja-translate] Re: [l10n-dev] How can we review with Pootle ?

2008-02-24 Jean-Christophe Helary


Reiko,


If the number of fuzzies is small, we can work in Pootle
directly, can't we ?


That is what I would suggest.


If the update volume is high, such as in HC,
we can use a [TM] mark inserted into the matches leveraged
from the TMX, right ?  Is there any way to do the opposite,
that is, mark new translations ?


It is possible to insert it manually. But what I propose is an  
automatic insertion when OmegaT recognizes a 100% match.
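The [TM] convention can be sketched in Python as follows. This is a toy model, not OmegaT's code: only exact matches are inserted, segments are compared as plain strings, and the function names are hypothetical. The reviewer's list is simply every filled segment that does not carry the prefix.

```python
TM_PREFIX = "[TM] "


def pretranslate(segments, tm, prefix=TM_PREFIX):
    """Fill segments that have an exact (100%) TM match, marking them with a prefix.

    Segments without an exact match are left empty for the translator.
    """
    return [prefix + tm[s] if s in tm else "" for s in segments]


def needs_review(targets, prefix=TM_PREFIX):
    """Indexes of filled segments the reviewer must check: those not marked as TM matches."""
    return [i for i, t in enumerate(targets) if t and not t.startswith(prefix)]


segments = ["Open", "Close", "Save"]
targets = pretranslate(segments, {"Open": "Ouvrir"})
targets[1] = "Fermer"  # the translator's own input
print(targets, needs_review(targets))
```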



If some mark were put on the new translations,
we could search for those segments by that mark.


I understand. It would be indeed very convenient :) Especially since  
searches in OmegaT cover both source and target without distinction ...



The ideal is that we get only the segments to translate or update, not
the whole package. Getting the whole package is a waste of resources and
requires useless round-trip manipulations...


I suggest we extract all the non translated segments before starting  
the translations. That would make all the manipulations above  
irrelevant.


JC


Even if there's no such way, your workaround
will be a big help. Thank you again for your help!

Regards,

-Reiko


Jean-Christophe Helary wrote:

On 18 Feb. 08, at 18:48, Jean-Christophe Helary wrote:
Let me confirm. I understand that new/fuzzy strings are identified in OmegaT,
but once the translator has done the translation and put the translated
string into the untranslated segment, how can the reviewer
recognize which strings to review?


Reiko,

PO is not exactly the strong point of OmegaT :)

I'll check tonight with a PO from Pootle and will get back to you  
later. Maybe on the ja list ?

Reiko,
I have just tried OmegaT with Localization.po from javainstaller2.
The file is translated at 97% and contains only 2 fuzzies to check.
The conclusion is that OmegaT is useless for files that mostly
contain translated and fuzzy strings. Ideally, a source file should
not contain such strings and all the reference material should be stored in
a TMX. The fuzzies should be left empty for normal translation.
If you work with a file that is mostly untranslated and where the
reference parts are clearly separated from the source, it is
trivial to set OmegaT to insert the TM reference with a prefix to
distinguish it from the translator's input.
Just set OmegaT to automatically insert 100% matches with a [TM]
prefix, or anything you want. The translator will still be in
control of the process and will be able to make modifications to the
input if necessary.
When the reviewer checks the file, only the parts that are not  
marked with [TM] will have to be checked.

I understand that this is not an ideal workflow though...
Jean-Christophe Helary

http://mac4translators.blogspot.com/


--

Reiko Saito
Japanese Language Lead
Translation Language and Information Services (TLIS)
Globalization Services
Sun Microsystems, Inc.
Email: [EMAIL PROTECTED]
Phone: +81 3 5962 4912
Blog: http://blogs.sun.com/reiko






Jean-Christophe Helary


http://mac4translators.blogspot.com/



[l10n-dev] Translatable contents extraction ?

2008-02-24 Thread Jean-Christophe Helary

Is it possible to only have the PO parts that need translation/updating
and not the whole set?


All the already translated parts are irrelevant to the translation  
itself (except when used as translation memories).


Is there a simple tool that can extract the translation data and later  
merge the translated data ?




Jean-Christophe Helary


http://mac4translators.blogspot.com/



Re: [l10n-dev] How can we review with Pootle ?

2008-02-22 Thread Jean-Christophe Helary



On 18 Feb. 08, at 18:48, Jean-Christophe Helary wrote:


Let me confirm. I understand that new/fuzzy strings are identified in OmegaT,
but once the translator has done the translation and put the translated
string into the untranslated segment, how can the reviewer
recognize which strings to review?


Reiko,

PO is not exactly the strong point of OmegaT :)

I'll check tonight with a PO from Pootle and will get back to you  
later. Maybe on the ja list ?


Reiko,

I have just tried OmegaT with Localization.po from javainstaller2.

The file is translated at 97% and contains only 2 fuzzies to check.

The conclusion is that OmegaT is useless for files that mostly contain
translated and fuzzy strings. Ideally, a source file should not
contain such strings and all the reference material should be stored in a TMX.
The fuzzies should be left empty for normal translation.


If you work with a file that is mostly untranslated and where the
reference parts are clearly separated from the source, it is trivial
to set OmegaT to insert the TM reference with a prefix to distinguish
it from the translator's input.


Just set OmegaT to automatically insert 100% matches with a [TM]
prefix, or anything you want. The translator will still be in control
of the process and will be able to make modifications to the input if
necessary.


When the reviewer checks the file, only the parts that are not marked  
with [TM] will have to be checked.
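The reviewer's side of this can be sketched in a few lines of Python. The sketch assumes the translated targets have already been exported as a list of strings and that `[TM]` is the prefix OmegaT was set to insert; `segments_to_review` and `strip_marker` are hypothetical helpers, not OmegaT features.

```python
TM_PREFIX = "[TM]"  # assumed marker: use whatever prefix OmegaT was set to insert


def segments_to_review(targets, prefix=TM_PREFIX):
    """Indices of target segments the reviewer still has to check."""
    return [i for i, text in enumerate(targets) if not text.startswith(prefix)]


def strip_marker(text, prefix=TM_PREFIX):
    """Drop the marker once a leveraged segment has been accepted."""
    return text[len(prefix):].lstrip() if text.startswith(prefix) else text


targets = ["[TM] Ouvrir", "Enregistrer sous", "[TM] Fermer"]
to_check = segments_to_review(targets)             # only the translator's own work
final = [strip_marker(t) for t in targets]         # clean text for delivery
```

Stripping the marker is kept as a separate final step so the reviewer can still filter on it until the review is done.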


I understand that this is not an ideal workflow though...





Jean-Christophe Helary


http://mac4translators.blogspot.com/



Re: [l10n-dev] How can we review with Pootle ?

2008-02-18 Thread Jean-Christophe Helary


Suppose the translator downloads the file, translate,
and upload them as suggested to Pootle,

1. Is there any way to accept all suggested segments
  in a single step on Pootle?


Reiko,


2. If we review the files off-line, how can we identify
  the new translations in OmegaT?


It is not trivial.

The best way to work with OmegaT is to have untranslated files
(without fuzzies, which are handled separately by the TM matching
process), to translate them and to review them within OmegaT, or in a
plain-text/PO editor.




Jean-Christophe Helary


http://mac4translators.blogspot.com/



Re: [l10n-dev] How can we review with Pootle ?

2008-02-18 Thread Jean-Christophe Helary


Let me confirm. I understand that new/fuzzy strings are identified in OmegaT,
but once the translator has done the translation and put the translated
string into the untranslated segment, how can the reviewer
recognize which strings to review?


Reiko,

PO is not exactly the strong point of OmegaT :)

I'll check tonight with a PO from Pootle and will get back to you  
later. Maybe on the ja list ?


JC


Re: [l10n-dev] How can we review with Pootle ?

2008-02-18 Thread Jean-Christophe Helary



On 19 Feb. 08, at 11:38, Reiko Saito wrote:


Hi JC,

 I'll check tonight with a PO from Pootle and will get back to you later.
 Maybe on the ja list?

Thanks!
That will be great.


Sorry, I was busy last night... I'll do that later today.


I am curious how the French community is reviewing the translation?


Sophie and Elsa will be able to reply, I am only a translator :)


How are you identifying the segments for review ?
I don't think you are reading all of the segments, but focusing
on the newly translated ones, right?

I understand Pootle shows Suggested translation, but if there are
many segments, you are working off-line and upload them to Pootle,
I assume.


What we have done so far: I translated the UI in Pootle because
there were few segments, and since I had forgotten (I am not yet used
to the tool) that one could set the suggested flag instead of
committing the translation, Sophie indeed had to check all the UI
strings...


We agreed to work with Suggested in a next batch.


If you are reviewing them only on Pootle, are you accepting
the new translation one by one ?  It seems to take time.


For the Help files, I think Sophie translated everything offline and I  
don't know how she managed the review (yet).


We still are in the process of adapting ourselves to the new tool.

As far as I see it, if one is used to OmegaT, shifting to Pootle is
not that convenient. File assignment management can be handled in other
ways, etc.


The next big batch of untranslated files will be, I guess, fully  
handled offline, review included. But we still have to discuss that.


Sophie, Elsa, any comment ?


Jean-Christophe Helary


http://mac4translators.blogspot.com/



Re: [l10n-dev] TMs for 3.0

2008-02-12 Thread Jean-Christophe Helary



On 13 Feb. 08, at 14:16, Clytie Siddall wrote:

In response to our requests, the latest version of the Translate  
Toolkit actually has _less_ escaping than the SDF file. It replaces  
the extra escaping when you convert to SDF.


Sure, but the TMX that Rafaella delivered have _no_ escaping problems  
whatsoever.


It is the SDF-PO-TMX conversion that causes the incompatibilities
that have been previously mentioned. The fact that the current SDF-PO-TMX
is less bad at delivering properly escaped (or not) files is
not really relevant.


Also, the fact that TMX sets have different escape rules depending on  
the converter version defeats the purpose of TMXs... Do we need to  
reconvert megabytes of file sets every time there is a new version of  
TT ?



Jean-Christophe Helary


http://mac4translators.blogspot.com/



Re: [l10n-dev] TMs for 3.0

2008-02-07 Thread Jean-Christophe Helary



On 8 Feb. 08, at 00:41, F Wolff wrote:


Furthermore, I think it is important to note that these TMX files do
not follow the same unescaping rules as the new conversions done by
Translate Toolkit 1.1. Of course, TMX files corresponding to the new
unescaping rules can be generated from the set of PO files with po2tmx
from the Translate Toolkit:
http://translate.sourceforge.net/wiki/toolkit/po2tmx


Because they have been created directly from the SDF files and not  
from the PO files.


The Translate Toolkit is vastly over-escaping strings. If you need a
reference to see how proper escaping should be done, take a look at
the PO files from the Debian distribution.


It would be nice if TT could comply with already established rules of  
the industry and not re-invent the wheel at each new release.


PS: do you also mean that TMX files created with TT 1.1 are not compatible
with TMX files created with previous TT versions? Why do you think you have
the right to break our work because you suddenly decided to change
your specs?


Jean-Christophe Helary


http://mac4translators.blogspot.com/



Re: [l10n-dev] OpenOffice.org 3.0 - Translation Schedule

2008-02-04 Thread Jean-Christophe Helary


Aijin, Rafaella,

On 5 Feb. 08, at 12:58, Aijin Kim wrote:


Sorry, you don't even have to merge the PO files.
I.e. the steps to create the TMX are:
1. download the PO files from Pootle
2. run po2tmx
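For readers wondering what a TMX built from PO pairs looks like, the following Python sketch assembles a minimal TMX 1.4 document with the standard library. It illustrates the format only and is not po2tmx's implementation; `pairs_to_tmx`, the language codes, the `creationtool` value and the optional per-unit `note` (one place where meta-information could travel) are assumptions.

```python
import xml.etree.ElementTree as ET


def pairs_to_tmx(triples, srclang="en-US", tgtlang="fr"):
    """Build a minimal TMX 1.4 document from (source, target, note) triples."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        # All seven attributes below are required on a TMX 1.4 <header>.
        "creationtool": "tmx-sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "PO", "adminlang": "en-US",
        "srclang": srclang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target, note in triples:
        tu = ET.SubElement(body, "tu")
        if note:
            # <note> must come before the <tuv> elements inside a <tu>.
            ET.SubElement(tu, "note").text = note
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


example = pairs_to_tmx([("Gallery", "Galerie", "menu item")])
```

Whether such a TMX matches the PO or the SDF text depends entirely on which escaped form of the strings goes into the `seg` elements, which is the issue discussed elsewhere in this thread.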


Isn't it possible to have the TMX files available directly from SUN, as they
were for 2.4?



Jean-Christophe Helary


http://mac4translators.blogspot.com/



[l10n-dev] Who is in charge of Pootle's FR localization ?

2008-02-03 Thread Jean-Christophe Helary

I found a number of problems in the FR strings of Pootle. How is it  
possible to correct them ?






Jean-Christophe Helary


http://mac4translators.blogspot.com/



Re: [l10n-dev] Who is in charge of Pootle's FR localization ?

2008-02-03 Thread Jean-Christophe Helary


Aijin,

Eventually I modified the original Pootle and was told that the  
modifications would be published in the next release.


JC

On 4 Feb. 08, at 11:18, Aijin Kim wrote:

The Pootle project is the default for the localized UI of Pootle. The
modifications in Sunvirtuallab Pootle will not be applied to the
officially released Pootle.


Aijin

Jean-Christophe Helary wrote:


On 3 Feb. 08, at 22:17, Aijin Kim wrote:


Hi Jean-Christophe,

Sunvirtuallab Pootle only hosts OpenOffice.org and OpenSolaris.org.


Aijin,

On my project list I have:


Projects
OpenOffice.org HC2, OpenOffice.org UI, OpenSolaris.org, Pootle, Terminology




Jean-Christophe Helary


http://mac4translators.blogspot.com/










Jean-Christophe Helary


http://mac4translators.blogspot.com/



[l10n-dev] Pootle issues

2008-02-02 Thread Jean-Christophe Helary



On 3 Feb. 08, at 12:35, Aijin Kim wrote:


Hi Pootle user,
When you have a trouble during online translation, please let me  
know your status so that I can check out what's the problem.


Aijin,

When I have a number of untranslated segments in the PO file, is it
possible to set Pootle to only display those? I could not find how to
mark them in a way that would instantly catch the eye.


It is the same for approximations (fuzzy strings), although they seem to have a grey
vertical bar in front of them.


Also, I found that the numbers given from the top page (nb of  
untranslated/approximates) did not match the real numbers I found when  
actually opening the files.


Jean-Christophe Helary


http://mac4translators.blogspot.com/



Re: [l10n-dev] Pootle issues

2008-02-02 Thread Jean-Christophe Helary

On 3 Feb. 08, at 14:23, Aijin Kim wrote:

If you select 'Show Editing Function' in a project page, e.g.
http://sunvirtuallab.com:32300/fr/helpcontent2/ , you can see the
'Quick Translation' link. It only shows untranslated and fuzzy strings
one by one. You can click the 'skip' button to go to the next string.
However, there is no way to list them in one page.

Thank you Aijin for this hint (btw, the French translation of the
Pootle interface is not of very good quality. Where can modifications
be proposed ?)

I think the Show editing function item should be present on all
pages so that the user does not have to go back and forth between the
current PO file page and the project top page.

I am trying that now, but it seems the whole project is displayed then,
rather than filtering PO file by PO file. Is that what you meant by "there
is no way to list them in one page"?

First, please make sure that the numbers on the graph of a project
are number of words, not strings.

Ok, now that I think of it that must be the answer.

Now there is a UI issue then.

When I look at:
http://www.sunvirtuallab.com:32300/fr/openoffice_org/

I see Non-traduit (non translated) = 84, but it is not immediately
clear that 84 is a string number. By looking at that page, I see the
first row = translated=1 so I suppose those are string numbers.

But, when I go to svtools/source/ I see that non translated is at 25
for misc.po but when I check misc.po the file header block says 9
strings, so the assumption that the listed values were strings and
not words was wrong from the beginning.

It seems to me that both the word and string counts should always be
present anywhere a count is displayed, to ensure that there is no
confusion possible.
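The word/string confusion is easy to reproduce: the same set of entries yields two quite different numbers. A minimal Python sketch, assuming a "word" is any run of word characters matched by the regex `\w+` (real tools, Pootle included, may count differently); `counts` and `label` are hypothetical helpers.

```python
import re


def counts(source_strings):
    """Return (number of strings, number of words) for a list of msgids."""
    words = sum(len(re.findall(r"\w+", s)) for s in source_strings)
    return len(source_strings), words


def label(strings, words):
    """Show both figures so neither can be mistaken for the other."""
    return f"{strings} strings / {words} words"


untranslated = ["Open file", "Save the current document", "Close"]
print(label(*counts(untranslated)))  # prints "3 strings / 7 words"
```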

Jean-Christophe Helary

http://mac4translators.blogspot.com/


Re: [l10n-dev] search function in Pootle

2008-01-30 Thread Jean-Christophe Helary



On 30 Jan. 08, at 22:54, Olivier Hallot wrote:


Hi

I want to locate a specific string (say Gallery) in the po files  
tree, but I only get the first occurence and it seems that there is  
no other way to find the others.

Any hint?


Download the file and do the search offline.
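Once the files are downloaded (for example as the project ZIP), listing every occurrence takes only a short script. This Python sketch assumes UTF-8 PO files and searches line by line, so a term split across two continuation lines of a msgid would be missed; `find_in_po_tree` is a hypothetical helper name.

```python
from pathlib import Path


def find_in_po_tree(root, term):
    """Yield (path, line number, line) for every PO line containing term."""
    for path in sorted(Path(root).rglob("*.po")):
        with path.open(encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if term in line:
                    yield path, lineno, line.rstrip("\n")


# Usage: for path, n, line in find_in_po_tree("fr", "Gallery"): print(path, n, line)
```

A case-insensitive variant would compare `term.lower()` against `line.lower()` instead.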


Jean-Christophe Helary


http://mac4translators.blogspot.com/



Re: [l10n-dev] Formats, tools, and workflow

2008-01-02 Thread Jean-Christophe Helary


Friedel,

Thank you for the very comprehensive reply.



tags are escaped, and yes, if somebody does the work, going directly
from the XML help files to translation formats, could provide some
benefits.


Could you point out where the files are?


meta-information by means of the x-comment information. oo2po from the
translate toolkit will add those notes to the PO file (and oo2xliff will
add it to note tags of XLIFF files).


Have you considered the context XLIFF tag ?



We also have a converter that goes directly from SDF to XLIFF. It
shipped with the current version of the toolkit, although a packaging
bug might hide it for some users. The packaging bug will be fixed in the
next versions of the toolkit.


When do you plan to release it ?



can therefore be seen as being similar to compiling to binary format. We
store our localisation formats in a version control system, and that is
considered to be the stored translations. This way we also don't need to
retranslate with a TM at the start of a version update, as is the method
with OmegaT (according to my understanding).


Well, another way to look at OmegaT is to consider it as a CVS  
specialised in translated strings.


There is no need to retranslate with a TM in the case of OOo since we
only get the non-translated strings as source. NetBeans does that
differently. They release _all_ the strings so it looks more like what
you describe.




OLT with XLIFF files:
About OLT not being able to open our XLIFF files: our XLIFF files are
well formed as far as we know - please report any bugs to our mailing
list or bugzilla. We have validated some of our XLIFF files according to
the XLIFF DTD, so I would be surprised if they are truly malformed.


OLT supports only XLIFF 1.0. From what I heard, OLT does not use an  
XML parsing library to do that but has it all hardcoded. Which means  
that support for more recent versions of XLIFF requires a lot of work.


A way to work around that would be to provide SDF filtering for OLT  
directly.




Claims of mismatches between PO and TMX files:
My understanding is that this error is reported by users of OmegaT.


Not exclusively. People who manually edit the files have to add the  
escapes missing in the TMX as provided by SUN.


Besides, it is not a claim, it is a fact that the data contents of  
the PO provided by coordinators who use oo2po and of the TMX provided  
by SUN are not equivalent.


The claim is yours when you write below that such mismatching  
should be properly interpreted.




It is also my understanding that OmegaT doesn't actually interpret PO
files, but only contains functionality to identify / highlight the
different parts of the PO file for translation. I salute the great work
of the OmegaT community, but if the tool doesn't understand the format,
the PO/TMX tools can't take the blame for it. To see the PO and TMX
files working well, I suggest people try using a TMX file with pot2po
(either the TMX file provided by Sun, or one created with po2tmx from
the translate toolkit).


Is there an official documentation regarding the PO format ?

The GNU pages do not refer to anything to interpret as far as escape
sequences are concerned. The only formal reference to be found is to
the C syntax for character strings, which means the escapes.
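The mismatch discussed in this thread follows from that C-style escaping. The Python sketch below implements the common subset of PO string escapes (backslash, double quote, newline, tab) and applies it to an illustrative SDF-style string whose tags are already backslash-escaped; the sample string and the helper names are assumptions, not taken from the OOo sources or any toolkit. A PO-aware parser undoes one level of escaping; a tool that treats the PO file as plain text sees the doubled backslashes, which then no longer match a TMX generated directly from SDF.

```python
# C-style escapes used inside PO string literals (the common subset only).
_ESCAPE = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t"}
_UNESCAPE = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}


def po_quote(text):
    """Escape text for use between the quotes of a msgid/msgstr line."""
    return "".join(_ESCAPE.get(ch, ch) for ch in text)


def po_unquote(quoted):
    """Undo po_quote, as a PO-aware parser does before showing the string."""
    out, i = [], 0
    while i < len(quoted):
        if quoted[i] == "\\" and i + 1 < len(quoted):
            nxt = quoted[i + 1]
            out.append(_UNESCAPE.get(nxt, "\\" + nxt))  # keep unknown escapes as-is
            i += 2
        else:
            out.append(quoted[i])
            i += 1
    return "".join(out)


# Illustrative SDF-style help string whose tags are already backslash-escaped.
sdf_text = r"Click \<emph\>Gallery\</emph\>"
in_po_file = po_quote(sdf_text)  # what a plain-text view of the PO file shows
```

Here `in_po_file` contains two backslashes before each tag delimiter, while `po_unquote(in_po_file)` gives back exactly `sdf_text`.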


But I could not find any part of the GNU gettext manual that says how  
PO parsing tools should interpret the format, except as textual data.


Can you give me links to recommended implementation practices for PO
tools?


I guess the point of Translation Memory _eXchange_ is that it
should be exchangeable regardless of what it will be used for. If tools
interpret the escaping differently, that does pose a problem and that
will need to be addressed.


TMX is XML and is parsed with XML parsers.

However, if the issue is that OmegaT is translating PO files as text  
files without regard for the PO file format, I don't think we can  
lay the blame on the TMX specification or something else.


Are there PO parsers provided by GNU or GPLed that OmegaT could use to  
improve its parsing of PO files ?




Jean-Christophe Helary


http://mac4translators.blogspot.com/


Re: [l10n-dev] gsicheck for Mac Intel ready for download

2007-12-28 Thread Jean-Christophe Helary



On 29 Dec. 07, at 14:58, Clytie Siddall wrote:

Sorry, I seem to be messing this up somehow. I unpacked the  
directory, but I can't call the executable, whether I put the  
directory in Applications or /usr/bin .


I just get -bash: gsicheck: command not found. :(

My PATH is pretty comprehensive.


Clytie,

I just tried it on my Mac and did not seem to have the problem you  
describe. I use a /bin/ directory in my home folder, the directory is  
in my PATH and gsicheck was called correctly by bash.
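One possible cause, which the thread does not confirm, is that PATH lookup is not recursive: the directory that directly contains the executable must be on PATH, and the file needs its execute bit, so an unpacked folder placed inside /usr/bin does not make the binary inside it visible. A small Python check built on the standard `shutil.which` (the `locate` helper is hypothetical):

```python
import os
import shutil


def locate(command, extra_dir=None):
    """Return the path a shell would execute for `command`, or None.

    `extra_dir`, if given, is searched before the current PATH, which
    mimics prepending a directory such as ~/bin to PATH.
    """
    search = os.environ.get("PATH", "")
    if extra_dir is not None:
        search = os.pathsep.join([os.fspath(extra_dir), search])
    return shutil.which(command, path=search)


# Usage: locate("gsicheck", os.path.expanduser("~/bin"))
```

If `locate("gsicheck")` returns None while the binary exists, compare its parent directory with the entries of PATH and check that the file is executable.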




Jean-Christophe Helary


http://mac4translators.blogspot.com/


Re: [l10n-dev] PO and TMX contents do not match, help !

2007-12-27 Thread Jean-Christophe Helary


Javier,

I am glad we at last managed to agree on the most important:

if I generate TMX from PO, I should use it with POs, and if I
generate from XLIFF, I should use it for XLIFF...


Yes, and if we generate TMX from SDF (like SUN's TMX) then it is  
supposed to work with SDF, which is the reason why I proposed a way to  
work with SDF directly.



and then it works.


It does indeed. If communities want to work with the TMX that SUN  
provides then they can use the workflow I proposed and they'll see  
wonders.


I am afraid that at this point we do not have such a thing as  
correct/universal TMX files.


Agreed, TMX files depend on the original contents. And so they should match
the format in which the original contents are expressed.


... and that there is no truth on this, just opinions and systems  
that work.


100% with you.


Jean-Christophe Helary


http://mac4translators.blogspot.com/


Re: [l10n-dev] Open Language Tool

2007-12-27 Thread Jean-Christophe Helary


Mechtilde,

There are explanations for OLT localizers on OLT's page:
https://open-language-tools.dev.java.net/editor//xliff-editor-l10n.html

You can also join their developer's list to ask technical questions  
related to the localization.

https://open-language-tools.dev.java.net/servlets/ProjectMailingListList

On 27 Dec. 07, at 23:49, Mechtilde wrote:


Hello,

after the second translation round I had the idea to also translate
the translation tool.



Jean-Christophe Helary


http://mac4translators.blogspot.com/


Re: [l10n-dev] PO and TMX contents do not match, help !

2007-12-26 Thread Jean-Christophe Helary


Thank you for the reality check Alessandro.

Any other community willing to share experiences? I would really like
to know what the commonly accepted best practices are for the current
PO-based workflow.


I'd really like to know myself how people translate with the current  
workflow as I feel we're missing something.


In the Italian community we're currently translating most of our  
files directly on Pootle which may be considered a good translation  
workflow management system but a very poor translation editor.

So far we've tried different solutions:
- we downloaded the PO files and tried to translate them with OmegaT  
but we had problems with the TMX matching and with the reconversion  
to SDF (gsicheck errors);


That is correct. The PO and the TMX do not match so the translators  
must be extra careful when re-using contents from the TMX, basically  
that means adding manually all the extra \ that PO has added.


- we extracted XLIFF files from Pootle and tried to translate them  
with the OLT Editor but the tool didn't even open them as it  
considered the XLIFF files not well formed;


No comment here.

- we converted the PO files using the OLT Filters, it worked, but  
then it proved so slow in handling the TM that we had to give up on  
that;


Here, the idea would be to have the OLT filters directly handle the  
SDF format, but I fear that would not change much for the overall  
performance. Unless the TMX files were trimmed down a little bit  
maybe. Like having separate TMX files per module (which would shrink  
them to the ~k segments each I suppose, instead of the 50k+20k chunks  
that we have now).


- we translated some of our content with poEdit but that editor is  
as poor as Pootle from this point of view (no TM and no glossary).


That is correct.

I tried to install KBabel on OS X yesterday and I see that it has
limited TMX support, but had no time to check further. Plus, since the TMX
contents and the PO contents do not match, we would have problems
similar to those with OmegaT, I suppose.


So far I find the method for translating SDF files proposed by Jean-Christophe
the best way to work on the translation, but it does not seem to be
compatible with Pootle, which we are using as well. What we, as
translators, really need is a method to translate effectively using
TM and glossaries just like we do in the professional world. OmegaT
would have it all: a glossary extracted from SunGloss can easily be
converted for the tool and the OmegaT TM engine works very well...
but then, obviously, we need a TM that matches the content to be
translated.


Which is why the solution I proposed based on SDF is the best in my  
opinion.


Regarding Pootle, is it possible to upload the result after the
translation is completed? If yes, you could translate based on SDF,
convert the result with oo2po and upload that to Pootle to ensure your
data is properly managed there.



Jean-Christophe Helary


http://mac4translators.blogspot.com/


Re: [l10n-dev] PO and TMX contents do not match, help !

2007-12-26 Thread Jean-Christophe Helary



On 26 Dec. 07, at 17:08, Yury Tarasievich wrote:

On Wed, 26 Dec 2007 09:56:57 +0200, Alessandro Cattelan [EMAIL PROTECTED] wrote:

...
translators, really need is a method to translate effectively using TM
and glossaries just like we do in the professional world. OmegaT would
have it all: a glossary extracted from SunGloss can easily be converted
for the tool and the OmegaT TM engine works very well... but then,
obviously, we need a TM that matches the content to be translated.


Maybe I'm missing something, but how can the Sun's glossary/TMX or  
whatever be helpful without meta-information? No amount of toolchain  
change is going to address this by itself.


I think you are indeed missing something.

As Ale wrote, such meta-information can be added to the glossaries (in
OmegaT, use the third column) or to TMX files, or to XLIFF files.


TMX files can use the note place holder.
XLIFF files can use the context place holder.

Besides, glossary or TMX information in OmegaT (or anywhere else) consists of
suggestions for the translator at best and the context can be provided
by other means.


Other means include but are not limited to meta-information. Besides,
it is necessary for the meta-information to be directly available and
processable by the translator to have any practical use.


The focus on meta-information is valid as long as the data is  
automatically available to the processes. Currently it is not the  
case, or is it ?


Since there are no tools that can automatically process the SDF
meta-information in its current form, focusing on meta-information seems to
me to be counterproductive.


Another way to support the translator is to provide external context to
strings. That can be done by the translator's experience itself
(knowing the data set, having experience in the field etc), or by
providing the data in external viewers: OOo's help viewer, screenshots
etc...



Maybe *I'm* not making myself intelligible? I'm talking about having  
things assigned to the strings like a term variant, type of use  
(menu/option/...), keep short etc. Currently such info often has  
to be deduced from string ID, or lucky probe in the UI, even from  
sources digging.


Yes. That is correct. But in most cases the translator has
enough common sense and external resources (the l10n community,
experienced users, external context, etc.) to make up for the lack of
meta-information or the lack of automatic access to it.


I fully understand that you want to provide the least error-prone
workflow possible by using such meta-information, but in most cases
this meta-information will not be available to the translators in a
practical way. Last but not least, such meta-information is mostly
useful for identifying UI items, but for the whole rest of the
translation process (terminology management, style management etc) it
is simply useless.



Jean-Christophe Helary


http://mac4translators.blogspot.com/


Re: [l10n-dev] PO and TMX contents do not match, help !

2007-12-26 Thread Jean-Christophe Helary



On 26 Dec. 07, at 17:45, Yury Tarasievich wrote:

"Could" being the operative word here. See, I don't understand where
you expect this info to actually come *from*. Somebody has to
type in those thousands of meta-descriptors into the carrier file,
after all.


Yuri, your original question was:

Maybe I'm missing something, but how can the Sun's glossary/TMX or  
whatever be helpful without meta-information? No amount of toolchain  
change is going to address this by itself.


The answer is simple.

In the case of SUN GLOSS, and for an OmegaT-centered process, you can
leave the meta-information that SUN provides in its data as comments
in the glossary file that OmegaT uses. When I write "you can", I mean
it is trivial and can be done in a Calc sheet, for example.
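As a sketch of that idea, the following Python renders (source, target, comment) rows as the tab-separated, three-column text OmegaT reads as a glossary, with the meta-information kept in the comment column. The sample rows and the helper names are hypothetical; OmegaT's tab-separated glossaries are normally saved as UTF-8 with a `.utf8` extension (or `.txt` in the system encoding).

```python
def clean(field):
    """Collapse tabs and newlines so each entry stays on one 3-column line."""
    return " ".join(field.split())


def to_omegat_glossary(rows):
    """Render (source, target, comment) rows as tab-separated lines."""
    return "".join("\t".join(clean(f) for f in row) + "\n" for row in rows)


rows = [
    ("Gallery", "Galerie", "noun; menu item"),
    ("Close", "Fermer", "verb; keep short"),
]
glossary_text = to_omegat_glossary(rows)
# e.g. Path("glossary/sungloss.utf8").write_text(glossary_text, encoding="utf-8")
```

The same three columns can of course be produced by saving a Calc sheet as tab-separated text; the script only matters when the export has to be repeated.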


In the case of TMX/XLIFF, it can be done by properly using the  
relevant tags in the respective files. And that can be done with a  
script in the language of your choice. But for that, there is a need  
to have the _will_ to have a direct filter for the SDF format first.
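For XLIFF, the relevant markup in XLIFF 1.2 is a `<context-group>` holding `<context context-type="...">` elements, plus `<note>`. The Python sketch below builds one such trans-unit with the standard library. The unit ID, the group name, the `x-ui-element` context type and the sample values are hypothetical, and how SDF fields would map onto them is an assumption.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialise without an "ns0:" prefix


def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def trans_unit(unit_id, source, contexts, note=None):
    """Build an XLIFF 1.2 <trans-unit> that carries meta-information.

    `contexts` maps a context-type (a standard value such as "sourcefile",
    or a custom "x-" value) to its text.
    """
    tu = ET.Element(q("trans-unit"), id=unit_id)
    ET.SubElement(tu, q("source")).text = source
    group = ET.SubElement(tu, q("context-group"), name="sdf-metadata")
    for ctype, value in contexts.items():
        ET.SubElement(group, q("context"), {"context-type": ctype}).text = value
    if note:
        ET.SubElement(tu, q("note")).text = note
    return tu


unit = trans_unit(
    "misc.STR_GALLERY",  # hypothetical ID
    "Gallery",
    {"sourcefile": "svtools/source/misc.src", "x-ui-element": "menu"},
    note="keep short",
)
xml_text = ET.tostring(unit, encoding="unicode")
```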


It might as easily be done with the extended SDF/FDS/whatever as  
with XLIFF, but resources ought to be dedicated beforehand. And so,  
in the case of hypothetical format switch resources ought to be  
dedicated twice. That's why I strongly doubt the format switch at  
this juncture would facilitate the filling of the meta-info slots.


As Javier put it, SDF is _not_ a localization format. That is what you  
seem to not understand in what I wrote.


We need a localization format (PO, XLIFF, key=value, anything) that
matches the localization data SUN provides us with (TMX). This has
nothing to do with developing or not developing the SDF format.



Jean-Christophe Helary


http://mac4translators.blogspot.com/


Re: [l10n-dev] PO and TMX contents do not match, help !

2007-12-26 Thread Jean-Christophe Helary



On 26 Dec. 07, at 17:51, Alessandro Cattelan wrote:


Jean-Christophe Helary wrote:

What are the practical benefits related to using Pootle ?


Basically, I see two main benefits:
- it lets you assign files to translators so that you know who's
translating a given file;
- it provides some statistics so that you know at a glance how many
files or words need to be translated.


Ok, so the problem is that the current PO files, as provided by SUN
using the oo2po conversion, do not match the TMX contents, so you can't
work properly with them, right?


So, if we could have PO files that match the TMX contents, we could  
use Pootle to do the file management and a different tool to do the  
translation itself.


Is that correct ?


Jean-Christophe Helary


http://mac4translators.blogspot.com/


Re: [l10n-dev] Thoughts on Localization

2007-12-24 Thread Jean-Christophe Helary



On 24 déc. 07, at 22:27, Javier SOLA wrote:


Two ideas for the discussion,

- SDF is not a localization format. Nobody localizes using SDF. It  
is just an intermediary format that has the available information,  
and which simplifies the steps of gathering the necessary  
information and putting it back in the source.


I do use SDF because its contents match the TMX provided by SUN. And
I am not the only one.



- Localization formats that we are using are PO and XLIFF.


I don't use (and don't advise to use) the current implementation of  
the oo2po tool which produces the PO files a lot of people are using,  
because the contents produced do not match the contents of the TMX  
provided by SUN. And I am not the only one.


The current PO files create a huge overhead for translators, who need  
to play with \ characters so that their work is properly validated.


This comes from PO over-escaping strings that are already escaped in
SDF.


We are working on new SDF-XLIFF, XLIFF-SDF and XLIFFUPGRADE  
filters that we hope to finish soon. The filters will be integrated  
in the upcoming version 0.5 of the WordForge off-line localization  
editor.


If what you do is compatible with the TMX contents that is great.


Now if the final idea is to use an XLIFF workflow, the best would
be to have SDF contain the original XML and _not_ escaped strings that
have no meaning when used in processes that include XLIFF or TMX.


XLIFF and TMX have all the necessary functions to protect the XML of  
the translatable strings.



Jean-Christophe Helary


http://mac4translators.blogspot.com/


Re: [l10n-dev] Thoughts on Localization

2007-12-24 Thread Jean-Christophe Helary



On 24 déc. 07, at 21:34, Yury Tarasievich wrote:

On Mon, 24 Dec 2007 11:29:59 +0200, Jean-Christophe Helary [EMAIL PROTECTED] 
 wrote:



Why has SUN moved from a workflow that ensures the most efficient use
of previous translations to a workflow that does not ?


Because it's not their priority?
Because at the time this seemed (or actually was) the best available  
solution?
Because the world-wide popularity of OOo translating caught
them unaware?


Because the communities were unaware of the existence of the tools and  
were drawn by a PO centered workflow that is mainstream in other  
FOSS communities. Nothing else.



How hard would it be to have a few Java programmers improve the
current OLT filters so that SDF is supported there ?


The OLT itself seems to be sort of put on ice. Or so
I gathered in Spring, when assessing the possibility to XLIFF-migrate
the OOo translation I'm taking care of.


OLT the editor does not need to be modified. Only the small utility  
that is the OLT Filters needs to add SDF support.


And that would only serve to provide yet another way to translate, with a
professional tool.



How hard would it be to give translators access to the full source of
the help files for context ?

I.e., what can be done in practical terms, besides PO hacks, to
improve the translators' work and the output quality ?


I.e., to add meta-information, be it Nth extra field of SDF or  
whatever carrier format, which would be an enterprise all by itself.  
The PO hacks, ugly as they are, work, and translations are coming  
in. The new way to go, pretty as it may seem in theory, has yet to  
be implemented *and* to prove itself. What about *other* translating  
teams, after all?


It is not pretty in theory, it already works.

I've documented how to use SDF directly to get direct matches from the
TMX in tools that are developed for translation work.


Here is the link in case you are interested:

http://mac4translators.blogspot.com/2007/12/openofficeorg-localization-and-easy-way.html

oo2po has failed to produce files that match the TMX data that SUN is  
providing the community with and even though PO based translations  
keep coming, using the current PO process _does not contribute to make  
the translation workflow easier_. But obviously, for communities that  
only know the current PO hacks such an assertion does not mean  
anything...



Jean-Christophe Helary


http://mac4translators.blogspot.com/


Re: [l10n-dev] Thoughts on Localization

2007-12-24 Thread Jean-Christophe Helary



On 24 déc. 07, at 23:13, Yury Tarasievich wrote:

Propose and implement what you wish; you still overlook that people in
translating communities would need to re-learn, etc.


Yes, re-learn to be more efficient. Anything wrong with that ?

See what is wrong with the current process and try to improve it.

As I wrote before, using the TMX in OmegaT with the SDF files directly  
allowed me to have about 50% of the GUI strings almost automatically  
translated, 25% had very close matches and the remaining 25% were new  
strings.


So I am saying that by spending 15mn to read what I wrote, another  
15mn to read the software tutorial, one can save 75% on the time spent  
to translate.


Take me, for example: I get quite a fair re-use ratio with KBabel, I
don't feel comfortable with the feel of any free XLIFF-capable
tools, and I have yet to see a good demonstration of the
translation-workflow-related advantages of the new way to go. What I saw in my
experience with OmegaT wasn't better; it was better in some ways, worse
in others.


It was not better because until now there was no way to easily use
the contents of the SDF file. It was not easier for me either, even
though I use OmegaT professionally. It was not easier because the
files that are served to us are not properly converted and don't match
the TMX contents.



Can't recall — did I say I'm opting out of this discussion already?


But you never attempted to discuss in the first place. I am not being  
uselessly critical, I have proposed solutions that work and that  
allow translators and coordinators to be more efficient.


And I know that because I have tried them, and I have completed and  
delivered translations with them.


Have you even tried the methods before ranting ? Of course not,  
otherwise you would have seen that what I wrote _did_ make sense.



Jean-Christophe Helary


http://mac4translators.blogspot.com/


Re: [l10n-dev] Thoughts on Localization

2007-12-24 Thread Jean-Christophe Helary



On 25 déc. 07, at 05:12, Ain Vagula wrote:


I agree with you

I VERY agree with you


What is the point of VERY agreeing with a useless rant that proposes
nothing but criticizes people who try to propose new and
working solutions ?


Charles - knowing something about the technical details of the translation
process surely isn't mandatory for an NL project lead, but there's no
need to talk about things you aren't familiar with. Every opinion of yours,
positive or negative, can influence people because of your
position as project lead. Please think twice next time.
Citation: May this be read and used by as many localisers as  
possible!


Obviously, it was not the _contents_ of the article that prompted this  
discussion.


Fortunately I was away from the computer this evening when this message
hit my mailbox, and also fortunately Pavel responded to it, more
gently than I could ever have done.


Ok, and what is your take on the fact that the TMX contents and the  
SDF contents match but not the PO ? How do you think it is possible to  
improve that so that translators can make better use of previously  
translated strings ?


Because _that_ is what is being discussed...


Jean-Christophe Helary


http://mac4translators.blogspot.com/


Re: [l10n-dev] Thoughts on Localization

2007-12-23 Thread Jean-Christophe Helary

(Boris or anybody else, I Cc to OLT's dev list because that is  
relevant here. How hard do you think it would be to have OLT's filters  
work with OOo's l10n format: .sdf ?)


On 23 déc. 07, at 18:00, Yury Tarasievich wrote:

On Sun, 23 Dec 2007 05:59:05 +0200, Jean-Christophe Helary [EMAIL PROTECTED] 
 wrote:
XLIFF files directly generated from the XML for the help files and  
from the rest for the UI.


Here I should point out that precious few tools currently work
with XLIFF 1.1 (even Sun's OLT didn't process 1.1 in Spring 2007).


Anyway, such a change, while disruptive to many, would also be
useless at best if it does not come with extended meta-information
coverage (string context etc.)


So, possibly the real answer is to extend the coverage of meta-information,
extending the SDF format appropriately.


Yury, I don't want to sound rude, but you have it all wrong. I don't
care about the SDF format.


What I need and what translators need is a tool chain that does not  
_break_ the data as it does now with oo2po.


What translators need is tools that support the format they use with  
the least transformation possible.



Open Language Tools' problem is not that it does not support XLIFF
1.1; its problem is that its filter does not support SDF...


Besides, if OLT does not support XLIFF 1.1 why not feed it with XLIFF  
1.0 ? Have you thought about that ?


All the source files are in plain XML, have you thought about the  
efficiency loss of converting an XML file to a non XML format in terms  
of checks ? Have you considered the efficiency gains of keeping a  
strictly XML based tool chain for the localization ?



Pavel says that SDF is so easy to transform; then what about OOo l10n
developers taking some time to improve OLT's filters so that we can use
them directly with SDF ?


_That_ would considerably improve the translation process. To a point  
that OLT could actually be used with the TMX by any community that  
wants to work with a professional grade tool...


Any taker ?


--
Jean-Christophe Helary
http://mac4translators.blogspot.com/


[l10n-dev] _Easy_ way to translate the SDF files with the TMX memories ...

2007-12-12 Thread Jean-Christophe Helary

-more than half of the remaining segments had a very close equivalent  
in the TMX

-the rest was about 60~70 segments _out of 400_



When the translation coordinator receives all the translated files,  
they are merged in the original SDF file and put to the issues tracker.



Also, the current CVS version of OmegaT includes Hunspell. You can use  
OOo's dictionaries directly with it. You just need ant to build  
OmegaT.



I hope this post will contribute to ease the OOo localization  
process ! And I would like to thank Alex for the numerous test  
versions he produced before _I_ was satisfied with sdf2txt !


Don't hesitate to ask questions if you have any !

Jean-Christophe Helary

==
==sdf2txt.jar is a Java utility.
 http://alex73.zaval.org/snapshots/OpenOffice/sdf2txt.jar
==
==OmegaT is a Java Computer Aided Translation tool.
 (Version 1.7.3)
 http://sourceforge.net/project/showfiles.php?group_id=68187&package_id=214253
 (Version 1.8, CVS, with Hunspell)
 cvs -z3 -d:pserver:[EMAIL PROTECTED]:/cvsroot/omegat co -P omegat

 to build, enter the /omegat/ folder and type ant
 the dictionary setup is relatively straightforward.
==


[l10n-dev] SDF converted to text for translation

2007-11-29 Thread Jean-Christophe Helary

Alex Buloichik has created a small command line utility to export all  
the source text contents to a plain text file.


For example:
(from OpenOffice.org-SRC680_m234-POT.tar.gz)

==
accessibility	source\helper\accessiblestrings.src	0	string	RID_STR_ACC_NAME_BROWSEBUTTON13691	en-US	Browse2002-02-02 02:02:02
avmedia	source\framework\mediacontrol.src	0	string	AVMEDIA_STR_OPEN13691	en-US	Open2002-02-02 02:02:02
avmedia	source\framework\mediacontrol.src	0	string	AVMEDIA_STR_INSERT13691	en-US	Apply2002-02-02 02:02:02
avmedia	source\framework\mediacontrol.src	0	string	AVMEDIA_STR_PLAY13691	en-US	Play2002-02-02 02:02:02
avmedia	source\framework\mediacontrol.src	0	string	AVMEDIA_STR_PAUSE13691	en-US	Pause2002-02-02 02:02:02

==

Is exported to:

==
Browse

Open

Apply

Play

Pause
==

It takes less than a second to export the full 70,000 strings.

The text file can now be translated in any tool and not specifically  
PO editors.
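For anyone who wants to see what such an export involves, here is a minimal Python sketch. It is not Alex's sdf2txt. It assumes the usual SDF layout of 15 tab-separated fields, with the language code in field 10 and the text in field 11. Please check that assumption against your own files.

```python
from typing import Iterable, List

# Assumed SDF layout: 15 tab-separated fields; field 10 (index 9) is the
# language code and field 11 (index 10) is the text.
LANG_FIELD = 9
TEXT_FIELD = 10


def extract_source_text(sdf_lines: Iterable[str], source_lang: str = "en-US") -> List[str]:
    """Return the text of every source-language record, in file order."""
    texts = []
    for line in sdf_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= TEXT_FIELD:
            continue  # blank or malformed line
        if fields[LANG_FIELD] == source_lang:
            texts.append(fields[TEXT_FIELD])
    return texts
```

For example, `extract_source_text(open("file.sdf", encoding="utf-8"))` returns one string per en-US record. If you write each string followed by an empty line, you get a file shaped like the export shown above.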


Since the source is a text file equivalent to the contents of the sdf
file, the TMX that Rafaella provided will match its contents much better
than a PO file would.


The ideal workflow would be the following:

1) export the translatable contents of the 2.4 strings sdf files
2) use the resulting file as source in OmegaT
3) use the TMX as reference TMX files
4) use SUN GLOSS as reference glossary
5) translate all that in OmegaT to benefit from the automatic TMX
matching functions
6) import the translated contents into the original
7) deliver a valid SDF file.
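Steps 6 and 7 could look roughly like this Python sketch. It makes four assumptions:

- the same 15-field, tab-separated layout;
- exactly one translation per source record, in the same order;
- each target line copies the source line's metadata, with only the language and text changed;
- the help text, quick help and title fields can be ignored.

```python
from typing import List

# Assumed SDF layout: language code at index 9, text at index 10.
LANG_FIELD = 9
TEXT_FIELD = 10


def merge_translations(source_lines: List[str], translations: List[str],
                       target_lang: str) -> List[str]:
    """Follow each source-language SDF record with a translated copy."""
    if len(source_lines) != len(translations):
        raise ValueError("need exactly one translation per source record")
    merged = []
    for line, translated in zip(source_lines, translations):
        fields = line.rstrip("\r\n").split("\t")
        target = list(fields)  # copy the metadata of the source record
        target[LANG_FIELD] = target_lang
        target[TEXT_FIELD] = translated
        merged.append("\t".join(fields))
        merged.append("\t".join(target))
    return merged
```

The length check at the top is there to catch a translated file that has gained or lost a segment before it can shift every following translation by one line.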

Anybody willing to test that before the translation really starts ?

Jean-Christophe Helary


Re: [l10n-dev] TMX files (List of UI strings/translation)

2007-11-28 Thread Jean-Christophe Helary



On 28 nov. 07, at 19:55, Rafaella Braconi wrote:

Can anyone generate csv file from these tmx ?
I need the list of User Interface translation to use as the glossary
for OmegaT.



why don't you use the glossary available in Sun Gloss?


Is there a real equivalence between the UI files and Sun Gloss ? I  
think the point is to get only the UI strings, not the rest of the  
glossary.



Jean-Christophe


Re: [l10n-dev] TMX files (List of UI strings/translation)

2007-11-28 Thread Jean-Christophe Helary



On 28 nov. 07, at 19:21, Reiko Saito wrote:


Hi JC,

I understood your point.
To increase the leverage, we may be able to lower the minimum match
rate, or we will use this tmx just for reference and search for a
certain string as a file... what do you think ?


It is possible to use the sdf directly and not the PO file as source.
For that it is necessary to do a few things to the sdf file, as I
described sometime in the summer if I remember well. Simply put, it
amounts to this:


Basically the sdf comes as pairs of lines:
string in English+meta data
string in target language+meta data

The format for each line is very close to CSV so the idea is to:

1) convert each pair into a single line:
string in English+meta data string in target language+meta data
2) import that into OOo and select the columns that correspond to
string in English.
3) put that into a text file for use as source in OmegaT or anything  
else with the TMX provided by Rafaella as reference.


All this is done with a text editor and a few regex search/replace.

Once the translation is completed, it is pasted into the string in  
target part of the CSV file, the file is converted back to a 2 lines  
format and the result is delivered.
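The pairing step can also be scripted instead of done with regex in a text editor. Here is a rough Python sketch. It assumes tab-separated SDF lines with the text in field 11. It also assumes that fields 1 to 6 (project through local ID) identify a record, which lets it catch a pair that has gone out of step.

```python
from typing import List, Tuple

# Assumed SDF layout: text at index 10; indexes 0-5 identify the record.
TEXT_FIELD = 10
KEY = slice(0, 6)

Row = Tuple[List[str], List[str]]


def to_rows(lines: List[str]) -> List[Row]:
    """Join each (source line, target line) pair into one row."""
    records = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    if len(records) % 2:
        raise ValueError("odd number of records: the pairs are out of step")
    rows = []
    for src, tgt in zip(records[0::2], records[1::2]):
        if src[KEY] != tgt[KEY]:
            raise ValueError(f"mismatched pair: {src[KEY]} vs {tgt[KEY]}")
        rows.append((src, tgt))
    return rows


def source_column(rows: List[Row]) -> List[str]:
    """The English column, one string per row, ready to save as a text file."""
    return [src[TEXT_FIELD] for src, _ in rows]


def from_rows(rows: List[Row], translations: List[str]) -> List[str]:
    """Paste translations into the target half and split back into two lines."""
    if len(rows) != len(translations):
        raise ValueError("need exactly one translation per row")
    lines = []
    for (src, tgt), text in zip(rows, translations):
        new_tgt = list(tgt)
        new_tgt[TEXT_FIELD] = text
        lines.append("\t".join(src))
        lines.append("\t".join(new_tgt))
    return lines
```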



Can anyone generate csv file from these tmx ?


It is very easy, but now that I am seeing the contents, I wonder if it
is a good idea. Look at one translation unit:


<tu tuid="47197">
<tuv xml:lang="en-US">
<seg>The query already exists. Do you want to delete it?</seg>
</tuv>
<tuv xml:lang="ja-JP">
<seg>このクエリはすでに存在します。削除しますか。</seg>
</tuv>
</tu>


Obviously this is a full sentence and not a menu item. So maybe  
Rafaella's idea of using Sun Gloss for glossary reference is better  
after all ?


Anyway, just in case we need the conversion it is not a very complex  
task to do that by hand with a few regex. I'll do it for the Japanese  
group if you want.


Jean-Christophe

Re: [l10n-dev] List of UI strings/translation

2007-11-27 Thread Jean-Christophe Helary



On 27 nov. 07, at 22:49, Rafaella Braconi wrote:


Hi Reiko,

Reiko Saito ha scritto:

Hi Rafaella,

Exported strings include English and translation, right ?


yes, that's correct.


Rafaella,

Would it be possible to have them exported to whatever format you like  
before we get the files to translate ?


So that we can convert them to the required formats ?

I know that the Japanese project will make use of them, and  
personally, as a participant to the French translation, I'd love to  
have them there too.


Is there a way to get that automatically or do we have to go through  
you ?


Jean-Christophe


[l10n-dev] TMX files (List of UI strings/translation)

2007-11-27 Thread Jean-Christophe Helary



On 28 nov. 07, at 02:19, Rafaella Braconi wrote:


So far I have created tmx files and made them available at:
http://wiki.services.openoffice.org/wiki/Translation_for_2.4#Translation_Memories

But if for your work you would like to get sdf (at least as long as  
French is not available in Pootle) just let me know.


Very good ! Thank you Rafaella.

For other list members' information, the TMX files include the  
following languages:


 TMX_2007-09-12_de.zip     12-Sep-2007 13:20  3.2M
 TMX_2007-09-12_es.zip     12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_fr.zip     12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_hu.zip     12-Sep-2007 13:20  3.2M
 TMX_2007-09-12_it.zip     12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_ja.zip     12-Sep-2007 13:20  3.3M
 TMX_2007-09-12_ko.zip     12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_nl.zip     12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_pl.zip     12-Sep-2007 13:20  3.2M
 TMX_2007-09-12_pt-BR.zip  12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_pt.zip     12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_ru.zip     12-Sep-2007 13:20  3.2M
 TMX_2007-09-12_sv.zip     12-Sep-2007 13:20  3.0M
 TMX_2007-09-12_zh-CN.zip  12-Sep-2007 13:20  3.1M
 TMX_2007-09-12_zh-TW.zip  12-Sep-2007 13:20  3.1M

I've just checked the fr and ja packages and they include about 44,000  
translation units for the Help and about 25,000 TUs for the UI, in  
our case such units are basically Help files paragraphs or UI items.


They seem to be taken not directly from the XML files that constitute  
the Help files but from the post-processed sdf files.


All this means:

1) The TMXs contain all the XML code escaped with \ as per the sdf  
file: they are not proper TMX level2 files
2) Since they conform to the sdf contents they can be used directly to  
translate it (either in OpenLanguageTools or OmegaT)
3) _But_ since the original XML code is also contained in the  
translation unit itself (instead of being encapsulated in TMX tags)  
there are chances that the matches will be influenced by the XML code  
instead of reflecting the translatable contents. Not only will that  
lower the frequency of relevant matches but that will add to the  
burden of the translator since that requires editing the escaped XML
code to get a proper match (which would be automatic with proper  
encapsulation).
4) people who use sentence segmentation in their tools should disable  
it and work with paragraph segmentation on so as to get the best  
possible matches from the TMXs.
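Here is a small Python experiment that makes point 3 concrete. It uses difflib's SequenceMatcher as a stand-in for a CAT tool's fuzzy matcher; real tools score matches differently. It assumes the SDF convention of escaping markup as \<...\>, and the segments are made up for illustration.

```python
import difflib
import re

# Assumed SDF escaping: inline markup appears as \<tag ...\> with \" inside.
ESCAPED_TAG = re.compile(r"\\<.*?\\>")


def strip_escaped_tags(text: str) -> str:
    """Remove escaped markup, keeping only the translatable text."""
    return ESCAPED_TAG.sub("", text)


def similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1]; a stand-in for a fuzzy match score."""
    return difflib.SequenceMatcher(None, a, b).ratio()


# Same text, different help ID: the escaped markup drags the score down.
new_seg = r'\<ahelp hid=\".uno:Save\"\>Saves the current document.\</ahelp\>'
tm_seg = r'\<ahelp hid=\".uno:SaveAs\"\>Saves the current document.\</ahelp\>'

# Different text, same markup: the escaped markup pushes the score up.
other_new = r'\<ahelp hid=\".uno:Save\"\>Saves the document.\</ahelp\>'
other_tm = r'\<ahelp hid=\".uno:Save\"\>Prints the document.\</ahelp\>'

for a, b in ((new_seg, tm_seg), (other_new, other_tm)):
    raw = similarity(a, b)
    text_only = similarity(strip_escaped_tags(a), strip_escaped_tags(b))
    print(f"with markup: {raw:.2f}  text only: {text_only:.2f}")
```

In the first pair the translatable text is identical, but the raw score is below 1.0. In the second pair the texts differ, but the shared markup makes the raw score higher than the text-only score. Properly encapsulated markup would avoid both distortions.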


Ideally, it would be preferable to translate directly from the XML  
(not from sdf) and to have the XML code properly encapsulated within  
the TMX to provide translators with the best matches possible and the  
easiest way to recycle the XML code. If XLIFF is considered as a
preferred format in the future (to replace sdf), I think it would be
important to take proper encapsulation of the original XML into account.


I don't have much time right now, but if people are interested I could  
make a demonstration to show how much easier it would be for  
translators to have a proper localization format with proper TMX  
files.



Anyway, the mere fact that real TMX files are available is a big plus
compared to the times when none were available. Thank you very much  
Rafaella for your efforts. I hope we will be able to make good use of  
the data, as well as to propose better workflows in the future.


Jean-Christophe Helary


Re: [l10n-dev] List of UI strings/translation

2007-11-26 Thread Jean-Christophe Helary



On 26 nov. 07, at 23:41, Rafaella Braconi wrote:


Hi Jean-Christophe,




an .sdf file can be opened and saved as a CSV document...


Good. As Reiko just wrote, such contents would be interesting only if
they include the English and the translation.


Can you confirm it is the case ?


Rafaella
P.S. Nice to see you back on the list!


I thought my hibernation would last longer but global warming seems to  
affect Japan more than what I expected ;)


Jean-Christophe


Re: [l10n-dev] List of UI strings/translation

2007-11-25 Thread Jean-Christophe Helary



On 26 nov. 07, at 15:17, Reiko Saito wrote:

Can the translator get the list of existing UI messages with  
translation ?

The list formatted as csv, like

---
Expand Tree: ツリーを展開
Contract Tree: ツリーを収縮


If those files are available, any Translation Memory Editor, e.g.  
OmegaT, can read the file and present them as the glossary on the  
tool.


Rafaella,

I think you mentioned a while ago (for 2.3 ?) that it was not possible  
to have real TMX of already existing translations. Is it still the  
case ?


When the contents are totally new, they are not as useful, but when
there are incremental additions to old contents such reference files
can be extremely useful and greatly ease the translation process.


TMX is not required. As Reiko says, anything like CSV/TSV could be  
later converted to the required formats.


Jean-Christophe





Re: [l10n-dev] Re: [qa-dev] switching to XLIFF

2007-08-12 Thread Jean-Christophe Helary


On Aug 12, 2007, at 12:00 PM, Javier SOLA wrote:

OpenOffice has multiple variable formats, and it is nice that the  
program recognises them as units of text that need to be replicated  
exactly at the target. XLIFF uses for this the mrk in-line tag.  
The introduction of the tags must be done by the SDF-to-XLIFF filters.


Do you mean that the SDF-XLIFF filter will correctly encapsulate the  
XML code that SDF escapes ? If yes that is great news!



In case that is what you intend to do, you should be aware that mrk  
is _not_ the tag to do that:


http://www.oasis-open.org/committees/xliff/documents/xliff-specification.htm#mrk


<quote>
Marker - The mrk element delimits a section of text that has  
special meaning, such as a terminological unit, a proper name, an  
item that should not be modified, etc. It can be used for various  
processing tasks. For example, to indicate to a Machine Translation  
tool proper names that should not be translated; for terminology  
verification, to mark suspect expressions after a grammar checking.  
The mrk element is usually not generated by the extraction tool  
and it is not part of the tags used to merge the XLIFF file back  
into its original format.

</quote>

mrk has _nothing_ to do with encapsulation of non translatable  
_code_, and as is indicated in the quote, it is _not_ generated by  
the extraction tool (or the filter) etc...



If you want to encapsulate the SDF code properly you need to use:

<bpt> and <ept> for code pairs, <it> for isolated code and <sub> for
translatable subflows within the code.


http://www.oasis-open.org/committees/xliff/documents/xliff-specification.htm#bpt



If your segmentation process (if you have any) puts bpt/ept  
series in different source segments then it is sometimes considered  
safer to use ph (place holders) series instead.


http://www.oasis-open.org/committees/xliff/documents/xliff-specification.htm#ph


But mrk is certainly not the tag to use for non translatable code.
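To illustrate that kind of encapsulation, here is a Python sketch that turns SDF-escaped markup into XLIFF 1.x inline elements:

- paired tags become `<bpt>`/`<ept>` sharing an id;
- self-closing tags become `<ph>`;
- tags whose partner is not in the segment become `<it>`.

The unescaping rule (backslash before `<`, `>` and `"`) is an assumption about the SDF convention. Check the attribute details against the XLIFF version you target; 1.2, for instance, also defines `rid` for linking pairs.

```python
import re
from xml.sax.saxutils import escape

TAG = re.compile(r"<[^>]+>")


def unescape_sdf(text: str) -> str:
    """Undo the assumed SDF escaping, where <, > and double quotes carry a backslash."""
    return text.replace('\\<', '<').replace('\\>', '>').replace('\\"', '"')


def tag_name(tag: str) -> str:
    """'<ahelp hid="x">' -> 'ahelp', '</emph>' -> 'emph', '<br/>' -> 'br'."""
    parts = tag.strip("</>").split()
    return parts[0] if parts else ""


def encapsulate(text: str) -> str:
    """Replace inline markup with XLIFF <bpt>/<ept>, <ph> and <it> elements."""
    tokens, pos = [], 0
    for m in TAG.finditer(text):
        tokens.append(("text", text[pos:m.start()]))
        tokens.append(("tag", m.group()))
        pos = m.end()
    tokens.append(("text", text[pos:]))

    # First pass: pair opening and closing tags with a stack.
    pairs, stack = {}, []
    for i, (kind, value) in enumerate(tokens):
        if kind != "tag" or value.endswith("/>"):
            continue
        if value.startswith("</"):
            if stack and tag_name(tokens[stack[-1]][1]) == tag_name(value):
                pairs[stack.pop()] = i
        else:
            stack.append(i)
    closers = {c: o for o, c in pairs.items()}

    # Second pass: emit text and wrap each code in the right element.
    out, ids, next_id = [], {}, 1
    for i, (kind, value) in enumerate(tokens):
        if kind == "text":
            out.append(escape(value))
            continue
        code = escape(value)
        if i in closers:
            out.append(f'<ept id="{ids[closers[i]]}">{code}</ept>')
            continue
        if i in pairs:
            ids[i] = next_id
            out.append(f'<bpt id="{next_id}">{code}</bpt>')
        elif value.endswith("/>"):
            out.append(f'<ph id="{next_id}">{code}</ph>')
        else:
            pos_attr = "close" if value.startswith("</") else "open"
            out.append(f'<it id="{next_id}" pos="{pos_attr}">{code}</it>')
        next_id += 1
    return "".join(out)
```

For example, `encapsulate(unescape_sdf(r'Click \<emph\>Save\</emph\>.'))` keeps "Click", "Save" and "." as translatable text. It moves the two `emph` codes into a `<bpt>`/`<ept>` pair, which a CAT tool can then protect.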



Translation memory must learn to deal with tags.


Translation memories have not waited for the translate-toolkit to  
deal with tags. Most translation memory tools already deal with TMX  
level 2 to various levels.


So maybe you mean translate-toolkit must learn to deal with tags ?

Independently of which tools are being used, I am glad to hear  
agreement on the fact that the future of OOo localization is XLIFF.


And it would be even better if the XLIFF could be provided directly  
from the original XML and not after a conversion to SDF, which  
renders the whole process uselessly complex.


Jean-Christophe Helary


Re: [l10n-dev] Re: [qa-dev] switching to XLIFF

2007-08-11 Thread Jean-Christophe Helary



On Aug 11, 2007, at 8:02 PM, Clytie Siddall wrote:

So I'm not talking about converting between SDF, PO and XLIFF, or  
between any combination of the three. I'm talking about using XLIFF  
as the base translation format for OpenOffice.org.


This is what I suggested in a mail here at the beginning of July. But  
currently all the XLIFF conversions have to go through the SDF-PO
thing first.


The original help files are in XML so converting them directly to  
XLIFF should be way easier (for the translator but also for the  
processors as well) than going through the current SDF-PO thing.


People who still want to do PO will be able to do so with the  
conversions from XLIFF.
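A direct XML-to-XLIFF extraction can be fairly short. The Python sketch below builds a minimal XLIFF 1.0 file with one trans-unit per paragraph element. The element name `paragraph` and its `id` attribute are assumptions about the help file markup. The sketch also flattens inline markup to plain text, whereas a real filter would encapsulate it as `<bpt>`/`<ept>`.

```python
import xml.etree.ElementTree as ET


def help_xml_to_xliff(xml_text: str, original: str, src_lang: str = "en-US") -> str:
    """Build a minimal XLIFF 1.0 document with one trans-unit per <paragraph>."""
    help_root = ET.fromstring(xml_text)
    xliff = ET.Element("xliff", version="1.0")
    file_el = ET.SubElement(xliff, "file", {
        "original": original,  # name of the help file the units come from
        "source-language": src_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, "body")
    for n, para in enumerate(help_root.iter("paragraph"), start=1):
        # Simplification: inline markup is flattened to plain text here.
        text = "".join(para.itertext()).strip()
        if not text:
            continue
        unit = ET.SubElement(body, "trans-unit", id=para.get("id") or f"p{n}")
        ET.SubElement(unit, "source").text = text
    return ET.tostring(xliff, encoding="unicode")
```

Because the units keep the paragraph ids of the original XML, merging a translated file back means looking those ids up again. No intermediate SDF or PO step is involved.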


Jean-Christophe



Re: [l10n-dev] Imagine :)

2007-07-12 Thread Jean-Christophe Helary



On 12 juil. 07, at 20:29, Jean-Christophe Helary wrote:



On 12 juil. 07, at 17:36, Rafaella Braconi wrote:

However, from what I understand here, the issue you see is not  
necessarily Pootle but the format Pootle delivers which is .po. As  
already said, Pootle will be able to deliver in near future the  
content in xliff format. Would you still see a problem with this?


Yes, because the problem is not the delivery format, it is the fact  
that you have 2 conversions from the HTML to the final format and  
the conversion processes are not clean. Similarly, the TMX you  
produce are not real TMX (at least not the one you sent me).


I am not arguing that UI files would benefit from such treatment. I  
am really focusing on the HTML documentation.


To make things even clearer, I am saying that using _any_  
intermediary format for documentation is a waste of resources.


If translators want to use intermediary formats to translate HTML in  
their favorite tool (be it PO, XLIFF or anything else) that is their  
business.


Janice (NetBeans) confirmed to me that NB was considering a Pootle
server exclusively for UI files (currently Java properties files),
but in the end that would mean overhead anyway, since the current
process takes the Java properties as they are for translation in OmegaT.


In NB, the HTML documentation is available in packages corresponding  
to the modules, and the TMX (a real one...) makes it possible to automatically
get only the updated segments. No need for a complex infrastructure
to produce differentials of the files: all this is managed by the
translation tool automatically, and _that_ allows the translator to
have _much more_ leverage from the context and to benefit from a much
greater choice of correspondences.


I suppose the overhead caused by the addition of an intermediary
format for the UI files will be balanced by the management functions
offered by the new system, but I wish we did not have to translate
yet another intermediate format, for the simple reason that, judging
from the existing conversion processes (I've tried only the
translate-toolkit stuff and it was flawed enough to convince me _not_
to use its output), it is likely to break the existing TMX. If the
management system were evolved enough to output the same Java
properties files, I am sure everybody would be happy. But, please, no
more conversion than necessary.


To go back to the OOo processes, I have no doubt that a powerful  
management system available to the community is required. But in the  
end, why is there a need to produce .sdf files ? Why can't we simply  
have HTML sets, like the NB project, that we'd translate with  
appropriately formed TMX files in appropriate tools ?


My understanding from when I worked with Sun Translation Editor (when  
we were delivered .xlz files and before STE was released as OLT) is  
that we had to use XLIFF _because_ the .sdf format was obscure. But  
in the end, the discussion we are having now (after many years of
running in circles, apparently) revolves not around how to ease the
translator's work but around how to ease the management.


If the purpose of all this is to increase the translators' output  
quality, then it would be _much_ better to consider a similar system  
that uses the HTML sets directly. Because _that_ would allow the  
translator to spend much more time on checking the translation in  
commonly available tools (a web browser...) How do you do checks on  
PO/XLIFF/SDF without resorting to hacks ?


Keeping things simple _is_ the way to go.

Jean-Christophe Helary (fr team)


[l10n-dev] TMX/XLIFF output (Re: [l10n-dev] Imagine :))

2007-07-12 Thread Jean-Christophe Helary



On 13 juil. 07, at 04:45, Rafaella Braconi wrote:



No, but that means that correct TMX files are a possibility (even
now). By the way, I wonder why Rafaella told me that creating TMXs of
the state of the strings before the current updates was impossible ?


to clarify: the only possibility I have is to provide you TMX files  
in which translation exactly matches the English text now. If the  
English source has been changed, I have the following situation:


New English text - Old translation (matching previous text).
In the database I have no possibility to provide you with files  
containing Old English text and Updated English text.


Don't you have a snapshot of the doc _before_ it is modified ?

I mean, I have the 2.2.1 help files on my machine, so I can use the  
XML files in, for ex, sbasic.jar in the EN folder and align them with  
the same files in the FR folder and create a valid TMX of the state  
of the 2.2.1 version.


This is what I suggest you keep somewhere, for each language pair  
(with EN in source).


So you have a static set of TMX, archived by module (sbasic, swriter,  
etc) for each language, available from the community web, and  
translators just get the TMX they need for their current assignment.


Such files don't need to be dynamically generated: they are valid for
the most recent stable release, and once the release is updated the files
can be output for the translation of the next version.


So, create the TMX _before_ you modify the data base, _or_ from  
static files that exist anyway inside any copy of OOo. And create TMX  
level2 files, with all the original XML encapsulated so as not to  
confuse CAT tools and translators.
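The alignment itself could be sketched in Python as shown below. It assumes that the EN and FR help files share paragraph ids (hypothetical element name `paragraph`, attribute `id`). It writes plain TMX 1.4 with the inline markup flattened to text, so the result is level 1. That is not the level 2 file with encapsulated XML asked for above. The `creationtool` values are placeholders.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def paragraphs_by_id(xml_text: str) -> dict:
    """Map each paragraph id to its plain text (inline markup flattened)."""
    root = ET.fromstring(xml_text)
    return {p.get("id"): "".join(p.itertext()).strip()
            for p in root.iter("paragraph") if p.get("id")}


def align_to_tmx(en_xml: str, fr_xml: str,
                 src_lang: str = "en-US", tgt_lang: str = "fr") -> str:
    """Pair paragraphs that share an id and write them as TMX 1.4 units."""
    en, fr = paragraphs_by_id(en_xml), paragraphs_by_id(fr_xml)
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "align-sketch", "creationtoolversion": "0.1",
        "segtype": "paragraph", "o-tmf": "none", "adminlang": "en-US",
        "srclang": src_lang, "datatype": "xml",
    })
    body = ET.SubElement(tmx, "body")
    for pid, en_text in en.items():
        fr_text = fr.get(pid)
        if not en_text or not fr_text:
            continue  # untranslated or empty paragraph: nothing to align
        tu = ET.SubElement(body, "tu", tuid=pid)
        for lang, text in ((src_lang, en_text), (tgt_lang, fr_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Run once per module (sbasic, swriter, ...) on a stable release, this produces the kind of static, archived TMX set described above.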




Regarding the output of proper source files, now that we (I...)  
know that the original is in XML, it should be trivial to provide  
them either directly as XML sets (specifically _without_ outputting  
diffs), or as XML diffs, or as XLIFFs.


You may have some technical requirements that make you produce SDF
files, but those only add an extra layer of complexity to the
translation process, and I am sure you could have a clean XML output
that includes all the meta info contained in the SDF, so that the source
file _is_ some kind of XML and not a hybrid that treats XML as
text (which is the major source of confusion).


If you have an XML workflow from the beginning, it should be much  
safer to keep it XML all the way hence:


original = XML (the OOo dialect)
diffs = XML (currently SDF, so shift to a dialect that uses the SDF  
info as attributes in XML diffs tags for ex)

source = XML (XLIFF)
reference = XML (TMX, taken from the original)


TMX is not supported by most PO editors anyway, so a clean TMX would  
mostly benefit people who use appropriate translation tools (free  
ones included).


Regarding the XLIFF (or PO, depending on the communities I gather)  
source output, each community (and even each contributor) could use  
the output that fits the tools in use.


XLIFF should be 1.0 so as to ensure OLT can be used (OLT does not  
support more recent versions of XLIFF sadly).
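A minimal sketch of what such an XLIFF 1.0 source file could look like, generated with Python's standard library. The element and attribute names follow my reading of the XLIFF 1.0 specification (1.0 uses no XML namespace); the `original` path and the ids are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_xliff_1_0(units, original, source_lang="en-US", target_lang="fr"):
    """Return an XLIFF 1.0 document as a string.

    `units` is an iterable of (id, source, target) tuples; target may be
    None for strings that still need translating.
    """
    xliff = ET.Element("xliff", version="1.0")  # no namespace in XLIFF 1.0
    file_el = ET.SubElement(xliff, "file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, "body")
    for unit_id, source, target in units:
        tu = ET.SubElement(body, "trans-unit", id=unit_id)
        ET.SubElement(tu, "source").text = source
        if target is not None:
            ET.SubElement(tu, "target").text = target
    return ET.tostring(xliff, encoding="unicode")


doc = build_xliff_1_0(
    [("par_1", "Click OK.", "Cliquez sur OK."), ("par_2", "New paragraph.", None)],
    original="text/sbasic/shared/example.xhp",  # hypothetical file name
)
print(doc)
```

Leaving `<target>` out of untranslated units keeps "still to do" and "already done" distinguishable for the tool.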


And then you have a clean workflow that satisfies everybody, and the  
management (Pootle) system can be put on all that to provide  
communities with the best environment possible.


And of course, this workflow is also valid for UI strings, since I  
suppose they can also be converted to XML (if they are not already).


What about that ?

Jean-Christophe Helary (fr team)

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: [l10n-dev] OmegaT UTF-8 problem

2007-07-12 Thread Jean-Christophe Helary



On 13 Jul 07, at 10:42, ChengLin wrote:


Hi,

We're trying to use OmegaT on Simplified Chinese Windows XP, but it
can't save to UTF-8, only to Chinese GBK.

Could anyone help us?


You go to Options/File Filter, select the file format you are using  
and edit the output encoding.


Jean-Christophe


Re: [l10n-dev] Translating .sdf files directly with OmegaT

2007-07-12 Thread Jean-Christophe Helary



On 13 Jul 07, at 02:40, Alessandro Cattelan wrote:


Actually I haven't tried going through the procedure you described; I
think I'll give it a try with the next batch of files. We'll have around
4,200 words to translate and, as it is a reasonable volume, I think I'll
have some time to spend testing a new procedure.

What I fear, though, is that OmegaT would become extremely slow
processing a huge SDF file. If I have a bunch of PO files I can just
import only a few of them into the OmT project at a time, and that makes
it possible to translate without too much CPU sweat :o). When I tried
loading the whole OLH project on which we worked in June, my computer
was almost collapsing: it took me over an hour just to load the project!

I don't have a powerful machine (AMD Athlon XP, 1500 MHz, 700 MB RAM),
but I think that if you have a big TM it is not wise to load a project
with over a thousand segments.


You are definitely right here: the bigger the TMX the more memory it  
takes.


Which is the reason why I just suggested (in the Imagine thread)  
that we have TMX by modules.


Also, you can assign OmegaT more memory than you actually have on
your machine. I use OmegaT like this:


java -server -Xmx2048M -jar OmegaT.jar 

The -server option makes it faster too.

The sdf files we have are not that big though. So you have to be  
selective with the TMX you use.



Maybe we could split the SDF file into smaller ones, but I'm not sure
that would work.


If you try my method, you can translate bit by bit. There are no
problems with that. What matters is that the reverse conversion is
properly made.


JC


[l10n-dev] For the communities that want to try OmegaT...

2007-07-11 Thread Jean-Christophe Helary


The updated online manuals are indexed here:
http://sourceforge.net/docman/display_doc.php?docid=61937&group_id=68187

They are included in the test version available here:
http://sourceforge.net/project/showfiles.php?group_id=68187
(use the Other > Development | OmegaT 1.7.1 download)

And don't forget to read the explanations I sent on the 7th.

Besides that, it is possible to use OmegaT without hacking
anything if you just have plain ODF or HTML source file sets for
your OOo-related documentation. No need to convert anything to PO.
Just follow the quick tutorial that displays at launch and you'll be
translating in 15 minutes. With a tool that professional translators use
every day...


Jean-Christophe Helary (fr)


Re: [l10n-dev] Translating .sdf files directly with OmegaT

2007-07-11 Thread Jean-Christophe Helary



On 11 Jul 07, at 15:29, Arthur Buijs wrote:


The overhead of using po-files in the translation process is minimal
(except for the initial trying out).


It is not when you have to modify the tagged links to fit the source.  
In OmegaT that is done automatically without you even noticing it.


Also, all the emph tags, if they need to be displaced or edited,  
require more work in a text based editor than in OmegaT (if done the  
way I suggested).


Of course, using the PO files in a PO editor or in OmegaT will not
make much difference in terms of editing the matches. The problem
_is_ which source file you choose to work with and what relation it
has to the original format (here: HTML to SDF to PO, almost no relation
anymore when you reach the PO stage).


So I am really talking about not using PO because _that_ requires
handling the files as text, while using the modified .sdf allows them
to be handled as HTML (which considerably reduces the amount of
editing).



Of course this is only true if a
usable tmx-file is available. My advice would be to find a better way
to generate tmx-files and use po-files for the translation task.


The TMXs provided by Rafaella were similar to the ones produced by
the translate-toolkit processes (oo2po, then po2tmx), and neither
corresponded to the source po file in terms of the number of \
characters in the escape sequences. They corresponded to the
original .sdf file, which is what originally prompted me to use the
original .sdf file as source. The rest of the hack I proposed on
7/7 comes from that.



The general problem does not only come from the TMX, but from the
fact that .sdf is already an intermediate format (that you then
convert to yet another intermediate format: PO).


The original conversion requires escapes and _that_ is what requires  
the files to be handled as text when they could just as well be  
handled as pure and simple HTML which most translation tools support.


The TMX problem is yet another problem.

Here, we have the following structure for the TMXs:

(new source segment)
(old target translation, if present)

A _real_ TMX should be:

(old source segment)
(old target translation)

So the current process is very confusing and does not allow
TMX-supporting tools (like OmegaT or even OLT) to fully leverage the
contents of the source, which is the real function of the TMX file.


Plus, the fact that the TMX do not reflect the structure of the  
actual source file (PO) makes them yet another problem.



Of course, I am commenting on the process only with the perspective  
of allowing translation contributors to have access to a translation  
workflow that supports the use of computer aided translation tools.  
Right now the process that is suggested by the file formats available  
for OOo's localization does not facilitate this at all.


Another of Sun's projects, namely NetBeans, manages to fully leverage
legacy translations thanks to the use of simple source file formats
(the UI files are simple Java properties and the Help files are
simply HTML): the whole source files are matched to the legacy
translations output to TMX for super easy translation (in OmegaT or
any other TMX-supporting tool, even though OmegaT is the most used
tool there).


As long as OOo sticks to intermediate file formats (.sdf/.po/.xliff)
with the current unstable conversion processes, hacks will be
necessary to reach the same level of efficiency other communities
have already reached. And _that_ is really too bad.



Cheers,

Jean-Christophe Helary (fr)


[l10n-dev] Imagine :)

2007-07-11 Thread Jean-Christophe Helary

I have no idea where the UI files come from and how they _must_ be  
processed before reaching the state of l10n source files.


So, let me give a very simplified view of the Help files preparation
for l10n as seen from the point of view of TMX and TMX-supporting tools.
Since I don't know what the internal processes really are, I can only
guess and I may be mistaken.


• The original Help files are English HTML file sets.
• Each localization has a set of files that corresponds to the
English HTML sets.
• The English and localized versions are sync'ed.

To create TMX files:

Use a process that aligns each block-level tag in the English set with
the corresponding block-level tag in the localized set. That is
called paragraph (or block) segmentation, and that is what Sun does for
NetBeans: no intermediary file format, no .sdf, no .po, no whatever
between the Help sets and the TMX sets.
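A minimal Python sketch of paragraph (block) segmentation with the standard `html.parser` module, pairing EN and FR blocks by position. It assumes well-formed, structurally identical file pairs, and it flattens inline tags such as `<b>` to their text, where a real alignment tool would keep them as placeholders. The set of block tags is an assumption of this sketch, not what Sun actually uses.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit translatable blocks.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "td", "th", "pre"}


class BlockSegmenter(HTMLParser):
    """Collect the text of each outermost block-level element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._depth = 0
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            if self._depth == 0:
                self._buffer = []  # a new outermost block starts
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Normalize whitespace so line wrapping does not break matches.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.blocks.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def segment(html):
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    return parser.blocks


def align_by_position(en_html, fr_html):
    """Pair the n-th EN block with the n-th FR block."""
    en, fr = segment(en_html), segment(fr_html)
    if len(en) != len(fr):
        raise ValueError(f"{len(en)} EN blocks vs {len(fr)} FR blocks")
    return list(zip(en, fr))
```

Positional pairing is the simplest possible choice; it fails loudly (rather than silently misaligning) when the block counts differ.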


The newly updated English Help files come as sets of files, all HTML.

The process to translate, after the original TMX conversion above  
(only _ONE_ conversion in the whole process) is the following:


Load the source file sets and the TMX sets in the tool.

The HTML tags are automatically handled by the tool.
The already translated segments are automatically translated by the  
tool.
The translator only needs to focus on what has been updated, using
the whole translation memory as reference.
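As a rough illustration of that step, a Python sketch that classifies each new source segment as an exact match, a fuzzy match or new text, against a memory of old-source to old-target pairs. `difflib.SequenceMatcher.ratio()` is only a stand-in similarity measure (OmegaT's own fuzzy matching works differently), and the 0.75 threshold is arbitrary.

```python
from difflib import SequenceMatcher


def leverage(new_segments, memory, threshold=0.75):
    """Classify new source segments against a {old source: old target} memory.

    Returns (segment, kind, suggested_target) tuples, where kind is
    "exact", "fuzzy" or "new".
    """
    results = []
    for segment in new_segments:
        if segment in memory:
            results.append((segment, "exact", memory[segment]))
            continue
        # Find the most similar old source segment.
        best_source, best_score = None, 0.0
        for old_source in memory:
            score = SequenceMatcher(None, segment, old_source).ratio()
            if score > best_score:
                best_source, best_score = old_source, score
        if best_source is not None and best_score >= threshold:
            results.append((segment, "fuzzy", memory[best_source]))
        else:
            results.append((segment, "new", None))
    return results


memory = {
    "Click OK to save the file.": "Cliquez sur OK pour enregistrer le fichier.",
    "Open the Tools menu.": "Ouvrez le menu Outils.",
}
for row in leverage(["Click OK to save the file now.", "Printing options"], memory):
    print(row)
```

Note that this only works because the memory is keyed on the *old* source; keyed on the new source, every updated string would look like an exact match with a stale translation.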


Once the translation is done, the translator delivers the full set  
that is integrated in the release after proofreading etc.


What is required on the source files provider's side? Creating TMX
from HTML paragraph sets.


What is required from the translator ? No conversion whatsoever, just  
work with the files and automatically update the translation with the  
legacy data.




Now, what do we have currently ?

The source files provider creates a differential of the new vs the  
old HTML set.

It converts the result to an intermediate format (.sdf)
It converts that result to yet another intermediate format for the  
translator (either .po or xliff)
It matches the results of the diff strings to corresponding old  
localized strings, thus removing the real context of the old string
It creates a false TMX based on an already intermediate format,  
without hiding the internal codes (no TMX level 2, all the tag info  
is handled as text data...)


The translator is left to use intermediate files that have been  
converted twice, removing most relation to the original format and  
adding the probability of having problems with the back conversion.


The translator has to work with a false TMX that has none of the original
context, thus producing false matches that need to be guessed
backward, and that displays internal codes as text data.



Do you see where the overhead is ?



It is very possible that the UI files do require some sort of  
intermediate conversion to provide the translators with a manageable  
set of files, but as far as the Help files are concerned (and as far  
as I understand the process at hand) there is absolutely no need  
whatsoever to use an intermediate conversion, to remove the original  
context and to force the translator to use error prone source files.



It is important to find ways to simplify the system so that more
people can contribute and the source files provider has fewer
tasks to handle, but clearly using a .po-based process to translate
HTML files goes totally the opposite way. And translators are
(sadly without being conscious of it) suffering from that, which
results in less time spent on checking one's translation and a
general overhead for checkers and converters.


Don't get me wrong, I am not ranting or anything, I _am_ really
trying to convince people here that things could (and should) be
drastically simplified, and for people who have some time, I
encourage you to see how NetBeans manages its localization process.
Because we are losing a _huge_ amount of human resources in the
current process.


Cheers,

Jean-Christophe Helary (fr team)

Re: [l10n-dev] Translating .sdf files directly with OmegaT

2007-07-10 Thread Jean-Christophe Helary


Ale,

I was wondering if you eventually had considered this procedure. It
works correctly and considerably increases productivity thanks
to OmegaT's HTML handling features. I think I'm going to investigate
the possibility of having an .sdf filter for OmegaT rather than
having to go through all the po loops that really don't provide much
more than yet another intermediate format that is inconvenient
to translate anyway.


JC

On 7 Jul 07, at 00:41, Jean-Christophe Helary wrote:

big_snip

[l10n-dev] Translating .sdf files directly with OmegaT

2007-07-06 Thread Jean-Christophe Helary

The reason why I tried to do that is because using the .po created  
with oo2po along with the TMX created with po2tmx does not work well.  
The po2tmx removes data from escape sequences and that means more  
things to type in the OmegaT edit window.


So, the idea was to consider the .sdf file as a pseudo HTML file to  
benefit from a few automatic goodies offered by OmegaT:
1) tag reduction (so that one needs to type less when tags are  
inline) and
2) tag protection (for block tags like the <ahelp...></ahelp> when
they open and close the segment)


If the TMX could be hacked to show formatting tags similar to those of
the modified source file, it would become trivial to edit the tags and
reflect the new contents found in the source.


Problem is, an .sdf file is not an HTML file: there is plenty of meta
information and a lot of escaped <, > and others.
Also, an .sdf file seems to be made of two-line blocks: the
source line and the target line.


The first problem will be solved later. For now, to extract the
translatable contents we need to change the two-line blocks into
one-line blocks with source and target data next to each other.


This is done using a regexp like the following (these are not exact:
I write them from memory, plus they may change depending on the editor
you choose):


search for:
^(.*)(en-US)(.*)\r^(.*)(fr)(.*)
replace with:
\1\2\3\t\4\5\6

Now that your .sdf is linearized, change its name to .csv and open  
it in OpenOffice by using tab as field separator and nothing as  
text delimiter.


The tabs in the original .sdf create a number of columns from where  
you just need to copy the column with the en-US translatable contents.
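The same linearization, plus the column copy, sketched in Python as an alternative to the editor and spreadsheet steps. It assumes, like the regexp above, that every en-US line is immediately followed by its translated line, and it assumes (not confirmed here) that the translatable text is the 11th tab-separated field of an SDF line (index 10).

```python
SDF_TEXT_FIELD = 10  # assumption: the text is the 11th tab-separated field


def linearize(sdf_lines, target_lang="fr"):
    """Join each en-US line with the translated line that follows it."""
    if len(sdf_lines) % 2:
        raise ValueError("expected source/target line pairs")
    joined = []
    for source, target in zip(sdf_lines[0::2], sdf_lines[1::2]):
        # Same pairing rule as the regexp: en-US line, then target line.
        if "\ten-US\t" not in source or f"\t{target_lang}\t" not in target:
            raise ValueError(f"unexpected pair starting with {source[:40]!r}")
        joined.append(source + "\t" + target)
    return joined


def source_column(joined_line):
    """Return the en-US text: the column to copy into the .html file."""
    return joined_line.split("\t")[SDF_TEXT_FIELD]
```

Failing on an unexpected pair is deliberate: a silently shifted pairing would put every translation on the wrong string later.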


Paste that into a text file and give it a .html extension.

Now, we need to convert this to pseudo HTML, the idea being that
OmegaT will smoothly handle all the <ahelp> etc. tags that will be
found there.


First of all, we need to understand that not all the < are
tag-opening characters; a number of them are simply less-than
signs. So we grab those first:


search for:
([^\\])<
replace with:
\1&lt;

> are less of a problem but let's do them anyway:

search for:
([^\\])>
replace with:
\1&gt;

Now we can safely assume that all the remaining < or > are
escaped with \ and, to correct that (so that the non-escaped tags
can be recognized in OmegaT), do:


search for:
\\<
replace with:
<

search for:
\\>
replace with:
>

Last but not least, to ensure that OmegaT will consider each line as  
being a segment we need to add the paragraph mark to each line  
beginning:


search for:
^
replace with:
<p>

Save; the file should now be ready to be processed.
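The search/replace steps above, sketched in Python for a single en-US text field. One deliberate change: a negative lookbehind `(?<!\\)` replaces the `([^\\])` group, so a `<` at the very start of a line is also caught. Other SDF escapes (such as `\"`) are left untouched, exactly as in the manual procedure.

```python
import re

BARE_LT = re.compile(r"(?<!\\)<")  # "<" not preceded by "\": a literal less-than sign
BARE_GT = re.compile(r"(?<!\\)>")  # same for ">"


def to_pseudo_html(sdf_text):
    """Turn one SDF text field into a line OmegaT can read as HTML."""
    text = BARE_LT.sub("&lt;", sdf_text)  # literal < becomes an entity
    text = BARE_GT.sub("&gt;", text)      # literal > becomes an entity
    text = text.replace("\\<", "<").replace("\\>", ">")  # \<tag\> becomes <tag>
    return "<p>" + text                   # one line = one segment


print(to_pseudo_html(r"\<emph\>Note\</emph\>: use a < b"))
```

The order is the same as in the manual steps: the literal brackets must be turned into entities *before* the escaped ones are unescaped, otherwise the two can no longer be told apart.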



Now, we need to get matches from the TMX files that either we have
created (oo2po, then po2tmx) or that Rafaella & all have provided us with.


Problem is that the TMX files reflect the contents of the .sdf that  
we have just modified.


In the TMX, we are likely to find an ahelp tag written as \<ahelp
something\>, which will not be helpful since in OmegaT the ahelp tag
will be displayed as <a0> and thus will not match the \<ahelp
something\> string.


So, we need to hack the file so that it looks close enough to what  
the source expects...


In the TMX we want to reduce _all_ the escaped tags to a short
expression that looks like <a> for a tag starting with a.


So we would do something like (here again, not 100% exact regexp).

search for:
\\<([^/])[^>]*>
replace with:
&lt;\1&gt;

same for tail tags:
\\</(.)[^>]*>
replace with:
&lt;/\1&gt;
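A Python sketch of the same tag reduction. Rather than editing the raw TMX text, it works on the decoded `<seg>` text through ElementTree, so the serializer writes the `&lt;`/`&gt;` entities itself. It assumes plain-text segments (no inline TMX elements inside `<seg>`); note also that ElementTree drops any DOCTYPE line, which may need to be put back by hand.

```python
import re
import xml.etree.ElementTree as ET

# \<ahelp hid=...\> becomes <a>; [^/] keeps closing tags out of this pattern.
OPEN_TAG = re.compile(r"\\<([^/])[^>]*>")
# \</ahelp\> becomes </a>
CLOSE_TAG = re.compile(r"\\</(.)[^>]*>")


def reduce_tags(text):
    """Shorten escaped SDF tags to their first letter, close to OmegaT's <a0>."""
    text = OPEN_TAG.sub(r"<\1>", text)
    return CLOSE_TAG.sub(r"</\1>", text)


def reduce_tmx(tmx_xml):
    """Apply reduce_tags to the text of every <seg> in a TMX document."""
    root = ET.fromstring(tmx_xml)
    for seg in root.iter("seg"):
        if seg.text:
            seg.text = reduce_tags(seg.text)
    return ET.tostring(root, encoding="unicode")


print(reduce_tags(r'\<ahelp hid=\"HID_X\"\>Click\</ahelp\>'))  # <a>Click</a>
```

The `[^/]` in the opening-tag pattern matters: without it, `\</ahelp\>` would also match the opening pattern and be turned into `</>` before the tail-tag step runs.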

If I remember well everything I did in the last few days that is  
about it. Save the TMX, put it in /tm/, load the project and  
translate...


You can also put the Sun glossaries in /glossary/ after a little bit  
of formatting. But that too is trivial.



When translation is done, it is important to verify the tags (Tools >
Validate Tags), click on each segment where the tags don't match the
source, and correct the target.


Then Project > Create translated files

Get the translated .html file from /target/

And now we need to back process the whole thing to revert it to its  
original .sdf form.


1) remove all the <p> at the beginning of the lines
2) replace all the < with \<, all the > with \>, all the &lt; with <
and the &gt; with >
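The back conversion for one line, as a Python sketch. Order matters: the real tags must get their backslash back before the `&lt;`/`&gt;` entities are turned into bare characters, otherwise the two kinds of angle brackets would be confused. It assumes OmegaT leaves only the leading `<p>` to remove and introduces no other entities (such as `&amp;`).

```python
def from_pseudo_html(html_line):
    """Undo the pseudo-HTML conversion for one translated line."""
    text = html_line[len("<p>"):] if html_line.startswith("<p>") else html_line
    # 1) Real tags first: give < and > their SDF backslash escape back.
    text = text.replace("<", "\\<").replace(">", "\\>")
    # 2) Then turn the entities back into literal characters.
    return text.replace("&lt;", "<").replace("&gt;", ">")


print(from_pseudo_html("<p><emph>Note</emph>: use a &lt; b"))
```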



This should be enough. Now copy the whole file and paste it in the  
target contents part of the still opened .csv file.


The .csv file now contains the source part and the target part next  
to each other.


Let's save this (be careful: tab as field separator and nothing  
as text delimiter).


Open the result in the text editor.

The pattern we need to find to revert the 1 line blocks to 2 line  
blocks is something like:


(something)(followed by lots of en-US stuff)a tab(the same something) 
(followed by lots of translated stuff)


^([^\t]+)(.*)\t\1(.*)$
and we need to replace it with:
\1\2\r\1\3

Make sure there are no mistakes (if there are any they are likely to  
appear right in the first lines).


Now you should have your 2 lines block.
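A field-count alternative to that regexp, in Python. It assumes (not confirmed here) that every SDF line has exactly 15 tab-separated fields, so a joined row has 30, and it keeps the same sanity check as the `\1` back-reference by comparing the first field of both halves. The caller picks the line ending (`\r`, `\n` or `\r\n`) when writing the two lines back.

```python
SDF_FIELD_COUNT = 15  # assumption: fields per SDF line


def split_joined_line(row):
    """Split one 'source<TAB>target' row back into the two SDF lines."""
    fields = row.split("\t")
    if len(fields) != 2 * SDF_FIELD_COUNT:
        raise ValueError(f"expected {2 * SDF_FIELD_COUNT} fields, got {len(fields)}")
    source, target = fields[:SDF_FIELD_COUNT], fields[SDF_FIELD_COUNT:]
    if source[0] != target[0]:  # the same check the \1 back-reference makes
        raise ValueError("source and target halves do not belong together")
    return "\t".join(source), "\t".join(target)
```

Splitting on a fixed field count avoids the regexp's reliance on where the greedy `(.*)` happens to stop when a later field starts with the same text as the first one.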

Rename the file to .sdf and here you are.



This

Re: [l10n-dev] Error with PO files form Pootle

2007-07-04 Thread Jean-Christophe Helary


Alessandro,

I have found a relatively painless way to directly translate the .sdf  
files in OmegaT. I have to finish my part now so I'll document that  
later.


JC

On 4 Jul 07, at 00:20, Alessandro Cattelan wrote:



Hi,
it seems that the PO files in the Pootle server used for the OOo 2.3
L10N are not properly formed. The comments of some strings have been
included in the msgid field. Can somebody explain how to handle these
strings?

Here's an example taken from the UUI folder of the OOo GUI:

#: masterpasscrtdlg.src#DLG_UUI_MASTERPASSWORD_CRT.modaldialog.text
msgid ""
"_: masterpasscrtdlg.src#DLG_UUI_MASTERPASSWORD_CRT.modaldialog.text\n"
"Enter Master Password"
msgstr ""

#: masterpassworddlg.src#DLG_UUI_MASTERPASSWORD.FT_MASTERPASSWORD.fixedtext.text
msgid ""
"_: masterpassworddlg.src#DLG_UUI_MASTERPASSWORD.FT_MASTERPASSWORD.fixedtext.text\n"
"Master password"
msgstr ""


Thanks,
Ale.



Re: [l10n-dev] Error with PO files form Pootle

2007-07-04 Thread Jean-Christophe Helary



On 4 Jul 07, at 17:39, Arthur Buijs - ArtIeTee wrote:


Alessandro Cattelan wrote:

Jean-Christophe Helary wrote:

Alessandro,

I have found a relatively painless way to directly translate the .sdf
files in OmegaT. I have to finish my part now so I'll document that later.

That sounds very interesting! I'll be waiting for it.


Indeed. I'll test your documentation as soon as it becomes  
available ;-)


Ok, I just give you an outline :) because I _am_ behind schedule...

2 main ideas: 1) the translatable contents are actually surrounded by
tabs, 2) the escaped sequences are for HTML-like code


from 1): opening the file in OOo after renaming it to .csv produces
something very nice to the eye
from 2): removing the relevant \ produces strings that actually
look _like_ HTML (all the \< are replaced by <, while all the <
not preceded by \ are replaced by &lt;)


and we need a 0): the .sdf is composed of groups of 2 lines, putting  
such a group on one line to have the .csv file look like 2 column  
sets (one for source one for target) is trivial.


now, you copy and paste the column that contains the source contents
into a text file, you add <p> at the beginning of each line, you rename
the thing to .html and you load it into OmegaT.


The TMX created with po2tmx must be treated so that the code inside
the segments looks like the tags that will be produced from the
source file (namely <e0>, <a0> etc.), so just replace all the \<emph\>
etc. by &lt;e&gt;; that way you'll only have to add the numbers
when the match is inserted.


OmegaT is smart enough to handle source segments that look like
<ahelp something very long>blabla</ahelp> and will only display
blabla, so that you are sure the source tags are protected during
the translation etc...


That is very rough and I have not yet back-converted the file (put
the < and \< back where they belong), but when that is done, just
paste the translated contents into your OOo Calc target contents
column, save, put the groups back to 2 lines and deliver.


It _does_ look like an awful hack (I'd say it is borderline, on the
easy side of the line :) but it is way better than having to handle
the po in source with the half-baked TMX that you get from the
po2tmx conversion. At least, OmegaT protects pretty much all the tags
and you don't need to add them to the target segment; OmegaT does
that nicely for you.


Ok, back to my last 50 lines !!!

JC


Re: [l10n-dev] OmegaT and PO files

2007-06-19 Thread Jean-Christophe Helary



On 19 Jun 07, at 15:50, Alessandro Cattelan wrote:


Now I have one more question to which I'm sure Jean-Christophe has the
answer... ;o)

When opening a project with OmegaT I thought that the text in msgstr in
the PO file would show up in the target segment, but that is not the
case. I think that checking the PO file is quite useful because of the
comments and of the changed strings extracted from the Sun database, but
keeping OmegaT and a text editor with the PO file open side by side on a
15-inch monitor is not very comfortable...

Is there any way to make OmT show at least the msgstr content?


Yes. It is what I have been demonstrating here by creating a TMX from  
the .sdf file.


Basically, OmegaT's PO handling (just like for monolingual files)
only puts the contents of the source in the target editing field, for
editing.


OmegaT is not able to know that a target already exists and to propose
it for editing.


What I have thus done is the following:

1) convert the .sdf file to .pot (oo2po -P etc) to remove all the  
msgstr contents
2) create a TMX from the pseudo translated .sdf (oo2po and then  
po2tmx, cf my comments on that process in a different thread)
3) put the .pot in /source/, the tmx in /tm/ and OmegaT will  
automatically match the .pot strings to their pseudo translated  
counterparts in the tmx, thus allowing you to have the msgstr  
contents in target.


It is a little non-trivial, but remember that OmegaT is not made to  
work with bilingual localization files. It works with a monolingual  
file in source and bilingual TM files for reference.


I think Rafaella should be able to provide you with proper TMX files  
that match the .sdf contents.


Cheers,
JC


Re: [l10n-dev] OmegaT and PO files

2007-06-19 Thread Jean-Christophe Helary



On 19 Jun 07, at 18:17, Rafaella Braconi wrote:


Hi Alessandro, Jean-Christophe,

Alessandro Cattelan wrote:


Jean-Christophe Helary wrote:


I think Rafaella should be able to provide you with proper TMX files
that match the .sdf contents.



Rafaella has already provided us with the OLH TMX some time ago.

yes, I was also suggesting that Jean-Christophe use those tmx files,
but I think that he really wants to make sure to get the most
updated tmx files...


I am currently loading them and I don't see anything problematic.

The only problem I see with the method you propose is that we would end
up having two TMs. The TM I have is pretty big (over 12MB) and OmegaT
takes a long time to analyse it. If I put another big TM in the tm
folder I think it would end up being too slow. However, I'll have a look
at that.


I am not clear about this. Why would you end up having 2 TMs?
Can't one simply use the most recent TMX files?


Anyway, as far as OmegaT is concerned that does not matter. It is
only the total number of segments to match that will lengthen the load
process.


I have just loaded the 8000 segments + 2 x 2000 segments TMXs
matching the whole .sdf-pot file and it took less than 2 minutes on my
machine (MacBook, dual core 2 GHz, 2 GB). I also assigned 2 GB to
OmegaT, in Java server mode (faster in general):


java -server -Xmx2048M -jar OmegaT.jar

JC


Re: [l10n-dev] escaping

2007-06-19 Thread Jean-Christophe Helary


OmegaT handles PO files pretty much as text files and thus does not
care about \"; for it, the \ is just another character. Hence,
there is nothing that is generated by OmegaT in the screenshot I
showed. The files are displayed as they are.


Friedel,

I am not arguing for or against a certain way to display the data I  
am just saying that OmegaT does not do anything to the data. And  
considers the PO escapes as a \ character.


Unfortunately a PO file isn't just a text file. It is a file format that
presents data in a specific way. To escape the backslash
(\) and the quotes (") is part of the format that we try to conform to.


Which is very good and OmegaT does not interfere with that.

big_snip


So, you see, the TMX does not exactly match the original .po file.
Although it does match the .sdf, but this is irrelevant.

When I created the TMX by using XLFEdit from Heartsome, I first took
the converted po, converted it to XLIFF and then exported it as TMX,
and the TMX contained the same number of escapes as the po.


I would consider this behaviour by the Heartsome tool to be a bug, to be
honest. Do they convert '<' to '&lt;'? Then they should also convert
the rest. I would say this is part of the rules of data conversion
between these formats.

I believe our conversion conforms to the XLIFF representation guide for
PO files:
http://xliff-tools.freedesktop.org/snapshots/po-repr-guide/wd-xliff-profile-po.html#s.general_considerations.escapechars


I think it follows logically that the same rules should apply for
converting to TMX.


I have no idea who is right and who is wrong. What I can say is that
Heartsome is _very_ strong when it comes to respecting standards.
Besides, the document you quote has contributions from Rodolfo Raya,
who is also a developer at Heartsome and who is himself extremely picky
when it comes to standards compliance.


In section 3.4, "Handling of Escape Sequences in Software Messages",
the text says, regarding a fragment that includes escape sequences like
the ones we have here: "This fragment could be presented in XLIFF by
preserving the escape sequences:"


etc. Of course, it proposes rules for handling special escape sequences
as opposed to generic ones, but there seems to be nothing wrong
with keeping all the escape sequences.


What matters in the end is _not_ whether the PO has been through an
XLIFF conversion process.


What matters is that:

1) I have a source po with \\\this kind of things\\\
2) my reference TMX should match that with \\\that kind of things\\\
because it is created from a similar po file
3) but for some reason it provides only \\this other kind of things\\

Let me repeat myself. I have no issue with your processes and with  
your level of compliance with the proposed standards.


The only problem is that somewhere, the TMX conversion process loses
data, and that impairs my ability to get leverage from it.


A somewhat separate issue for me is that the \< in the SDF file is also
an escape of that format. In reality it refers to just a left angle
bracket. The SDF format is, however, a bit strange in the way these are
used, and we might not want to change the way we handle the SDF escaping
while Pavel's POT files have a semi-official status. If we can agree on
how we interpret the escaping in the SDF file and coordinate the change,
we can probably make translators' lives far easier by eliminating
much of the escaping.


I don't think the problem is in the oo2po process. Whatever the
result, we are all starting from po anyway.


What is at stake here is that if I take a po created from .sdf and I  
use po2tmx on that same file, the data that the TMX contains is  
different from the data in the po.
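A small check can make the disagreement concrete. The following is a minimal sketch in plain Python, not part of translate-toolkit or OmegaT; the helper names are hypothetical. It undoes one level of PO string escaping and reports whether a TMX segment matches the raw PO text or the unescaped string that text represents. It assumes only the common \\, \", \n and \t escapes and leaves anything else, such as the SDF-style \<, untouched.

```python
import re

# One level of PO (gettext) string escaping; only a common subset is handled.
_PO_ESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}
_PO_ESCAPE_RE = re.compile(r'\\(["\\nt])')


def po_unescape(raw: str) -> str:
    """Return the string represented by the raw text between a msgid's quotes."""
    return _PO_ESCAPE_RE.sub(lambda m: _PO_ESCAPES[m.group(1)], raw)


def classify_segment(po_raw: str, tmx_seg: str) -> str:
    """Describe how a TMX segment relates to the raw PO text it was built from."""
    if tmx_seg == po_raw:
        return "raw"        # PO escapes copied verbatim into the TMX
    if tmx_seg == po_unescape(po_raw):
        return "unescaped"  # one level of PO escaping removed
    return "different"      # something else changed


# Hypothetical entry: SDF text \<emph\>Replace\</emph\> after oo2po doubled each backslash.
po_raw = r"\\<emph\\>Replace\\</emph\\>"
print(classify_segment(po_raw, r"\<emph\>Replace\</emph\>"))  # unescaped
print(classify_segment(po_raw, po_raw))                        # raw
```

In this hypothetical example, the "missing" backslash is exactly what one level of PO unescaping produces. Whether a TMX should carry the raw or the unescaped form is the question debated in this thread, and the sketch does not settle it.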


JC


Re: [l10n-dev] OmegaT and PO files

2007-06-19 Thread Jean-Christophe Helary



On 19 juin 07, at 17:12, Alessandro Cattelan wrote:

I don't understand why you need to create .pot instead of .po files. I
converted the sdf to po files and OmegaT just ignores the msgstr
content, so what is the use of having a pot file with empty msgstr  
fields?


Because I am pretty sure OmegaT would not overwrite the msgstr part  
since it does not know about it. So this is likely to result in a  
buggy target file. But maybe I am just too careful.


JC


Re: [l10n-dev] OmegaT and PO files

2007-06-19 Thread Jean-Christophe Helary



On 19 juin 07, at 19:51, Rafaella Braconi wrote:
yes, I was also suggesting to Jean-Christophe to use those tmx files,
but I think he really wants to make sure he gets the most
up-to-date tmx files...


I am currently loading them and I don't see anything problematic.


Are you referring to the tmx TEST file I just provided? Does this  
really work?


Yes, I am replying to you offlist with some specifics but basically  
they work well.


JC


Re: [l10n-dev] Contents of the OOo 2.3 .sdf, problems with TMX conversion

2007-06-18 Thread Jean-Christophe Helary


Rafaella,

Thank you very much for the comments.

I was a little confused because it seems that for each different  
community project SUN manages, there is a different way to localize :)


I would like to know if it is possible to provide us (or at least  
me...) with the .sdf strings _before_ the current modification so as  
to be able to create a correct TMX file.


If you could create that TMX yourself and make it available it would  
be even better. That TMX would contain the state of the corpus before  
the modifications (2.2.1) and would allow translators who work with  
TMX supporting tools (including Sun's own OLT, or OmegaT to name only  
the free ones) to work efficiently with the current files.


For your information, I decided to create a .pot file out of the .sdf
so that I was sure there were no pseudo-translated strings in
French, and I created a pseudo-TMX with the original contents that I
use to match every source string against.


This solution is better than hand editing the whole file but the  
whole thing would be even more efficient if instead of a pseudo-tmx I  
had the real thing based on the 2.2.1 contents.


Do you think it is possible to get that from SUN ?

Regards,
Jean-Christophe

On 18 juin 07, at 17:03, Rafaella Braconi wrote:


Hi Jean-Christophe,

in the Q&A section you may find the answer to your question already:
http://wiki.services.openoffice.org/wiki/Translation_for_2.3#Q_.26_A

Also, please see my comments inline:

Jean-Christophe Helary wrote:

I realized a few days ago that the .sdf (at least for the fr
project) for the coming 2.3 contains weird stuff without much of
an explanation as to how to differentiate the different parts.


1) in some places the target part is made of what would be a
fuzzy in PO, but without anything specifically marking it as
fuzzy


what you see is the previous translation. This means that in the
meantime the English text has been updated, and since in most cases
the old translation contains terminology which can be reused for
updating the string, we decided to keep the previous translation as
a sort of *suggestion* instead of overwriting it with the English
text.



2) in some places it seemingly contains exact matches


sometimes the English text has been updated in such a way that this
is not translation relevant. For example, a typo in the English text
has been corrected. Since the authors may not necessarily know if a
change is translation relevant or not, they flag the updated English
text as updated, and it gets extracted as a *changed* string
when we prepare the files to send to translation.



3) in some other places it contains the source English string


when the English text is completely new. This means that this is
the first time the string gets translated.




In the case where the fuzzy is present, the reference links are
sometimes totally different. This means that besides the
actual editing of the translation, it is also necessary to edit
the links.


Yes, in this case the translation needs to be updated, including
links, tags, variables, etc.




I wonder about the utility of such a mechanism, especially since
there is no way to differentiate between the 3 patterns in
the .sdf itself.


The utility is that in many cases the previous translation contains
terminology that can be reused to update the text.




It seems to me it would have been faster to _not_ insert fuzzies  
at  all and to provide a complete TMX of the existing OOo contents  
instead.


They are not fuzzies.



Right now, if one wants to create a TMX out of the .sdf files
(either with the Translate toolkit or with the Heartsome translation
suite; I suppose there are other ways, though), it is impossible
to have the source strings corresponding to the fuzzy target, and
thus the matching in a TMX-supporting CAT tool will not be of
much use.


You cannot create TMX out of the sdf files provided because the  
translated strings contained in it are not final translations




Is there still a way to get SUN to provide the l10n teams with TMX  
of  the existing contents, similarly to what we can get through  
the  SunGloss system ?


We could provide you with sdf files containing the final
translations, if that helps.


Rafaella



(FYI, the NetBeans team is provided with TMX and that greatly   
enhances the localization process.)


Jean-Christophe Helary



Re: [l10n-dev] OmegaT or poEdit?

2007-06-18 Thread Jean-Christophe Helary



On 18 juin 07, at 22:22, Alessandro Cattelan wrote:



Hi,
starting from today I'll have some more free time to dedicate to OOo
L10N so I'd like to start working on it. I'm wondering whether the
Italian team should use OmegaT or poEdit to translate the OLH and
possibly the GUI (using Pootle as a translation workflow manager).

Petr, Rafaella, can I go ahead and use OmegaT?


Ale,

I noticed that the TMX files I created with translate-toolkit from the
pseudo-translated .sdf are not usable because, for some reason, the
po2tmx script systematically removed one escape \ character from
the original po file.


I had to use a non-free tool to create the TMX, but if Rafaella and
SUN can provide the teams with a TMX of the 2.2.1 strings, then I
personally think that OmegaT (because of the automatic matching) is
the tool of choice for the people who are used to it. Besides, you
can leverage your old TMX with it too.


JC


Re: [l10n-dev] OmegaT or poEdit?

2007-06-18 Thread Jean-Christophe Helary


Hi,
starting from today I'll have some more free time to dedicate to OOo
L10N so I'd like to start working on it. I'm wondering whether the
Italian team should use OmegaT or poEdit to translate the OLH and
possibly the GUI (using Pootle as a translation workflow manager).

Petr, Rafaella, can I go ahead and use OmegaT?


Ale,

I noticed that the TMX files I created with translate-toolkit from the
pseudo-translated .sdf are not usable because, for some reason, the
po2tmx script systematically removed one escape \ character from
the original po file.


Hi Jean-Christophe

Please elaborate on the problem so that we can find out where the
error comes in and fix it if necessary. You can reply here, in private
mail, or on the translate-toolkit mailing list, as you prefer.


Friedel,

Thank you very much.

To put it simply, I did:

oo2po and then po2tmx on the .sdf files that make up the current job.

At first I did not notice anything but after a few segments, I found  
what I was lucky to capture in the screenshot I linked to the other day:


http://www.eskimo.com/~helary/_files/session_omegat.png

the green background segment is the oo2po file pretty much without
modifications (notice that all the \ are doubled; the \\
even come out as \\\)


the lower part shows you the po2tmx segment matching the current  
source: the contents should be identical but you'll see that there  
are systematically \ missing.


I re-created the tmx with Heartsome's XLFEdit and got a file matching  
the source segment properly.


The original .sdf does not contain the extra \ though, so I suppose  
they are put there in the oo2po process, which is fine, as long as  
they stay there all the way :)


JC


Re: [l10n-dev] OmegaT or poEdit?

2007-06-18 Thread Jean-Christophe Helary



On 18 juin 07, at 22:51, Rafaella Braconi wrote:


Hi Friedel,

it would be really great to have that issue fixed. In that case we
would be able to provide sdf files containing final translations
(and not pseudo ones) which can be used to create tmx files.


It is also possible to use Rainbow (from the Okapi framework -  
LGPL .NET 2.0) to get the proper TMX from this process. Just use  
oo2po to get a po file and Rainbow to convert that to TMX.


JC


Please let us know about the outcome.

Rafaella


F Wolff wrote:


On Mon, 2007-06-18 at 22:40 +0900, Jean-Christophe Helary wrote:


On 18 juin 07, at 22:22, Alessandro Cattelan wrote:




Hi,
starting from today I'll have some more free time to dedicate to OOo
L10N so I'd like to start working on it. I'm wondering whether the
Italian team should use OmegaT or poEdit to translate the OLH and
possibly the GUI (using Pootle as a translation workflow manager).

Petr, Rafaella, can I go ahead and use OmegaT?


Ale,

I noticed that the TMX files I created with translate-toolkit from the
pseudo-translated .sdf are not usable because, for some reason, the
po2tmx script systematically removed one escape \ character from
the original po file.




Hi Jean-Christophe

Please elaborate on the problem so that we can find out where the
error comes in and fix it if necessary. You can reply here, in private
mail, or on the translate-toolkit mailing list, as you prefer.

Friedel



Re: [l10n-dev] OmegaT or poEdit?

2007-06-18 Thread Jean-Christophe Helary



On 19 juin 07, at 08:26, Arthur Buijs wrote:


Hi,

Rafaella Braconi wrote:
For all those who are still looking for the answer to the
question: yes, OmegaT can definitely be used to translate
the sdf files converted into po files.
Many thanks to Alessandro, Jean-Christophe, Petr and all the
others who worked on checking this and for sharing the
information with everyone.


+1

Sharing this information was very helpful.
http://wiki.services.openoffice.org/wiki/Nl.openoffice.org#Translation_for_OOo_2.3


Be aware that I was using Windows ;-)
Comments greatly appreciated.


I have just modified your text a little bit to clarify the project
setup and the glossary export from SunGloss.


JC

ps: there is a user list on Yahoo where the support is quite good,  
but I can create a list on SourceForge for people who prefer to stay  
on free open ground.




--
Regards/Groeten,
Arthur Buijs

Open software is a joy forever!
http://nl.openoffice.org
#nl.openoffice.org (irc.freenode.net)



[l10n-dev] Contents of the OOo 2.3 .sdf, problems with TMX conversion

2007-06-17 Thread Jean-Christophe Helary

I realized a few days ago that the .sdf (at least for the fr project)
for the coming 2.3 contains weird stuff without much of an
explanation as to how to differentiate the different parts.


1) in some places the target part is made of what would be a fuzzy
in PO, but without anything specifically marking it as fuzzy

2) in some places it seemingly contains exact matches
3) in some other places it contains the source English string

In the case where the fuzzy is present, the reference links are
sometimes totally different. This means that besides the actual
editing of the translation, it is also necessary to edit the links.


I wonder about the utility of such a mechanism, especially since there
is no way to differentiate between the 3 patterns in the .sdf itself.
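The problem with the three patterns can be illustrated with a minimal sketch. It assumes the source and target text of each .sdf entry have already been paired up; the function name and input shape are hypothetical. Only the third pattern, where the target is a copy of the English source, can be detected mechanically. Patterns 1 and 2 both look like ordinary translations. A target identical to the source can also be a legitimate translation (a product name, for instance), so even that check is only a hint.

```python
from typing import Dict, Iterable, List, Tuple


def split_by_pattern(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (source, target) pairs by what can be told from the text alone."""
    groups: Dict[str, List[str]] = {"copied_source": [], "indistinguishable": []}
    for source, target in pairs:
        if target == source:
            # Pattern 3: the English string was copied into the target field.
            groups["copied_source"].append(source)
        else:
            # Pattern 1 (previous translation kept as a suggestion) or
            # pattern 2 (still-valid translation): nothing in the data says which.
            groups["indistinguishable"].append(source)
    return groups


entries = [("Open", "Open"), ("Save As", "Enregistrer sous"), ("Print", "Imprimer")]
print(split_by_pattern(entries))
```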


It seems to me it would have been faster to _not_ insert fuzzies at  
all and to provide a complete TMX of the existing OOo contents instead.


Right now, if one wants to create a TMX out of the .sdf files (either
with the Translate toolkit or with the Heartsome translation suite; I
suppose there are other ways, though), it is impossible to have the
source strings corresponding to the fuzzy target, and thus the
matching in a TMX-supporting CAT tool will not be of much use.


Is there still a way to get SUN to provide the l10n teams with TMX of  
the existing contents, similarly to what we can get through the  
SunGloss system ?


(FYI, the NetBeans team is provided with TMX and that greatly  
enhances the localization process.)


Jean-Christophe Helary


Re: [l10n-dev] TM, glossaries and OmegaT

2007-06-15 Thread Jean-Christophe Helary



On 16 juin 07, at 04:48, Alessandro Cattelan wrote:



Hi,
I ran a couple of tests to see whether OmegaT could be used to translate
the OLH for OOo 2.3 and I forgot to post the results here as a follow-up
to this discussion (I've sent them to Sun and to the Italian community
only). I'm doing it now just in case someone is interested in all this.


Apart from the test with OmegaT, I ran the same test with poEdit using
the same files and procedures.

At the end of this e-mail you'll find a report of what I've done, from
converting the original SDF file to PO, translating PO files with OmegaT
and poEdit and then back-converting the files to SDF. I'm not attaching
all the files and directories used since it would be too heavy; you can
download them from the following address:
http://tinyurl.com/2s4zwu

Ale.


Ale,

I started my own OmegaT process yesterday and I roughly documented  
it on the fr-l10n list.


Basically what I did was the following:

1) get the .sdf and convert to one big .pot to make sure the  
automatically inserted translations were not present.


oo2po -P --language=fr --nonrecursiveinput HC2_93824_89_2007-06-05_33.sdf HC.pot


2) convert the .sdf to .po, convert that to .tmx, clean the TMX to  
remove parts where the original msgid and msgstr were identical (not  
necessary)


po2tmx --language=fr HC2_93824_89_2007-06-05_33.po HC.tmx

3) export the EN-FR glossary from SunGloss and keep source
term/target term/target comment, all separated by tabs.
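The optional cleanup in step 2 above, dropping units whose source and target are identical, can be done with a short script. This is a sketch using Python's standard xml.etree.ElementTree. It assumes a TMX in which each <tu> holds the source <tuv> first and the target second, and the function name is hypothetical. The serialized output loses the original XML declaration and DOCTYPE, which you may need to add back.

```python
import xml.etree.ElementTree as ET


def drop_identical_units(tmx_text: str) -> str:
    """Remove <tu> elements whose first two <seg> texts are identical."""
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    if body is None:
        raise ValueError("not a TMX document: no <body> element")
    # findall() returns a new list, so removing elements while looping is safe.
    for tu in body.findall("tu"):
        segs = ["".join(seg.itertext()) for seg in tu.iter("seg")]
        if len(segs) >= 2 and segs[0] == segs[1]:
            body.remove(tu)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<tmx version="1.4"><header srclang="en-US"/><body>'
    '<tu><tuv xml:lang="en-US"><seg>OK</seg></tuv><tuv xml:lang="fr"><seg>OK</seg></tuv></tu>'
    '<tu><tuv xml:lang="en-US"><seg>Save</seg></tuv><tuv xml:lang="fr"><seg>Enregistrer</seg></tuv></tu>'
    "</body></tmx>"
)
print(drop_identical_units(sample))
```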


I loaded all this in a dedicated OmegaT project (.pot in /source/,
.tmx in /tm/, glossary renamed with .utf8 in /glossary/)
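For the glossary in step 3, the export only needs to be reduced to at most three tab-separated columns and saved as UTF-8 with a .utf8 extension. The sketch below assumes the SunGloss export has already been read into rows of strings. Its exact export format is not described here, so the reading step is left out, and the function name is hypothetical.

```python
from typing import Iterable, List


def to_glossary_tsv(rows: Iterable[List[str]]) -> str:
    """Keep source term, target term and an optional comment, tab-separated."""
    lines = []
    for row in rows:
        # Keep at most three columns; tabs inside a cell would break the format.
        cells = [cell.replace("\t", " ").strip() for cell in row[:3]]
        if len(cells) < 2 or not cells[0] or not cells[1]:
            continue  # a usable entry needs both a source and a target term
        if len(cells) == 3 and not cells[2]:
            cells.pop()  # drop an empty comment column
        lines.append("\t".join(cells))
    return "".join(line + "\n" for line in lines)


rows = [["fraction", "fraction", "math"], ["incomplete"], ["sheet", "feuille", "Calc", "extra"]]
print(to_glossary_tsv(rows), end="")
```

The result can then be written into the project's /glossary/ folder, for example with pathlib's Path.write_text(..., encoding="utf-8") and a name ending in .utf8.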


And what I get is the familiar OmegaT session illustrated here:

http://www.eskimo.com/~helary/_files/session_omegat.png



For those unfamiliar with it, the top left part is the editor window
(the bold part on a green background is the source segment, to be
translated right below it, between the segment and end-segment markers).


The bottom left part is the translation memory matching window. Since
I use the original .sdf contents, I always have at least a 100% match
that I either use as is or edit (after rewriting it in the edit field
with Ctrl+R). I can select other matches (Ctrl+number) for rewrite
(Ctrl+R) or insertion at the cursor (Ctrl+I).


The right part is the glossary part. Items cannot be inserted  
automatically, they are only for reference.


There are menus that are not displayed on the screenshot, from where  
one can create the target files (at any time during the translation)  
to check them, it is possible to modify the project segmentation at  
any time (regex based) etc.


The only worry I have is that the target file will have problems with
the back-conversion to .sdf, but your testing seems to prove that
those can be fixed relatively easily...


JC


- - 
## Converting PO into SDF ##
- - 


[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ ls
backconversion  HC2_93824_89_2007-06-05_39.sdf  OLH-OmT-Project
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ ls backconversion/
HC2_93824_89_2007-06-05_39.sdf  po
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ cd backconversion/
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ ls
HC2_93824_89_2007-06-05_39.sdf  po
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ po2oo -t HC2_93824_89_2007-06-05_39.sdf -l it po it_IT.sdf
/usr/lib/python2.5/site-packages/translate/storage/po.py:31:
DeprecationWarning: The sre module is deprecated, please import re.
  import sre
processing 35 files...
Error at
po/helpcontent2/source/text/shared/01.po::0211.xhp#par_id2366100.help.text:
escapes in original ('\n', '\emph\Replace', 'with\/emph\') don't
match escapes in translation ('\emph\Replace', 'with\/emph\')
Error at
po/helpcontent2/source/text/shared/01.po::0211.xhp#par_id9262672.help.text:
escapes in original ('\n', '\emph\Search', 'for\/emph\') don't match
escapes in translation ('\emph\Search', 'for\/emph\')
[###] 100%
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ /opt/gsicheck_1.8.2/gsicheck it_IT.sdf
NO OUTPUT
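The two po2oo errors above say that a \n present in the original is missing from the translation. A rough pre-check before running po2oo could be sketched as follows. This is not how po2oo itself compares escapes; its messages group them differently. It is a hypothetical helper that treats any backslash plus one character as an escape and compares counts on both sides, ignoring order, since a translation may legitimately move tags around.

```python
import re
from collections import Counter
from typing import List

# Assumed definition of an "escape": a backslash followed by any one character.
ESCAPE = re.compile(r"\\.", re.DOTALL)


def missing_escapes(original: str, translation: str) -> List[str]:
    """List escapes that occur more often in the original than in the translation."""
    diff = Counter(ESCAPE.findall(original)) - Counter(ESCAPE.findall(translation))
    return sorted(diff.elements())


original = r"\<emph\>Search for\</emph\>\n"
translation = r"\<emph\>Cerca\</emph\>"
print(missing_escapes(original, translation))
```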



Re: [l10n-dev] TM, glossaries and OmegaT

2007-06-13 Thread Jean-Christophe Helary


Ale,

If non-pootle users still want to use TMs it is possible to use  
OmegaT too.



I should have thought about that a couple of weeks ago, before going
around looking for info on PO editors and all the rest. I think I've
missed the point in which OmegaT started supporting PO files... :o(


Honestly, I really wondered why you were not considering it when
you started asking your questions about PO files here and there :)



We are now working on the OLH translation with poEdit and most of the
translators are complaining about the change: for a translator OmegaT is
just way better than any PO editor.


Especially if you work intensively with TMs.


I've tried doing what Jean-Christophe suggested above and seen that it
could work. The only issue is that some TM matches will make no sense
because OmegaT takes the tags into consideration while computing the
match percentage. For instance, for the following segment:


Ok, the problem is that PO files are not supposed to contain XML  
strings :)


Hence the suggestion that a little tweaking of the existing filters  
would provide better matches with the original .sdf files.
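Why tags drag match percentages down can be shown with a rough sketch. It uses Python's difflib.SequenceMatcher as a stand-in similarity measure, which is not OmegaT's fuzzy-matching algorithm. The tag pattern, covering both <tag> and the SDF-style escaped \<tag\>, is an assumption, and the function name is hypothetical.

```python
import re
from difflib import SequenceMatcher

# Matches <tag ...> as well as the SDF-style escaped form \<tag ...\>.
TAG = re.compile(r"\\?<[^<>]*>")


def match_percent(a: str, b: str, strip_tags: bool = False) -> int:
    """Similarity of two segments as a rounded percentage."""
    if strip_tags:
        a, b = TAG.sub("", a), TAG.sub("", b)
    return round(100 * SequenceMatcher(None, a, b).ratio())


old = r"\<emph\>Opens\</emph\> a file."
new = "Opens a file."
print(match_percent(old, new), match_percent(old, new, strip_tags=True))
```

The same words score far lower while the tags are counted as text, which is the effect a filter tweak would remove.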


But I've worked with OOo sdf-po converted files in the past and had  
no problem getting over this issue.


I have not yet checked the current files' contents, but if they are more
about text than links, then you'd better use OmegaT with your TMX files.




Given all this, I would say that OmegaT could be the solution here. At
least for the Italian community, which is used to this tool and
appreciates its features. I'm going to give it a try. One of the things
we'll have to pay attention to is whether the translated files are
correct and can be imported painlessly into the Sun database as an .sdf
file. I'll send Sun a few translated files to test this and will report
back as soon as the check is done.


This is my worry also. So you should give it a try first with a short  
file.


One thing is not clear, though: why should I need to run msgcat? Can't I
just work with a bunch of separate po files and directories in a tree
structure (basically what I get when I run oo2po on the .sdf file)?


msgcat was suggested by the original PO file developer. See if that  
works without it.


JC


[l10n-dev] TM, glossaries and OmegaT

2007-06-12 Thread Jean-Christophe Helary



On 11 juin 07, at 22:07, Rafaella Braconi wrote:


Hi Alessandro,

for Russian, Italian and Khmer we are referring to the Pootle server
hosted on the Sun virtuallab. Please see details at:
http://wiki.services.openoffice.org/wiki/New_Translation_Process_%28Pootle_server%29


The 3 languages above are the only ones which will be using Pootle
to translate the 2.3 version, since we are in the initial/pilot phase.
If everything goes well, or at least we sort out the issues and we
are able to fix them, all other languages and teams that want to be
added to this tool are more than welcome to join. That would be for
the 2.4 release.


If non-pootle users still want to use TMs it is possible to use  
OmegaT too.


Sun can probably provide us with TMX files of previous translations  
in the relevant language pairs, the SunGloss contents can be exported  
as a glossary file and the source PO can be pretty much translated as  
is.


The correct procedure would be:
0) create a project in OmegaT
1) correctly format the PO file with msgcat
2) put that file in /source/
3) make sure that your TMX has srclang set to your source language  
the way it was defined in the project settings

4) put the TMX file in /tm/
5) make sure the exported SunGloss is a tab separated list in at most  
3 columns (1st col= source language, 2nd col=target lang, 3rd col=  
comments)

6) put the glossary in /glossary/
7) open the project and translate
8) when the translation is completed, msgcat the file in /target/ and  
deliver
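Step 3 of the procedure above, the TMX srclang, can be checked and, if necessary, fixed before the TMX goes into /tm/. The following is a sketch with Python's standard ElementTree, and the function names are hypothetical. The comparison is a strict, case-insensitive match on the whole language tag, so it does not decide whether OmegaT would accept, say, EN against EN-US. Rewriting the file this way also drops its XML declaration and DOCTYPE.

```python
import xml.etree.ElementTree as ET


def srclang_matches(tmx_text: str, project_source: str) -> bool:
    """True if the TMX <header> srclang equals the project's source language."""
    header = ET.fromstring(tmx_text).find("header")
    if header is None:
        return False
    # Language tags are case-insensitive, so compare them lower-cased.
    return header.get("srclang", "").lower() == project_source.lower()


def set_srclang(tmx_text: str, lang: str) -> str:
    """Return the TMX text with its header srclang set to lang."""
    root = ET.fromstring(tmx_text)
    header = root.find("header")
    if header is None:
        raise ValueError("not a TMX document: no <header> element")
    header.set("srclang", lang)
    return ET.tostring(root, encoding="unicode")


tmx = '<tmx version="1.4"><header srclang="EN-US"/><body/></tmx>'
print(srclang_matches(tmx, "en-US"), srclang_matches(tmx, "en"))
print(srclang_matches(set_srclang(tmx, "en"), "en"))
```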


For info, OmegaT is a GPLed Java Computer Aided Translation tool
developed for _translators_. It is specifically _not_ for geeks,
which means that it is relatively straightforward to use.


I am pretty sure the filters can be tweaked to directly support  
the .sdf format but I leave that to others.


I know some already use it here. NetBeans localizers also use it  
intensively. And real translators too :)


Jean-Christophe Helary (OOo-fr)

http://sourceforge.net/projects/omegat/

CVS version:
cvs -z3 -d:pserver:[EMAIL PROTECTED]:/cvsroot/omegat co -P omegat



Re: [l10n-dev] Pootle and terminology

2007-06-10 Thread Jean-Christophe Helary



On 11 juin 07, at 09:11, Alessandro Cattelan wrote:
I'd been told before that it should be quite easy to convert a txt into
PO but unfortunately I don't know how to do it.

Basically what I have is a long list of terms and expressions in two
tab-separated columns, one for the English version and one for the
Italian translation. Something like this:

fraction	frazione

I understand that a PO files with these entries would look something
like this:

msgid "fraction"
msgstr "frazione"

Is that correct?

I assume it would be quite easy to write a script for that, but I can't
do it.


Ale,

No need for a script.

Take the text editor you usually use and open your text file.
1) I assume that you understand regular expressions a little bit
2) and that the character between fraction and frazione in your  
text file is a tabulation


You'd have to search for:

^([^\t]+)\t([^\t]+)$

and to replace by

msgid "\1"\rmsgstr "\2"\r\r

The regexp may be slightly incorrect and will certainly depend on the
text editor you use, but give the above a try and fine-tune it
until you get the proper results.
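If no regex-capable editor is at hand, the same conversion can be done with a few lines of Python. This sketch is not an existing tool, and the function names are hypothetical. It also escapes backslashes and double quotes, which PO strings require. It skips malformed and duplicate lines, since gettext's msgfmt reports duplicate msgid definitions as errors. It adds a minimal header declaring UTF-8 so accented Italian characters are read correctly.

```python
from typing import List


def po_quote(text: str) -> str:
    """Escape text for use between the double quotes of a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def tsv_to_po(tsv: str) -> str:
    """Turn 'source<TAB>target' lines into PO entries."""
    blocks: List[str] = ['msgid ""\nmsgstr "Content-Type: text/plain; charset=UTF-8\\n"\n']
    seen = set()
    for line in tsv.splitlines():
        parts = line.split("\t")
        if len(parts) != 2 or not parts[0] or parts[0] in seen:
            continue  # skip blank, malformed or duplicate lines
        source, target = parts
        seen.add(source)
        blocks.append(f'msgid "{po_quote(source)}"\nmsgstr "{po_quote(target)}"\n')
    return "\n".join(blocks)


print(tsv_to_po("fraction\tfrazione\nsheet\tfoglio\n"))
```

The result can be saved with a .po extension and checked with gettext's msgfmt -c before use.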


Cheers,
JC


Re: [l10n-dev] Po editor for OOo 2.3 translation

2007-05-31 Thread Jean-Christophe Helary



On 1 juin 07, at 03:14, Alessandro Cattelan wrote:


We've been asked to use the PO format because in a past project
converting to XLZ and back-converting to PO created quite a few
problems. I would be much happier using some other tool such as
Heartsome XLIFF Editor but I'd like to avoid producing a good quality
yet useless translation.


Ale,

Sophie (French lead) told me that indeed, the PO files provided for  
the most recent l10n job were not of the best quality. This time, the  
French l10n team will get .sdf files that Sophie will convert to .PO  
using the translate-toolkit tools which will, supposedly, provide  
translators with workable files.


If you manage to get the .sdf files I am sure there are ways to deal  
with them effortlessly with your editor of choice.


JC


Re: [l10n-dev] Po editor for OOo 2.3 translation

2007-05-30 Thread Jean-Christophe Helary



On 31 mai 07, at 03:39, Alessandro Cattelan wrote:


Hi,
I'll be working on the Italian translation for OOo 2.3. For the GUI
translation we'll be using Pootle whereas for the translation of the
online help we'll be using a PO editor.


Ale,

Is this the file you mentioned on l4t ? If yes, I was not aware that  
the .sdf files converter produced broken PO. Maybe reporting that as  
a bug would be better than trying to find a PO editor that works with  
broken files :)


And I forgot to mention that emacs has a PO mode, but I've not used  
it in a long time so I don't know if it's worth it.


Regarding using the TMX: convert it to PO with a few regexes and use
the gettext tools to incorporate it into your current PO file. That way
you won't need a TMX fuzzy matcher. But I really think using
PO-dedicated tools for translation is a waste of resources. There are
plenty of CAT tools that will leverage your TMX and parse your PO.
But you need to get the PO fixed first, if possible.
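The TMX-to-PO step mentioned above can also be done with an XML parser instead of regular expressions, which copes better with attributes and inline markup. The sketch below uses Python's standard ElementTree, and the names are hypothetical. It reads TMX 1.4 xml:lang attributes and falls back to the older lang attribute. It flattens any inline tags to their text and writes one PO entry per translation unit, keeping the first occurrence of each source string. The result could then be handed to gettext, for instance as a compendium through msgmerge's --compendium option.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_pairs(tmx_text: str, src: str, tgt: str) -> List[Tuple[str, str]]:
    """Collect (source, target) segment pairs for the given language codes."""
    pairs = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = "".join(seg.itertext())  # inline tags are flattened
        if src.lower() in segs and tgt.lower() in segs:
            pairs.append((segs[src.lower()], segs[tgt.lower()]))
    return pairs


def po_quote(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def pairs_to_po(pairs: List[Tuple[str, str]]) -> str:
    """Write pairs as PO entries after a minimal UTF-8 header."""
    blocks = ['msgid ""\nmsgstr "Content-Type: text/plain; charset=UTF-8\\n"\n']
    seen = set()
    for source, target in pairs:
        if source and source not in seen:  # a PO file must not repeat a msgid
            seen.add(source)
            blocks.append(f'msgid "{po_quote(source)}"\nmsgstr "{po_quote(target)}"\n')
    return "\n".join(blocks)


tmx = (
    '<tmx version="1.4"><header srclang="en-US"/><body>'
    '<tu><tuv xml:lang="en-US"><seg>Save</seg></tuv>'
    '<tuv xml:lang="it"><seg>Salva</seg></tuv></tu></body></tmx>'
)
print(pairs_to_po(tmx_pairs(tmx, "en-US", "it")))
```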


JC


I don't have much experience with PO editors as I've only tried for a
short time software such as poEdit, Kbabel and Gtranslator.

I'd like to know if you have any suggestion as to which PO editor  
to choose.


It would be very important for me to be able to import or reuse a TMX
file I have. Is there any tool that would let me do that with a PO
editor?

Thanks.
--
Alessandro



[l10n-dev] New version of TMX: release for public comments.

2007-04-19 Thread Jean-Christophe Helary


I just found:


TMX 2.0 released for public comment - March 28, 2007
TMX 2.0 has been released as a committee draft specification for  
public comment. Comments will be accepted through June 1, 2007 and  
should be sent to [EMAIL PROTECTED] The specification can be viewed  
online or downloaded as a zip archive. OSCAR is particularly  
interested in comments relating to implementation issues and  
especially welcomes feedback from tools developers and users of TMX.


on the Lisa TMX page:
http://www.lisa.org/standards/tmx/

Although the draft has been available for 3 weeks now, I don't  
remember seeing any announcement on any list of users of TMX.


The release is made for public comments, but I also could not find
anywhere that the comments made in the last 3 weeks had been made public.


Did I miss a link ?

Jean-Christophe Helary
OmegaT



Re: [l10n-dev] Which part of Help does what?

2007-01-21 Thread Jean-Christophe Helary



On 21 janv. 07, at 18:21, Pavel Janík wrote:

Is it really necessary to subscribe to and read so many mailing  
lists, in order to do a reasonable job as a translator?


You don't need to subscribe to the mailing list to be able to read  
it so your question doesn't make sense.


It is not her question that does not make sense, but the original  
answer: subscribe there to get the info to your one and only question.



I feel that you have never worked on a project as large as OOo.


I don't want to talk for Clytie because she is big enough to do that  
herself, but your feeling is wrong.


Or maybe by "large" you mean "messy", in which case you may be
right. As I pointed out earlier today: plenty of redundant
information all over the place, but to get _the_ tiny bit one
is missing, the suggestion is to subscribe to _yet_ another mailing list...


Jean-Christophe Helary



Re: [l10n-dev] Ain's howto in the wiki

2007-01-20 Thread Jean-Christophe Helary



On 19 janv. 07, at 15:45, Jean-Christophe Helary wrote:


why is the translate toolkit necessary ?


Sorry to have brought the thread to a place far from where I expected  
it to go...


Anyway, I installed the translate toolkit and read the documentation,
and I eventually figured out that what we basically need it for is:


1) convert the POT files to PO files
2) merge previous compendium to the current file

There is a whole paragraph about the specifics of the OOo
localization file format; all the rest of the document is about
translation itself.


But according to Damien in the other day's IRC:

[21:58] 	damiend	We (Sun) extract POTs AND PO from the sources daily


So we don't really need to convert POT to PO anymore. Do we ?

So the purpose of the translate toolkit is basically only to merge
the compendium files. Am I misunderstanding something?


I understand that the current document is based on 2 years old data,  
so it may be that my questions are not relevant at all any more. But  
I'd like to contribute to this how-to, at least to separate it in two  
as I suggested the other day, so I welcome any comments.


Jean-Christophe


Re: [l10n-dev] Re: Ain's howto in the wiki

2007-01-20 Thread Jean-Christophe Helary


These pages have unique information, duplicated information, up-to-date
information, old information and irrelevant information ;)
Someone who writes good English should merge this information and
outline it as you described. In the wiki, of course.


Ok :)

JC


Re: [l10n-dev] Ain's howto in the wiki

2007-01-19 Thread Jean-Christophe Helary


Ain,

Thank you for the clarification. As I wrote to Clytie, I did not mean to  
sound harsh.


As I am far from understanding the actual job of a team leader  
(Sophie is my leader) I'll wait a bit before making proposals here.


As I wrote in my first post, the French team received .xlz files  
created from the original .po files. I found that odd, since OLT comes  
with a filter that is very simple to use and that allows translators  
to convert to .xlz if they need to. Besides, there was an encoding  
problem in our files, so I had to convert the .xlz back to .po and  
translate it in OmegaT, etc.


From there I had a discussion with Sophie about the source file  
format issue and she advised me to join the discussion here.


JC

On 19 Jan. 07, at 16:46, Ain Vagula wrote:


On 1/19/07, Jean-Christophe Helary [EMAIL PROTECTED] wrote:

Clytie, Ain (?)

I have 2 questions: why is the translate toolkit necessary ? And why
aren't there any references to OLT ? The French community
receives .xlz files and would be unable to follow the howto since it
is exclusively based on po files (which is not a bad thing in the
absolute, but that is a different issue).

It seems to me the document addresses team leaders' needs and not
really translators, who will certainly not have to deal with most of
what is described in the document.


You are right, it is about starting a new translation, for a prospective
team lead. It was written 2 years ago and is still an unfinished
draft.
As it is in the wiki, everyone can fix or complete it.
Setting internal rules for a particular language is the language team's
own business: formats, tools, way of communication, etc. Of course there
are 3-4 common directions that should be described; someone has to do
this, but not me.

ain



Re: [l10n-dev] Ain's howto in the wiki

2007-01-19 Thread Jean-Christophe Helary



On 19 Jan. 07, at 17:25, Clytie Siddall wrote:

I was on the IRC last night (brandelune)


I wish there were tooltips or something so you could tell who  
people are.


Well, on mine (xchat/osx) you just click on the person and the  
registered name appears.


ps: OmegaT has a PO filter given to us last year by a Debian-fr  
activist... :)


was wondering about using it with those dratted Help files. Have  
you seen them? XML and lots of repetition.


I actually translated part of the recent package with  
OmegaT from the back-converted po files. I set 2 rules to isolate the  
\\ or whatever was encumbering the segments and I was done.


I see that the original files here are SDF, which look very easy  
to handle directly in OmegaT. The already translated parts could  
easily be converted to TMX and the untranslated parts could be  
parsed either with a dedicated filter (any Java geek here ?) or with  
regex-based segmentation rules.
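Since SDF lines carry both the source and the translated text, the already translated parts can be paired up mechanically before being written to TMX. A minimal Python sketch with a hypothetical function name; the column positions are an assumption based on the usual description of the SDF layout (language code in the 10th column, main text in the 11th) and should be checked against real files. Help text, quick help text and title columns are ignored.

```python
# Hypothetical sketch: pair source and target strings from an OOo SDF file.
# ASSUMPTION: tab-separated lines with at least 11 columns, where index 9
# (0-based) is the language code and index 10 the main text, and where a
# source line and its translation share every column before the language.
LANG_COL, TEXT_COL = 9, 10


def sdf_pairs(lines, source_lang="en-US", target_lang="fr"):
    """Return (source_text, target_text) pairs found in both languages."""
    by_lang = {source_lang: {}, target_lang: {}}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= TEXT_COL or cols[LANG_COL] not in by_lang:
            continue  # malformed line or a language we do not want
        key = tuple(cols[:LANG_COL])  # identifies the string, minus language
        by_lang[cols[LANG_COL]][key] = cols[TEXT_COL]
    source, target = by_lang[source_lang], by_lang[target_lang]
    return [(text, target[key]) for key, text in source.items()
            if target.get(key)]  # skip missing or empty translations
```

Because the function accepts any iterable of lines, a file opened with `open(path, encoding="utf-8")` can be passed in directly; each resulting pair could then become one TMX translation unit.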


I saw that most of the discussion yesterday revolved around what kind  
of CV system to use, I have never used Pootle so I guess I'll have to  
check how it works to be able to participate more here.


It is interesting to see how a lot of FOSS projects have converging  
ideas. But it seems to me all this is still a little too geeky for the  
average translator to be able to join. Even OLT's learning curve is  
quite steep (plus .xlz is not a standard format...)


Anyway, I have to get back to my feed the kids deadlines ! I'll be  
back ! :)


JC


Re: [l10n-dev] Re: Ain's howto in the wiki

2007-01-19 Thread Jean-Christophe Helary


Ain,

So the question you seem to have with xliff:


On 19 Jan. 07, at 19:48, Ain Vagula wrote:

 By the way, I noticed that xliff files are stored in packed format,
 xlz? It means that when using xliff as the only intermediate version,
 without the po step, you'll not have any overview of changes in the
 version control system? It's only a question, not a statement; I know
 almost nothing about the deeper mechanism of version control.


Is basically: can we have the same level of VC information with xliff  
files as with po files ?


(apart from the fact that OLT's xlz is not representative of  
standard xliff files).


I don't know what is technically necessary in the current back-end,  
but basically po and xliff are formats with exactly the same purposes.  
I'd say the only difference is historical: po came with GNU and  
gettext while xliff came from the localization industry and XML; it  
is also much younger but integrates much better in XML workflows.


Sorry for misunderstanding what you meant by version control here. My  
idea is that whatever localization format you use, it will be easy to  
integrate it in the back-end, so we should not consider ourselves  
limited to only-po or only-whatever-else. But I fear that in the  
long run, sticking to po will not help improve the whole  
system.


To me it would make much better sense to use a vendor-neutral  
version control system where the controlled output file is as  
close as possible to the original file (be it sdf or anything else the  
process uses internally) and provide a number of end-user filters  
to ease the life of would-be translators: some would translate the  
raw files, some would convert them to po for use in their po tools,  
some would convert them to xliff etc...


That way we'd have a team leader who handles all the commits etc. and  
packages the data for the team according to the procedure the team  
has chosen.


Whatever form this vc system takes it should be able to also output  
combined updated packages on which to build reference glossaries  
(CSV or TBX for ex) and translation memories (CSV or TMX for ex) by  
automatically aligning the committed files.
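The glossary part of this idea can be sketched in a few lines of Python, assuming the committed files have already been aligned into (source, target) pairs. Treating any string of at most `max_words` words as a term is a crude, hypothetical heuristic, and the three-column source/target/comment layout is an assumption about what the receiving tool expects; a TBX export would need a proper XML writer instead.

```python
# Hypothetical sketch: derive a tab-separated glossary from aligned pairs.
# ASSUMPTIONS: pairs are already aligned; "terms" are approximated as short
# strings (max_words is an arbitrary cutoff); columns are source, target,
# comment.
import csv
import io


def glossary_tsv(pairs, max_words=3, comment=""):
    """Return glossary text with one row per unique short (source, target) pair."""
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target or len(source.split()) > max_words:
            continue  # skip empty entries and sentence-length strings
        if (source, target) in seen:
            continue  # keep only the first occurrence of each pair
        seen.add((source, target))
        writer.writerow([source, target, comment])
    return out.getvalue()
```

Using the `csv` module rather than joining strings by hand means a term that happens to contain a tab or quote is escaped instead of silently shifting the columns.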


The few FOSS projects I have participated in all have very good VC  
systems but very poor translation memory/glossary management, which  
means that the translator usually has to find references the hard  
way. The computer-aided translation tools that exist on the market  
today (FOSS or not) don't seem to be fully used in most FOSS projects,  
which means that a lot of QA has to be done, and re-done and done yet  
again because translators can't fully leverage older translations.


Jean-Christophe


VC is fine for some processes but I think it is a little too much for
our purposes. As long as you have past documents stored as
translation memories (TMX) you don't need VC at all anymore
(at least if you mean VC the way I mean it). If your text has already
been translated it will be there in the TMX, and either you have a
system to update it automatically or you do that manually.

There are a number of issues related to TMX: do we store everything
as sentences, or as paragraphs, etc.? And if we leave translators free
to use the process they want, how do we guarantee that they deliver a
TMX with the final document?

That is where XLIFF comes in: as long as the delivered document is
XLIFF it is trivial to extract a TMX from it for recycling in the
next translation. etc.
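A minimal Python sketch of that extraction, using only the standard library. It assumes plain, unpacked XLIFF 1.2 (not OLT's .xlz packaging) with one `<source>` and one `<target>` per `<trans-unit>`, and it flattens inline tags to their text; the TMX header values are placeholders and the function name is hypothetical.

```python
# Hypothetical sketch: recycle a delivered XLIFF 1.2 file as a TMX 1.4 memory.
# ASSUMPTIONS: unpacked XLIFF 1.2, one <source>/<target> per <trans-unit>;
# inline markup is flattened to plain text; header values are placeholders.
import xml.etree.ElementTree as ET

XLIFF = "{urn:oasis:names:tc:xliff:document:1.2}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def xliff_to_tmx(xliff_text):
    """Return TMX text with one <tu> per translated <trans-unit>."""
    root = ET.fromstring(xliff_text)
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "xliff_to_tmx", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "xliff", "adminlang": "en",
        "srclang": "*all*", "datatype": "plaintext"})
    body = ET.SubElement(tmx, "body")
    for file_el in root.iter(XLIFF + "file"):
        langs = (file_el.get("source-language"), file_el.get("target-language"))
        if None in langs:
            continue  # cannot label the variants without both languages
        for unit in file_el.iter(XLIFF + "trans-unit"):
            elems = [unit.find(XLIFF + tag) for tag in ("source", "target")]
            if any(el is None for el in elems):
                continue  # untranslated unit: nothing to recycle
            segs = ["".join(el.itertext()) for el in elems]
            if not segs[1].strip():
                continue  # empty target: nothing to recycle either
            tu = ET.SubElement(body, "tu")
            for lang, seg in zip(langs, segs):
                tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
                ET.SubElement(tuv, "seg").text = seg
    return ET.tostring(tmx, encoding="unicode")
```

ElementTree serializes the `XML_LANG` attribute as `xml:lang`, which is how TMX labels each variant's language.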

Sorry, I think as I type and maybe that was not the kind of answer
you were looking for.



I mean version control as in CVS, SVN etc. We currently keep po files in a
CVS repository. When someone with write access commits something to
CVS, the system automatically sends a notification with the full diff to
the project's CVS mailing lists (the po format is very easy to read). So
it is easy for all members to get an overview of what's happening; it is
also easy to write comments or questions about commits, reply to comments
and proofread immediately after a commit.
This is how all the free software projects I participate in
function.

ain



Re: [l10n-dev] running OLT

2007-01-18 Thread Jean-Christophe Helary


You should send that to OLT's list.

https://open-language-tools.dev.java.net/servlets/ProjectMailingListList


JC Helary


On 19 Jan. 07, at 14:35, Ain Vagula wrote:


openSUSE 10.2, trying to start OLT:
- with java 1.4.2:
[EMAIL PROTECTED]:~/Open_Language_Tools/XLIFF_Translation_Editor/1.2.7 ./translation.sh

Using java: /usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre/bin/java
/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre/bin/java: error while loading shared libraries: /home/ain/Open_Language_Tools/XLIFF_Translation_Editor/1.2.7/spellchecker/lib/libgcc_s.so.1: ELF file data encoding not little-endian
Installation direcotry:
/home/ain/Open_Language_Tools/XLIFF_Translation_Editor/1.2.7
Classpath: TransEditor.jar:i18n:classes/dom4j-161.jar:classes/fuzzytm.jar:classes/swing-layout-1.0.1.jar:classes/xerces2.jar:classes/XliffBackConverter.jar:classes/xmlParserAPIs.jar

/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre/bin/java: error while loading shared libraries: /home/ain/Open_Language_Tools/XLIFF_Translation_Editor/1.2.7/spellchecker/lib/libgcc_s.so.1: ELF file data encoding not little-endian

- with java 1.5.0:
[EMAIL PROTECTED]:~/Open_Language_Tools/XLIFF_Translation_Editor/1.2.7 ./translation.sh

Using java: /usr/lib/jvm/java-1.5.0-sun-1.5.0_update8/jre/bin/java
java version 1.5.0_08
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_08-b03)
Java HotSpot(TM) Client VM (build 1.5.0_08-b03, mixed mode, sharing)
Installation direcotry:
/home/ain/Open_Language_Tools/XLIFF_Translation_Editor/1.2.7
Classpath: TransEditor.jar:i18n:classes/dom4j-161.jar:classes/fuzzytm.jar:classes/swing-layout-1.0.1.jar:classes/xerces2.jar:classes/XliffBackConverter.jar:classes/xmlParserAPIs.jar

19.01.2007 7:36:35 org.jvnet.olt.editor.translation.TransEditor run
SEVERE: Exception:java.lang.Error: can't load com.birosoft.liquid.LiquidLookAndFeel


ain



Re: [l10n-dev] Ain's howto in the wiki

2007-01-18 Thread Jean-Christophe Helary


Clytie, Ain (?)

I have 2 questions: why is the translate toolkit necessary ? And why  
aren't there any references to OLT ? The French community  
receives .xlz files and would be unable to follow the howto since it  
is exclusively based on po files (which is not a bad thing in the  
absolute, but that is a different issue).


It seems to me the document addresses team leaders' needs and not  
really translators, who will certainly not have to deal with most of  
what is described in the document.


Also, I fear the described process is very unlikely to attract  
people who are translators and who could contribute quality work  
because it is what they do for a living. It is way too geeky.


I understand that a lot of people involved in FOSS l10n are  
familiar with most of the concepts and tools presented there. But now  
that OOo has grown well beyond the geek zone, it seems to me most of the  
document's contents are not (or should not be) relevant to what a  
potential translation contributor should really be familiar with.


I suggest experienced team leaders edit the file to make a clear  
distinction between the leader's job and the translator's job.


Jean-Christophe

On 19 Jan. 07, at 15:08, Clytie Siddall wrote:


Hi everyone :)

After last night's L10N IRC meeting, where we did mention the need  
for l10n howtos, especially for new translators, I have published  
Ain's howto in the wiki:


http://wiki.services.openoffice.org/wiki/NLC:New_Translators_Start_here


I've added bits and pieces, but it's basically all Ain's work. :)

I've linked it from the main NLP page. We don't seem to have a main  
L10N page in the wiki.


I couldn't get footnotes working, so there are notes in  
parentheses. :(


I hope this doc. is useful. It embodies a whole list of things I  
had to find out by accident, or by persistent questioning, during  
this first release (2.1).


I couldn't find the link to Pavel's blog instructions on submitting  
files to him for build. Does someone have it, please?


Please feel free to amend this doc or add to it. :)



Re: [l10n-dev] followup to issue 73501

2007-01-17 Thread Jean-Christophe Helary



On 18 Jan. 07, at 03:40, Marcin Miłkowski wrote:


Andras Timar wrote:


This is what happened with this OOo 2.2 update in case of some Sun
languages (e.g. i73150). Translators who use OLT should share their
experiences in this list. The main question is how good the OLT is at
ignoring tag changes. Does it offer good matches from the mini TM in
case of tag changes?


Yes, it does, but Transolution (a Python XLIFF translation memory  
tool) was even nicer as it allowed many ways to visualize tags on  
the screen (so that the view is not cluttered).


You could try MemoQ (freeware and Hungarian, made by engineers from  
Morfologic, which is a great recommendation to me), and Across  
(from Nero). They are closed source but still free to use (MemoQ is  
even Linux-compatible, I guess).


And there's OmegaT - tag handling is probably better now than before.


It depends on what you mean by before.

The main improvement is that OmegaT TMX files now respect tags and  
encapsulate them in the proper XML code. We have tested import of  
OmegaT's TMX into SDLX or Trados etc and the results were quite  
positive.


Apart from that, OmegaT does not apply penalties for differing tags  
between match and source, so in a way it is nicer to the user. Plus it  
is slightly more intuitive and faster than OLT (which I also use sometimes).


But since OmegaT is not an XLIFF editor, it requires working on the  
source file directly, and thus requires the source file format to be  
supported (PO is one of the supported formats).


I like OLT as I was able to do some translation jobs that would  
otherwise require the notorious TagEditor from Trados, yet it is  
developing very slowly as the main developer from Sun, Tim Foster, is  
not working on it anymore. I could try to implement new features  
but... It's not high on my to-do list.


Jean-Christophe Helary