Re: [l10n-dev] OpenCTI use?
Hello Reiko,
> Could you or anyone let me know how you want to use TBX format of glossary? I believe OmegaT supports csv format in glossary feature.
That is correct. Test versions of OmegaT support normal TSV files (.utf8 or .txt), CSV files (.csv) and TBX files (.tbx). Supported fields are source term/target term/comment, but I am not sure how they are labelled in TBX.
> Could anyone let me know why TBX is desirable, rather than csv in your translation?
My guess is that TBX allows the glossary maintainer to have finer control over the glossary contents by using all the allowed categories and labels, and lets the tools do their parsing according to their ability.
Jean-Christophe Helary
fun: http://mac4translators.blogspot.com
work: http://www.doublet.jp (ja/en fr)
tweets: http://twitter.com/brandelune
-
To unsubscribe, e-mail: dev-unsubscr...@l10n.openoffice.org
For additional commands, e-mail: dev-h...@l10n.openoffice.org
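The three glossary fields mentioned above (source term, target term, comment) can be illustrated with a small reader for a tab-separated file. This is only a sketch: the function name `read_glossary` and the sample terms are made up for illustration, and it does not claim to reproduce how OmegaT itself loads glossaries.

```python
import csv
import io


def read_glossary(text):
    """Parse tab-separated glossary text into (source, target, comment) tuples.

    Hypothetical helper: missing target or comment columns become "".
    """
    entries = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        source = row[0].strip()
        target = row[1].strip() if len(row) > 1 else ""
        comment = row[2].strip() if len(row) > 2 else ""
        entries.append((source, target, comment))
    return entries


# Made-up sample data: one entry with a comment, one without.
sample = "file\tfichier\tnoun\nsave\tenregistrer\n"
glossary = read_glossary(sample)
```

A CSV glossary would only differ in the delimiter; a TBX file is XML and would need a proper XML parser instead.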
Re: [l10n-dev] OpenCTI use?
On 19 July 2010, at 06:23, Elsa Blume wrote:
> Hi Sophie, Getting back to you on this topic! And gathering info about the import/export option in TBX format to help you update the TD. Could you please tell me which are the tools/editors the Community wants to use to work with TBX format?
OmegaT and Virtaal both support TBX.
Jean-Christophe Helary
fun: http://mac4translators.blogspot.com
work: http://www.doublet.jp (ja/en fr)
tweets: http://twitter.com/brandelune
Re: [l10n-dev] OpenCTI use?
Rafaella,
On 17 June 2010, at 21:49, Rafaella Braconi wrote:
> Hi Sophie,
>> Is it planned to migrate OpenOffice.org glossaries from SunGloss to OpenCTI?
> No. There are no plans to migrate OOo glossaries from SunGloss to OpenCTI.
On January 27, Reiko sent a mail in Japanese to a number of lists with the following title:
Date: 27 January 2010 10:48:45 UTC+09:00
To: undisclosed-recipients: ;
Subject: [ja-translate] SunGloss migration
I translated that mail for the French lists. It was indicated in the mail that SunGloss was going to be read-only from January 31st and that the glossaries would be available from the Terminology tab in OpenCTI.
What is the status of the migration Reiko mentioned?
Jean-Christophe Helary
fun: http://mac4translators.blogspot.com
work: http://www.doublet.jp (ja/en fr)
tweets: http://twitter.com/brandelune
[l10n-dev] Re: [ja-translate] No coordinator for translation to Japanese and SUN or Oracle's position
Dear Maho,
What is the current state of the team? Can you give specifics? I seem to remember that there was more than one translator.
Jean-Christophe
On 19 May 2010, at 16:34, Maho NAKATA wrote:
> Dear Saito Reiko-san and Rafaella,
> I would like to ask you about 3.3 translation to Japanese.
> Background. As I told you, Kubota-san has resigned and there is no deputy or successor. I don't know how we will translate 3.3 strings for Japanese. I - as the JA project lead - cannot accept, without leadership or without someone who takes the responsibility. Otherwise, our project will crash again. I'll look for the next coordinator, but I'm not sure we will find one for the 3.3 translation.
> Question. So here is the question. What SUN or Oracle will/can offer for 3.3 JA translation?
> Thanks
> -- Nakata Maho
Jean-Christophe Helary
fun: http://mac4translators.blogspot.com
work: http://www.doublet.jp (ja/en fr)
tweets: http://twitter.com/brandelune
Re: [l10n-dev] Oracle Open Office supports 17 languages :)
On 21 April 2010, at 16:38, Kazunari Hirano wrote:
> On this page, try to change Store Country from United States to other country. You can not find Japan! Unbelievable!
And in the other countries, those who do _not_ have a shop but are on the list are: Austria, Belgium, Denmark, Finland, France, Germany, Netherlands, Norway, South Africa, Spain, Sweden, Switzerland.
Plus if you go to the shop sites the site is not fully localized...
> Ivo san, can you urge Oracle to improve the Oracle Open Office site and help a Japanese who want to buy Japanese Oracle Open Office in yen :) please.
And urge it to have shops in main European countries (like France and Germany) that are at the forefront of OOo adoption in the world...
Jean-Christophe Helary
fun: http://mac4translators.blogspot.com
work: http://www.doublet.jp (ja/en fr)
tweets: http://twitter.com/brandelune
Re: [l10n-dev] Open Language Tools XLIFF Editor version 1.3.1 has been released
On 16 March 2010, at 03:05, André Schnabel wrote:
> For all who already tested the release candidate: you do not need to download the editor again. The release version is exactly the same as the RC.
Hasn't the manual been slightly updated?
Jean-Christophe Helary
fun: http://mac4translators.blogspot.com
work: http://www.doublet.jp (ja/en fr)
tweets: http://twitter.com/brandelune
Re: [l10n-dev] Major L10n achievements
On 23 October 2009, at 01:41, André Schnabel wrote:
> Rafaella Braconi wrote:
>> If during the time November 2008 and October 2009, you, your team has reached any major milestone or anything that you feel it's important to mention, please send a short email indicating language or teams involved and a short description of what you have achieved.
> There have been no big bang achievements for the Germanophone team within the last year - at least if you just look at the results.
I'd like to mention a little big bang related to the German team. André's efforts on the Open Language Tools XLIFF editor now allow the translation community at large (beyond OpenOffice.org or any FLOSS localization team) to have a free software based XLIFF editor. Before that, translators had to work with closed source XLIFF editors or use relatively non-trivial conversion paths (Okapi/Rainbow+OmegaT). Now, if their client wants them to work on XLIFF files, they can do so quite easily in one step: open it in OLT and work.
That is a very good example of how FLOSS communities can provide professionals with amazing tools, and I hope it will allow professional translators to participate more in FLOSS localization projects. I think that is an important challenge for FLOSS l10n communities.
Thank you André for your recent work on OLT, and thank you to Tim Foster and all the others at Sun who first brought us this tool (STE, for the old timers here).
Jean-Christophe Helary (JA/EN FR)
http://mac4translators.blogspot.com/
http://twitter.com/brandelune
http://www.doublet.jp
Re: [l10n-dev] No source to translate ?
On Aug 24, 2009, at 4:52 AM, Sophie wrote:
>> And yes, I will also have to make my terminology project appear in Pootle. Been pushing that forward for some time now
> I don't know what your operating system is, but there are several tools that allow you to translate, like OmegaT, OTE, and I use PoEdit and its TM. I/we find it more convenient than Pootle, even if there are still some enhancements to bring to these tools.
OmegaT, OTE and PoEdit are available on OSX/Windows/Linux (I've suggested a few lines to André for the OTE manual so that OSX users easily figure out how to make it start).
Jean-Christophe Helary
Re: [l10n-dev] How's Developer's Guide Pootle going?
On Aug 12, 2009, at 10:51 AM, Aijin Kim wrote:
>> Our members are interested in the format of the files for shortcut. Would they be PO, XLIFF or others?
> The format will be XLIFF. I believe you'll be familiar with the format soon. :)
Is the XLIFF based on the original XML contents or is it based on SDF? If it were based on the original XML, the tags would be very easy to work with in XLIFF supporting CAT tools. The SDF data converts everything to text strings and breaks all the tag support found in modern applications.
André's latest work on OLT is making XLIFF editing very easy even though the software does not seem to be super stable (I had 2 freezes last night working with a real world XLIFF file). It is also possible to work with OmegaT and Rainbow (both GPL/Java); the process is much more robust.
Jean-Christophe Helary
http://mac4translators.blogspot.com/
http://twitter.com/brandelune
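The difference between real inline tags and tags flattened into text strings can be seen in a tiny XLIFF 1.2 sample. This is an illustrative sketch only: the file name, unit ids and sentences are invented, and the second unit merely imitates what escaped markup looks like, not the actual output of any SDF conversion.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Invented sample: unit 1 keeps markup as an inline <g> element,
# unit 2 carries the same markup as escaped plain text.
XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="help.xml" source-language="en" datatype="xml">
    <body>
      <trans-unit id="1">
        <source>Press <g id="1">Enter</g> to continue.</source>
      </trans-unit>
      <trans-unit id="2">
        <source>Press &lt;b&gt;Enter&lt;/b&gt; to continue.</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def inline_tag_counts(xliff_text):
    """Map each trans-unit id to the number of inline elements in its source."""
    root = ET.fromstring(xliff_text)
    counts = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        counts[unit.get("id")] = len(list(source))
    return counts


counts = inline_tag_counts(XLIFF)
```

In the first unit a CAT tool can see and protect the tag as a unit; in the second it only sees characters, so the translator has to retype or copy the markup by hand.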
Re: [l10n-dev] Strange tags in .po file
On Aug 10, 2009, at 8:26 PM, Eike Rathke wrote:
> I somewhat doubt that CRs would be used in dialogs. Usually line feed characters (0x0A, #10;) are used instead. I'm not familiar with that extension though, should be clarified with the code owner 'mav'.
But the fact is that #13; is a CR. It may originate from a string pasted from a Mac file into the code, or from something totally different...
Jean-Christophe Helary
http://mac4translators.blogspot.com/
http://twitter.com/brandelune
Re: [l10n-dev] Strange tags in .po file
On Aug 8, 2009, at 11:37 PM, Sophie wrote:
> Hi all, In swext/mediawiki/src/registry/data/org/openoffice/Office/Custom.po, there is strange tags like:
> #13;#13 A wiki article with the title '$ARG1' already exists.#13;#13;Do you want to replace the current article with your article?#13;#13;
> Is it normal tags?
They seem to be carriage returns. CR. The line ending character for Mac files.
Jean-Christophe Helary
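For reference, character 13 (0x0D) is the carriage return, which classic Mac OS used as its line ending, and character 10 (0x0A) is the line feed. The sketch below decodes references written either as `#13;` (as they appear in this thread) or as `&#13;`; the helper names are hypothetical and the accepted syntax is an assumption made for the example.

```python
import re

CR = "\r"  # carriage return, decimal 13, hex 0x0D
LF = "\n"  # line feed, decimal 10, hex 0x0A


def decode_reference(reference):
    """Turn a decimal character reference such as '#13;' or '&#13;' into a character."""
    match = re.fullmatch(r"&?#(\d+);", reference)
    if match is None:
        raise ValueError("not a decimal character reference: %r" % reference)
    return chr(int(match.group(1)))


def normalize_newlines(text):
    """Convert CRLF and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

Normalizing line endings before a string enters the source files is one way to keep stray CRs out of the PO files in the first place.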
Re: [l10n-dev] Proposal: create pootle-translation-method mailing list
On 13 March 2008, at 15:15, Pavel Janík wrote:
> Repeat after me: [...]
I am not sure this is the proper way to address fellow list members. Why can't you accept that your proposal was not worded well enough to gather enough support?
Jean-Christophe Helary
http://mac4translators.blogspot.com/
Re: [l10n-dev] POOTLE: Content update on March 13th
On 12 March 2008, at 21:26, André Schnabel wrote:
> The workflow is quite the same as we used to have with sdf-files and OTE. So we have no real benefits, changed the tools (what means we need to learn new tools) and lost a lot of time experimenting.
Same here. I can see that some teams feel their workflow is improved with pootle/PO files and translate-toolkit magik but it is not the case for a number of others.
Jean-Christophe Helary
http://mac4translators.blogspot.com/
Re: [l10n-dev] Proposal: create pootle-translation-method mailing list
On 9 March 2008, at 16:15, Rail Aliev wrote:
> Thus I'd like to propose the setup of special [EMAIL PROTECTED] mailing list.
The purpose of which would be?
dev@l10n.openoffice.org
[EMAIL PROTECTED]
[EMAIL PROTECTED]
What would be the distinctive use of each one of those lists?
Jean-Christophe Helary
http://mac4translators.blogspot.com/
Re: [l10n-dev] Proposal: create pootle-translation-method mailing list
> dev@l10n.openoffice.org
> General purpose L10N list. Here we discuss everything about L10N.
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> These lists can be merged as one list where we can discuss translation specific things.
Could you make a list of the recent threads and classify them in either category so that the purpose of each list is clearer? Because to me, everything about localization includes translation.
Jean-Christophe Helary
http://mac4translators.blogspot.com/
Re: [l10n-dev] Proposal: create pootle-translation-method mailing list
On 7 March 2008, at 17:42, Pavel Janík wrote:
>> I think that would only create confusion since some issues are inter-related.
> Do you have an example of such issue?
Discussing Pootle in conjunction with OmegaT.
If you mean a _technical_ list where pootle-dev questions exclusively are discussed then why not, but that is not clear from your proposal. Plus I don't think that is OT on this _DEV_ list.
Personally, I think what we rather need is a list for translators/l10n managers where they can discuss practical issues and a dev-only list where the technical issues (possibly Pootle related) are discussed.
Jean-Christophe Helary
http://mac4translators.blogspot.com/
Re: [l10n-dev] Proposal: create pootle-translation-method mailing list
On 7 March 2008, at 18:28, Pavel Janík wrote:
> Or you mean how to use OmegaT in connection with Sun's Pootle instance to translate OOo?
Of course that is what I mean. We discuss OOo localization here, don't we?
> But yes, we should at least start to think about splitting general l10n and translation related stuff...
But the thing is that we don't have any translation discussion here. We have discussions about processes. Either processes on SUN side or processes on team sides. I don't think there is a clear cut between both.
Jean-Christophe Helary
http://mac4translators.blogspot.com/
[l10n-dev] For teams that use OmegaT
The Italian team l10n leader asked me a number of questions offlist regarding the workflow. I replied with the French and Japanese groups in Cc since those are the groups I participate in. The mail is here:
http://ja.openoffice.org/servlets/ReadMsg?list=translatemsgNo=3419
Also, the OmegaT Project has just released a test version of OmegaT 1.8 that comes with spellchecking and a number of other important new features. It is called "test" because the manual is not up to date and because there are a few areas that need some ironing out, but I've been using it since its first branching in CVS and I've had no data loss problem at all. I wrote something about the whole thing here:
http://mac4translators.blogspot.com/2008/03/omegat-173-18-19.html
1.8 is a major improvement because it at last comes with spellchecking (hunspell, with the dictionaries that OOo uses). I encourage all the teams that use OmegaT to work with the test version.
Jean-Christophe Helary
http://mac4translators.blogspot.com/
Re: [l10n-dev] Re: problem strings in OmegaT
Aijin,
> Thanks for your comments. Yes, I agree that it'd be the best way to switch the style from msgid_comment to msgctxt. I also confirmed that msgctxt works ok in OmegaT.
I did not have the time to check. What do you mean by msgctxt works ok in OmegaT? Is it displayed separately as a comment or is it simply ignored?
Jean-Christophe Helary
http://mac4translators.blogspot.com/
Re: [l10n-dev] Re: problem strings in OmegaT
Aijin,
Ok, very good. It would have been a very good surprise to see msgctxt appear somewhere though ;)
Also, OmegaT 1.8 with a spellchecker (hunspell, works with OOo dictionaries) was released as a test version yesterday:
http://mac4translators.blogspot.com/2008/03/omegat-173-18-19.html
JC
On 3 March 2008, at 17:56, Aijin Kim wrote:
> Hi JC, In OmegaT 1.7.3, msgctxt seems to be simply ignored. There is no display for msgctxt. What I meant was that OmegaT works ok with 'msgid' and 'msgstr' fields regardless of msgctxt field.
> Regards, Aijin
> Jean-Christophe Helary wrote:
>> Aijin,
>>> Thanks for your comments. Yes, I agree that it'd be the best way to switch the style from msgid_comment to msgctxt. I also confirmed that msgctxt works ok in OmegaT.
>> I did not have the time to check. What do you mean by msgctxt works ok in OmegaT? Is it displayed separately as a comment or is it simply ignored?
Jean-Christophe Helary
K.K. DOUBLET
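To make the msgctxt point concrete, here is a minimal sketch of a PO entry whose context lives in its own field, and of what a tool that only looks at msgid would show. The entry contents (`dialog.title`, `Save`) and the function names are invented, and the parser only handles single-line fields; it is not how OmegaT or the translate toolkit actually read PO files.

```python
import re

# Invented PO entry: the context is stored in msgctxt, not inside msgid.
PO_ENTRY = '''msgctxt "dialog.title"
msgid "Save"
msgstr ""
'''

FIELD = re.compile(r'^(msgctxt|msgid|msgstr)\s+"(.*)"$')


def parse_entry(text):
    """Read the single-line msgctxt, msgid and msgstr fields of one PO entry."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def source_only_view(fields):
    """What a tool that reads only msgid would present to the translator."""
    return fields.get("msgid", "")


entry = parse_entry(PO_ENTRY)
```

With this layout the source string stays clean whether or not the tool understands msgctxt, which is the advantage discussed in the thread.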
Re: [l10n-dev] Re: problem strings in OmegaT
On 29 February 2008, at 17:02, Aijin Kim wrote:
> Hi JC, Thanks a lot for your kind explanation. So you mean that you manually delete the msgid_comment part from each target string? If so, it should be better that source string doesn't include the msgid_comment line in source string to avoid additional work, right?
That is correct.
> Now, I'm thinking if we need to use msgctxt style. Ain has confirmed that poedit supports it. I'm not sure about OmegaT. If OmegaT also supports msgctxt, it'd be good to change the format of po files from next update.
OmegaT will ignore its contents. It only sees msgid.
JC
> Jean-Christophe Helary wrote:
>> On 29 February 2008, at 15:20, Aijin Kim wrote:
>>> Hi JC, I guess what Ain mentioned was that 'msgctxt' option during oo2po saves the comment line in another field rather than adding to msgid field. Then there won't be any change with msgid string. So for current po files, do you simply ignore the comment line in your translation?
>> As far as OmegaT is concerned, yes. But OmegaT is even weirder than that :)
>> Basically, OmegaT has been conceived for translating documents, monolingual documents. Not for working with intermediate localization formats. Basically it works that way:
>> • It first parses the file, keeps the structure (skeleton) part in memory and puts all the translatable strings to the display.
>> • The translator goes through segments one by one and types the translation by also referring to the available translation memories and glossaries.
>> • When the translator wants to see the result, the translated files are built by using the skeleton in memory and by filling in with the translated strings. Anything that has not been translated is left with the source values.
>> The problem with PO or XLIFF etc. is that the skeleton of the file already has placeholders for source and target. Which means that OmegaT should read what it sees in source, consider what is already in target and put the translation in target if necessary.
Re: [l10n-dev] Re: problem strings in OmegaT
On 29 February 2008, at 12:50, Aijin Kim wrote:
> Hi Dick, I don't have much experience with OmegaT. I think it's the best way to ask community's help on this. :)
> Hi all, Some strings of po files have a line which was added to make each msgid string be unique using --duplicates=msgid_comment option during executing oo2po.
> http://translate.sourceforge.net/wiki/toolkit/duplicates_duplicatestyle
> How can OmegaT or poedit handle the added line? Is there any way to hide the line from source string or can we simply ignore the line when translating? I can see Pootle handles the line and hide it from source string for online translation. But not sure in terms of offline editors.
Since the string is a msgid it is handled as a source string and is displayed as translatable.
Jean-Christophe Helary
http://mac4translators.blogspot.com/
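An offline tool that wanted to hide the added line would have to split it off the msgid before display. The sketch below assumes the added line is a first line starting with `_: ` and ending with a newline; that layout is an assumption made for this example and should be checked against the actual PO files. The function name and sample strings are hypothetical.

```python
def split_msgid_comment(msgid):
    """Split an assumed leading '_: comment' line from the translatable text.

    Returns (comment, text); comment is None when no such line is present.
    """
    if msgid.startswith("_: "):
        comment, newline, text = msgid.partition("\n")
        if newline:
            return comment[3:], text
    return None, msgid


# Made-up msgid values, with and without the disambiguating line.
with_comment = split_msgid_comment("_: dialog.src#RID_OK\nOK")
without_comment = split_msgid_comment("OK")
```

A tool that does not perform such a split simply shows the whole msgid, added line included, as the source segment.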
Re: [l10n-dev] Re: problem strings in OmegaT
>>> Hi all, Some strings of po files have a line which was added to make each msgid string be unique using --duplicates=msgid_comment option during executing oo2po.
>>> http://translate.sourceforge.net/wiki/toolkit/duplicates_duplicatestyle
>>> How can OmegaT or poedit handle the added line?
>> Since the string is a msgid it is handled as a source string and is displayed as translatable.
> ... thats one reason we switched to msgctxt comment style, where identifier is stored in separate field. See also
> http://vagula.blogspot.com/2008/02/attention-to-community-translators.html
Ain,
Sorry I don't understand your comment. What Aijin asked is how does OmegaT (and POedit, which I don't use) handle tweaked msgid. My reply was that it handles them as normal msgid. I did not see a reference to msgctxt.
Jean-Christophe Helary
http://mac4translators.blogspot.com/
Re: [l10n-dev] Re: problem strings in OmegaT
On 29 February 2008, at 15:20, Aijin Kim wrote:
> Hi JC, I guess what Ain mentioned was that 'msgctxt' option during oo2po saves the comment line in another field rather than adding to msgid field. Then there won't be any change with msgid string. So for current po files, do you simply ignore the comment line in your translation?
As far as OmegaT is concerned, yes. But OmegaT is even weirder than that :)
Basically, OmegaT has been conceived for translating documents, monolingual documents. Not for working with intermediate localization formats. Basically it works that way:
• It first parses the file, keeps the structure (skeleton) part in memory and puts all the translatable strings to the display.
• The translator goes through segments one by one and types the translation by also referring to the available translation memories and glossaries.
• When the translator wants to see the result, the translated files are built by using the skeleton in memory and by filling in with the translated strings. Anything that has not been translated is left with the source values.
The problem with PO or XLIFF etc. is that the skeleton of the file already has placeholders for source and target. Which means that OmegaT should read what it sees in source, consider what is already in target and put the translation in target if necessary. PO includes in itself sort of a TM function by adding fuzzy strings and by keeping the whole legacy translation in itself. In OmegaT this TM part is handled totally separately because monolingual documents are not supposed to come with such embedded data, at least not in the current CAT world.
In the case of PO files, it needs to have empty msgstr so that it can pretend to work as for a normal monolingual document by considering exclusively the contents of msgid, and even if the msgstr is not empty it just ignores its contents (future developments aim at putting those contents automatically in the TM). The process is then: parse what is in msgid, display for translation, and _rewrite_ the whole file with msgid=msgstr for places that have not been translated yet...
Which is the reason why OmegaT is perfect for HTML, ODF and whatever is monolingual and works on a _document_ basis (cf the NetBeans l10n process), but not so good for intermediate or pre-processed formats (like the OOo and other PO based l10n processes).
Eventually, the dev team will work on the issue of intermediate formats. But right now OmegaT will work best with proper msgid and empty msgstr, with all the legacy contents put in TMX or glossaries. That is what OmegaT is good at handling :)
JC
> Thanks, Aijin
> Jean-Christophe Helary wrote:
>>>>> Hi all, Some strings of po files have a line which was added to make each msgid string be unique using --duplicates=msgid_comment option during executing oo2po.
>>>>> http://translate.sourceforge.net/wiki/toolkit/duplicates_duplicatestyle
>>>>> How can OmegaT or poedit handle the added line?
>>>> Since the string is a msgid it is handled as a source string and is displayed as translatable.
>>> ... thats one reason we switched to msgctxt comment style, where identifier is stored in separate field. See also
>>> http://vagula.blogspot.com/2008/02/attention-to-community-translators.html
>> Ain, Sorry I don't understand your comment. What Aijin asked is how does OmegaT (and POedit, which I don't use) handle tweaked msgid. My reply was that it handles them as normal msgid. I did not see a reference to msgctxt.
>> Jean-Christophe Helary
>> http://mac4translators.blogspot.com/
Jean-Christophe Helary
K.K. DOUBLET
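The three-step process described in this message (parse and keep a skeleton, translate segment by segment, rebuild with untranslated segments left as source) can be sketched for a toy tagged document. This is an illustration under simplifying assumptions, not OmegaT's code: the functions `parse` and `build` are hypothetical, and the "format" is just text between angle-bracket tags.

```python
import re


def parse(document):
    """Split a tagged document into a skeleton and a list of translatable segments.

    Tags and whitespace stay in the skeleton verbatim; each text run is
    replaced by an integer slot pointing into the segment list.
    """
    skeleton, segments = [], []
    for part in re.split(r"(<[^>]+>)", document):
        if part.startswith("<") or not part.strip():
            skeleton.append(part)
        else:
            skeleton.append(len(segments))
            segments.append(part)
    return skeleton, segments


def build(skeleton, segments, translations):
    """Rebuild the document; untranslated segments keep their source text."""
    out = []
    for item in skeleton:
        if isinstance(item, int):
            source = segments[item]
            out.append(translations.get(source, source))
        else:
            out.append(item)
    return "".join(out)


skeleton, segments = parse("<p>Hello</p><p>World</p>")
result = build(skeleton, segments, {"Hello": "Bonjour"})
```

Applied to a PO file, the same fallback is what yields msgstr equal to msgid for entries that have not been translated yet.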
Re: [l10n-dev] Translatable contents extraction ?
Thank you very much Friedel.
>> Is there a simple tool that can extract the translation data and later merge the translated data?
> pofilter and pomerge will help you do this. In fact, if you send your translations right back to Pootle, you can just upload the translated subsets when you upload (as long as you don't choose overwrite when you download). The default behaviour should be merge, which is what you want.
Reiko,
I think we have a solution now :) We can do an extraction on the PO files, translate the text in OmegaT with the TMX Rafaella provided us with and check that within OmegaT, then merge the translated files to the original package :)
JC
On 25 February 2008, at 16:41, F Wolff wrote:
> On Monday 2008-02-25, Jean-Christophe Helary wrote:
>> Is it possible to only have the PO parts that need translation/updating and not the whole set? All the already translated parts are irrelevant to the translation itself (except when used as translation memories).
>> Is there a simple tool that can extract the translation data and later merge the translated data?
> pofilter and pomerge will help you do this. In fact, if you send your translations right back to Pootle, you can just upload the translated subsets when you upload (as long as you don't choose overwrite when you download). The default behaviour should be merge, which is what you want.
> http://translate.sourceforge.net/wiki/toolkit/pofilter
> http://translate.sourceforge.net/wiki/toolkit/pomerge
> You can download a ZIP file of all the PO files in the project/directory where you want to do this. You are interested in pofilter --test=untranslated, but the page above will give more information on the command line use.
> Friedel
Jean-Christophe Helary
http://mac4translators.blogspot.com/
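The extract-then-merge round trip discussed here can be sketched on plain Python data. This is a conceptual illustration only, not the implementation of pofilter or pomerge: entries are represented as hypothetical dictionaries with `msgid` and `msgstr` keys, and merging is keyed on msgid alone.

```python
def extract_untranslated(entries):
    """Return copies of the entries whose msgstr is still empty."""
    return [dict(entry) for entry in entries if not entry["msgstr"]]


def merge_translations(entries, translated):
    """Fill in translations from the subset, leaving other entries untouched."""
    by_msgid = {e["msgid"]: e["msgstr"] for e in translated if e["msgstr"]}
    return [
        {"msgid": e["msgid"], "msgstr": by_msgid.get(e["msgid"], e["msgstr"])}
        for e in entries
    ]


# Made-up package: one translated entry, one untranslated.
package = [
    {"msgid": "Open", "msgstr": "Ouvrir"},
    {"msgid": "Close", "msgstr": ""},
]
subset = extract_untranslated(package)
subset[0]["msgstr"] = "Fermer"  # the translator's work on the extracted subset
merged = merge_translations(package, subset)
```

The translator only ever sees the subset, and the already translated entries pass through the merge unchanged.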
[l10n-dev] Re: [ja-translate] Re: [l10n-dev] How can we review with Pootle ?
Reiko,
> If the number of fuzzy is small, we can work on Pootle directly, can't we?
That is what I would suggest.
> If the update volume is high, such as in HC, we can use [TM] mark to be inserted to the leverage from TMX, right? Is there any way to do the opposite, that is, mark new translation?
It is possible to insert it manually. But what I propose is an automatic insertion when OmegaT recognizes a 100% match.
> If there's any mark put on the new translation, we can search that segment with that mark.
I understand. It would be indeed very convenient :) Especially since searches in OmegaT cover both source and target without distinction ...
The ideal is that we get only the segments to translate or update, not the whole package, which is a waste of resources and requires useless roundtrip manipulations...
I suggest we extract all the non translated segments before starting the translations. That would make all the manipulations above irrelevant.
JC
> Even if there's no such a way, your workaround will be a big help. Thank you again for your help!
> Regards, -Reiko
> Jean-Christophe Helary wrote:
>> On 18 February 2008, at 18:48, Jean-Christophe Helary wrote:
>>>> Let me confirm. I understand the new/fuzzy is identified on OmegaT, but once the translator did the translation and put the translated string to the untranslated segment, how the reviewer can recognize which one is the strings to review?
>>> Reiko, PO is not exactly the strong point of OmegaT :) I'll check tonight with a PO from Pootle and will get back to you later. Maybe on the ja list?
>> Reiko, I have just tried OmegaT with Localization.po from javainstaller2. The file is translated at 97% and contains only 2 fuzzies to check. The conclusion is that OmegaT is useless for files that mostly contain translated and fuzzy strings. Ideally, a source file should not contain such strings and all the reference should be stored in a TMX. The fuzzies should be left empty for normal translation.
[l10n-dev] Translatable contents extraction ?
Is it possible to have only the PO parts that need translation/updating and not the whole set ? All the already translated parts are irrelevant to the translation itself (except when used as translation memories). Is there a simple tool that can extract the data to translate and later merge the translated data back ? Jean-Christophe Helary http://mac4translators.blogspot.com/
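The extraction asked about above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool anyone on the list used: `parse_po` and `needs_work` are hypothetical helpers, the parser handles only plain `msgid`/`msgstr` pairs (no plurals, no `msgctxt`), and the sample strings are invented. GNU gettext's `msgattrib` offers comparable filtering for real files.

```python
import re

def parse_po(text):
    """Split PO text into entries: dicts with 'flags', 'msgid', 'msgstr'."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {"flags": set(), "msgid": "", "msgstr": ""}
        field = None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#,"):
                # Flag line, e.g. "#, fuzzy"
                entry["flags"].update(f.strip() for f in line[2:].split(","))
            elif line.startswith("#") or not line:
                continue  # other comments are ignored in this sketch
            elif line.startswith("msgid "):
                field = "msgid"
                entry[field] = line[len("msgid "):].strip()[1:-1]
            elif line.startswith("msgstr "):
                field = "msgstr"
                entry[field] = line[len("msgstr "):].strip()[1:-1]
            elif line.startswith('"') and field:
                entry[field] += line[1:-1]  # continuation line
        entries.append(entry)
    return entries

def needs_work(entry):
    """True for untranslated or fuzzy entries; the PO header is skipped."""
    if entry["msgid"] == "":
        return False
    return entry["msgstr"] == "" or "fuzzy" in entry["flags"]

# Invented sample: one translated, one fuzzy, one untranslated entry.
SAMPLE = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

msgid "Open"
msgstr "Ouvrir"

#, fuzzy
msgid "Save as"
msgstr "Enregistrer"

msgid "Gallery"
msgstr ""
'''

todo = [e["msgid"] for e in parse_po(SAMPLE) if needs_work(e)]
print(todo)
```

Only the entries collected in `todo` would be handed to the translator; the already translated ones would go to the translation memory instead.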
Re: [l10n-dev] How can we review with Pootle ?
On 18 févr. 08, at 18:48, Jean-Christophe Helary wrote: Let me confirm. I understand the new/fuzzy is identified on OmegaT, but once the translator did the translation and put the translated string to the untranslated segment, how the reviewer can recognize which one is the strings to review ? Reiko, PO is not exactly the strong point of OmegaT :) I'll check tonight with a PO from Pootle and will get back to you later. Maybe on the ja list ? Reiko, I have just tried OmegaT with Localization.po from javainstaller2. The file is translated at 97% and contains only 2 fuzzies to check. The conclusion is that OmegaT is useless for files that mostly contain translated and fuzzy strings. Ideally, a source file should not contain such strings and all the reference should be stored in a TMX. The fuzzies should be left empty for normal translation. If you work with a file that is mostly untranslated and where the reference parts are clearly separated from the source, it is trivial to set OmegaT to insert the TM reference with a prefix to distinguish it from the Translator's input. Just set OmegaT to automatically insert 100% matches with a [TM] prefix, or anything you want. The translator will still be in control of the process and will be able to do modifications to the input if necessary. When the reviewer checks the file, only the parts that are not marked with [TM] will have to be checked. I understand that this is not an ideal workflow though... Jean-Christophe Helary http://mac4translators.blogspot.com/ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
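The [TM] prefix convention described above lends itself to a simple review filter. The following Python sketch assumes the segments are already available as (source, target) pairs and that the translator removes the prefix whenever a TM match is edited; both are assumptions made for illustration, and `split_for_review` is a hypothetical helper, not an OmegaT feature.

```python
TM_PREFIX = "[TM]"  # the prefix OmegaT is set to insert on 100% matches

def split_for_review(segments, prefix=TM_PREFIX):
    """Separate (source, target) pairs into TM-inserted and translator-written."""
    from_tm, to_review = [], []
    for source, target in segments:
        if target.startswith(prefix):
            # Unmodified TM match: strip the marker, no review needed.
            from_tm.append((source, target[len(prefix):].lstrip()))
        else:
            to_review.append((source, target))
    return from_tm, to_review

# Invented sample segments.
segments = [
    ("Open", "[TM] Ouvrir"),
    ("Gallery", "Galerie"),
    ("Save as", "[TM] Enregistrer sous"),
]
from_tm, to_review = split_for_review(segments)
print(to_review)
```

The reviewer would then read only `to_review`, and the marker would have to be stripped from the other targets before delivery.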
Re: [l10n-dev] How can we review with Pootle ?
Suppose the translator downloads the files, translates them, and uploads them as suggestions to Pootle, 1. Is there any way to accept all suggested segments in a single step on Pootle? Reiko, 2. If we review the files off-line, how can we identify the new translations in OmegaT ? It is not trivial. The best way to work with OmegaT is to have untranslated files (without fuzzies, those are handled separately by the TM matching process), to translate them and to review them within OmegaT, or in a plain text or PO editor. Jean-Christophe Helary http://mac4translators.blogspot.com/
Re: [l10n-dev] How can we review with Pootle ?
Let me confirm. I understand that new/fuzzy strings are identified in OmegaT, but once the translator has done the translation and put the translated string in the untranslated segment, how can the reviewer recognize which strings to review ? Reiko, PO is not exactly the strong point of OmegaT :) I'll check tonight with a PO from Pootle and will get back to you later. Maybe on the ja list ? JC
Re: [l10n-dev] How can we review with Pootle ?
On 19 févr. 08, at 11:38, Reiko Saito wrote: Hi JC, I'll check tonight with a PO from Pootle and will get back to you later. Maybe on the ja list ? Thanks! That will be great. Sorry, I was busy last night... I'll do that later today. I am curious how the French community is reviewing the translation ? Sophie and Elsa will be able to reply, I am only a translator :) How are you identifying the segments for review ? I don't think you are reading all of the segments, but focus on the newly translated ones, right ? I understand Pootle shows Suggested translations, but if there are many segments, you are working off-line and uploading them to Pootle, I assume. What we have done so far is that I translated the UI in Pootle because there were few segments. Since I had forgotten (I am not used to the tool yet) that one could set the suggested flag instead of committing the translation, Sophie indeed had to check all the UI strings... We agreed to work with Suggested in the next batch. If you are reviewing them only on Pootle, are you accepting the new translations one by one ? It seems to take time. For the Help files, I think Sophie translated everything offline and I don't know how she managed the review (yet). We are still in the process of adapting ourselves to the new tool. As far as I see it, if one is used to OmegaT, shifting to Pootle is not that convenient. File assignment management can be done otherwise etc. The next big batch of untranslated files will be, I guess, fully handled offline, review included. But we still have to discuss that. Sophie, Elsa, any comment ? Jean-Christophe Helary http://mac4translators.blogspot.com/
Re: [l10n-dev] TMs for 3.0
On 13 févr. 08, at 14:16, Clytie Siddall wrote: In response to our requests, the latest version of the Translate Toolkit actually has _less_ escaping than the SDF file. It replaces the extra escaping when you convert to SDF. Sure, but the TMX files that Rafaella delivered have _no_ escaping problems whatsoever. It is the SDF-PO-TMX conversion that causes the incompatibilities that have been previously mentioned. The fact that the current SDF-PO-TMX conversion is less bad at delivering properly escaped (or not) files is not really relevant. Also, the fact that TMX sets have different escape rules depending on the converter version defeats the purpose of TMX... Do we need to reconvert megabytes of file sets every time there is a new version of TT ? Jean-Christophe Helary http://mac4translators.blogspot.com/
Re: [l10n-dev] TMs for 3.0
On 8 févr. 08, at 00:41, F Wolff wrote: Furthermore, I think it is important to note that these TMX files do not follow the same unescaping rules as the new conversions done by Translate Toolkit 1.1. Of course, TMX files corresponding to the new unescaping rules can be generated from the set of PO files with po2tmx from the Translate Toolkit: http://translate.sourceforge.net/wiki/toolkit/po2tmx Because they have been created directly from the SDF files and not from the PO files. The Translate Toolkit is vastly over-escaping strings. If you need a reference to see how proper escaping should be done, take a look at the PO files from the Debian distribution. It would be nice if TT could comply with already established rules of the industry and not re-invent the wheel at each new release. ps: do you also mean that TMX files created with TT 1.1 are not compatible with TMX files created with previous TT versions ? Why do you think you have the right to break our work because you suddenly decided to change your specs ? Jean-Christophe Helary http://mac4translators.blogspot.com/
Re: [l10n-dev] OpenOffice.org 3.0 - Translation Schedule
Aijin, Rafaella, On 5 févr. 08, at 12:58, Aijin Kim wrote: Sorry, you even don't have to merge the po files. I.e. the step to create tmx are: 1. download po files from Pootle 2. run po2tmx Isn't it possible to have the TMX available directly from SUN as they were for 2.4 ? Jean-Christophe Helary http://mac4translators.blogspot.com/ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
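To make the po2tmx step above more concrete, here is a rough Python sketch of what a TMX file built from source/target pairs looks like. It is not Translate Toolkit code: `build_tmx` is a hypothetical helper, the header values ("sketch", "po", "sentence") and the language codes are placeholders chosen for illustration, and the sample segment is invented.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the "xml:lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, src_lang="en-US", tgt_lang="fr"):
    """Return a minimal TMX 1.4 document for (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0",
        "segtype": "sentence", "o-tmf": "po", "adminlang": "en",
        "srclang": src_lang, "datatype": "po",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

# Invented sample containing markup characters.
pairs = [("Save <b>as</b>", "Enregistrer <b>sous</b>")]
document = build_tmx(pairs)
round_trip = [seg.text for seg in ET.fromstring(document).iter("seg")]
print(document)
```

The XML serializer takes care of protecting markup characters in the segments, so the text read back by an XML parser is identical to the text that went in.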
[l10n-dev] Who is in charge of Pootle's FR localization ?
I found a number of problems in the FR strings of Pootle. How is it possible to correct them ? Jean-Christophe Helary http://mac4translators.blogspot.com/ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [l10n-dev] Who is in charge of Pootle's FR localization ?
Aijin, Eventually I modified the original Pootle and was told that the modifications would be published in the next release. JC On 4 févr. 08, at 11:18, Aijin Kim wrote: Pootle project is default for localized UI of Pootle. The modifications in Sunvirtuallab Pootle will not be applied to officially released Pootle. Aijin Jean-Christophe Helary wrote: On 3 févr. 08, at 22:17, Aijin Kim wrote: Hi Jean-Christophe, Sunvirtuallab Pootle only hosts OpenOffice.org and OpenSolaris.org. Aijin, On my project list I have: Projects OpenOffice.org HC2, OpenOffice.org UI, OpenSolaris.org, Pootle, Terminology Jean-Christophe Helary http://mac4translators.blogspot.com/
[l10n-dev] Pootle issues
On 3 févr. 08, at 12:35, Aijin Kim wrote: Hi Pootle user, When you have trouble during online translation, please let me know your status so that I can check what the problem is. Aijin, When I have a number of untranslated segments in the PO file, is it possible to set Pootle to only display those ? I could not find how to mark them in a way that would instantly catch the eye. It is the same for approximations, although they seem to have a grey vertical bar in front of them. Also, I found that the numbers given on the top page (number of untranslated/approximate strings) did not match the real numbers I found when actually opening the files. Jean-Christophe Helary http://mac4translators.blogspot.com/
Re: [l10n-dev] Pootle issues
On 3 févr. 08, at 14:23, Aijin Kim wrote: If you select 'Show Editing Function' in a project page, e.g. http://sunvirtuallab.com:32300/fr/helpcontent2/ , you can see 'Quick Translation' link. It only shows untranslated and fuzzy strings one by one. You can click 'skip' button to go to next string. However, there is no way to list them in one page. Thank you Aijin for this hint (btw, the French translation of the Pootle interface is not of very good quality. Where can modifications be proposed ?) I think the Show editing function item should be present on all pages so that the user does not have to go back and forth between the current PO file page and the project top page. I am trying that now but it seems the whole project is displayed then, not a PO file per PO file filtering. Is that what you meant by there is no way to list them in one page ? First, please make sure that the numbers on the graph of a project are number of words, not strings. Ok, now that I think of it that must be the answer. But then there is a UI issue. When I look at: http://www.sunvirtuallab.com:32300/fr/openoffice_org/ I see Non-traduit (non translated) = 84, but it is not immediately clear whether 84 is a number of strings. By looking at that page, I see the first row = translated=1 so I supposed those were string numbers. But when I go to svtools/source/ I see that non translated is at 25 for misc.po, while the file header block of misc.po says 9 strings, so the assumption that the listed values were strings and not words was wrong from the beginning. It seems to me both the word count and the string count should always be shown wherever a count is displayed, to ensure that there is no confusion possible. Jean-Christophe Helary http://mac4translators.blogspot.com/
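The confusion between words and strings disappears when both figures are shown side by side. A minimal Python sketch follows; it assumes words are simply whitespace-separated tokens, which may differ from the way Pootle actually counts, and the sample strings are invented.

```python
def count_stats(msgids):
    """Return (number of strings, number of whitespace-separated words)."""
    strings = len(msgids)
    words = sum(len(m.split()) for m in msgids)
    return strings, words

# Invented untranslated strings.
untranslated = ["Open file", "Save as...", "Insert a new gallery theme"]
strings, words = count_stats(untranslated)
label = f"{strings} strings / {words} words"
print(label)
```

A label of this kind next to each count would remove the need to guess the unit.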
Re: [l10n-dev] search function in Pootle
On 30 janv. 08, at 22:54, Olivier Hallot wrote: Hi I want to locate a specific string (say Gallery) in the po files tree, but I only get the first occurrence and it seems that there is no other way to find the others. Any hint? Download the files and do the search offline. Jean-Christophe Helary http://mac4translators.blogspot.com/
Re: [l10n-dev] Formats, tools, and workflow
Friedel, Thank you for the very comprehensive reply. tags are escaped, and yes, if somebody does the work, going directly from the XML help files to translation formats, could provide some benefits. Could you point out where the files are ? meta-information by means of the x-comment information. oo2po from the translate toolkit will add those notes to the PO file (and oo2xliff will add it to note tags of XLIFF files). Have you considered the context XLIFF tag ? We also have a converter that goes directly from SDF to XLIFF. It shipped with the current version of the toolkit, although a packaging bug might hide it for some users. The packaging bug will be fixed in the next versions of the toolkit. When do you plan to release it ? can therefore be seen as being similar to compiling to binary format. We store our localisation formats in a version control system, and that is considered to be the stored translations. This way we also don't need to retranslate with a TM at the start of version update such as the method is with OmegaT (according to my understanding). Well, another way to look at OmegaT is to consider it as a CVS specialised in translated strings. There is no need to retranslate with a TM in the case of OOo since we only get the non translated strings as source. NetBeans does that differently. They release _all_ the strings so it looks more like what you describe. OLT with XLIFF files: About OLT not being able to open our XLIFF files: our XLIFF files are well formed as far as we know - please report any bugs to our mailing list or bugzilla. We have validated some of our XLIFF files according to the XLIFF DTD, so I would be surprised if they are truly malformed. OLT supports only XLIFF 1.0. From what I heard, OLT does not use an XML parsing library to do that but has it all hardcoded. Which means that support for more recent versions of XLIFF requires a lot of work. A way to work around that would be to provide SDF filtering for OLT directly.
Claims of mismatches between PO and TMX files: My understanding is that this error is reported by users of OmegaT. Not exclusively. People who manually edit the files have to add the escapes missing in the TMX as provided by SUN. Besides, it is not a claim, it is a fact that the data contents of the PO provided by coordinators who use oo2po and of the TMX provided by SUN are not equivalent. The claim is yours when you write below that such mismatching should be properly interpreted. It is also my understanding that OmegaT doesn't actually interpret PO files, but only contains functionality to identify / highlight the different parts of the PO file for translation. I salute the great work of the OmegaT community, but if the tool doesn't understand the format, the PO/TMX tools can't take the blame for it. To see the PO and TMX files working well, I suggest people try using a TMX file with pot2po (either the TMX file provided by Sun, or one created with po2tmx from the translate toolkit). Is there official documentation regarding the PO format ? The GNU pages do not refer to anything to interpret as far as escape sequences are concerned. The only formal reference to be found there is to C syntax for character strings, which means the escapes. But I could not find any part of the GNU gettext manual that says how PO parsing tools should interpret the format, except as textual data. Can you give me links to recommended implementation practices for PO tools ? I guess the point of Translation Memory _eXchange_ is the point that it should be exchangeable regardless of what it will be used for. If tools interpret the escaping differently, that does pose a problem and that will need to be addressed. TMX is XML and is parsed with XML parsers. However, if the issue is that OmegaT is translating PO files as text files without regard for the PO file format, I don't think we can lay the blame on the TMX specification or something else.
Are there PO parsers provided by GNU or GPLed that OmegaT could use to improve its parsing of PO files ? Jean-Christophe Helary http://mac4translators.blogspot.com/ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [l10n-dev] gsicheck for Mac Intel ready for download
On 29 déc. 07, at 14:58, Clytie Siddall wrote: Sorry, I seem to be messing this up somehow. I unpacked the directory, but I can't call the executable, whether I put the directory in Applications or /usr/bin . I just get -bash: gsicheck: command not found. :( My PATH is pretty comprehensive. Clytie, I just tried it on my Mac and did not seem to have the problem you describe. I use a /bin/ directory in my home folder, the directory is in my PATH and gsicheck was called correctly by bash. Jean-Christophe Helary http://mac4translators.blogspot.com/ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [l10n-dev] PO and TMX contents do not match, help !
Javier, I am glad we at last managed to agree on the most important: if I generate TMX from PO, I should use it with POs, and if I generate from XLIFF, I should use it for XLIFF... Yes, and if we generate TMX from SDF (like SUN's TMX) then it is supposed to work with SDF, which is the reason why I proposed a way to work with SDF directly. and then it works. It does indeed. If communities want to work with the TMX that SUN provides then they can use the workflow I proposed and they'll see wonders. I am afraid that at this point we do not have such a thing as correct/universal TMX files. Agreed, TMX files depend on the original contents, and so they should be matched with the format in which the original contents are expressed. ... and that there is no truth on this, just opinions and systems that work. 100% with you. Jean-Christophe Helary http://mac4translators.blogspot.com/
Re: [l10n-dev] Open Language Tool
Mechtilde, There are explanations for OLT localizers on OLT's page: https://open-language-tools.dev.java.net/editor//xliff-editor-l10n.html You can also join their developer's list to ask technical questions related to the localization. https://open-language-tools.dev.java.net/servlets/ProjectMailingListList On 27 déc. 07, at 23:49, Mechtilde wrote: Hello, after the second translation round I have had the idea to translate also the translation tool. Jean-Christophe Helary http://mac4translators.blogspot.com/ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [l10n-dev] PO and TMX contents do not match, help !
Thank you for the reality check Alessandro. Any other community willing to share experiences ? I would really like to know what are the commonly accepted best practices for the current PO based workflow ? I'd really like to know myself how people translate with the current workflow as I feel we're missing something. In the Italian community we're currently translating most of our files directly on Pootle which may be considered a good translation workflow management system but a very poor translation editor. So far we've tried different solutions: - we downloaded the PO files and tried to translate them with OmegaT but we had problems with the TMX matching and with the reconversion to SDF (gsicheck errors); That is correct. The PO and the TMX do not match so the translators must be extra careful when re-using contents from the TMX, basically that means adding manually all the extra \ that PO has added. - we extracted XLIFF files from Pootle and tried to translate them with the OLT Editor but the tool didn't even open them as it considered the XLIFF files not well formed; No comment here. - we converted the PO files using the OLT Filters, it worked, but then it proved so slow in handling the TM that we had to give up on that; Here, the idea would be to have the OLT filters directly handle the SDF format, but I fear that would not change much for the overall performance. Unless the TMX files were trimmed down a little bit maybe. Like having separate TMX files per module (which would shrink them to the ~k segments each I suppose, instead of the 50k+20k chunks that we have now). - we translated some of our content with poEdit but that editor is as poor as Pootle from this point of view (no TM and no glossary). That is correct. I have tried to install Kbabel on OSX yesterday and I see that it has limited TMX support, but had no time to check further. Plus, the TMX contents and the PO contents not matching we would have problems similar to work with OmegaT I suppose. 
So far I find the method for translating SDF files proposed by Jean-Christophe the best way to work on the translation but it seems not to be compatible with Pootle, which we are using as well. What we, as translators, really need is a method to translate effectively using TM and glossaries just like we do in the professional world. OmegaT would have it all: a glossary extracted from SunGloss can easily be converted for the tool and the OmegaT TM engine works very well... but then, obviously, we need a TM that matches the content to be translated. Which is why the solution I proposed based on SDF is the best in my opinion. Regarding Pootle, is it possible to upload the result after the translation is completed ? If yes, you could translate based on SDF, convert the result with oo2po and upload that to Pootle to ensure your data is properly managed there ? Jean-Christophe Helary http://mac4translators.blogspot.com/
Re: [l10n-dev] PO and TMX contents do not match, help !
On 26 déc. 07, at 17:08, Yury Tarasievich wrote: On Wed, 26 Dec 2007 09:56:57 +0200, Alessandro Cattelan [EMAIL PROTECTED] wrote: ... translators, really need is a method to translate effectively using TM and glossaries just like we do in the professional world. OmegaT would have it all: a glossary extracted from SunGloss can easily be converted for the tool and the OmegaT TM engine works very well... but then, obviously, we need a TM that matches the content to be translated. Maybe I'm missing something, but how can the Sun's glossary/TMX or whatever be helpful without meta-information? No amount of toolchain change is going to address this by itself. I think you are indeed missing something. As Ale wrote, such meta-information can be added to the glossaries (in OmegaT, use the third column) or to TMX files, or to XLIFF files. TMX files can use the note place holder. XLIFF files can use the context place holder. Besides, glossary or TMX information in OmegaT (or anywhere else) is suggestions for the translator at best and the context can be provided by other means. Other means include but are not limited to meta-information. Besides, it is necessary for the meta-information to be directly available to and processable by the translator to have any practical use. The focus on meta-information is valid as long as the data is automatically available to the processes. Currently it is not the case, or is it ? Since there are no tools that can automatically process the SDF meta-information in its current form, focusing on meta-information seems to me to be counterproductive. Another way to support the translator is to provide external context for the strings. That can be done by the translator's experience itself (knowing the data set, having experience in the field etc), or by providing the data in external viewers: OOo's help viewer, screenshots etc... Maybe *I'm* not making myself intelligible?
I'm talking about having things assigned to the strings like a term variant, type of use (menu/option/...), keep short etc. Currently such info often has to be deduced from string ID, or lucky probe in the UI, even from sources digging. Yes. That is correct. But in most cases the translator has enough common sense and external resources (the l10n community, experienced users, external context, etc.) to make up for the lack of meta-information or the lack of automatic access to it. I fully understand that you want to provide the least error-prone workflow possible by using such meta-information, but in most cases this meta-information will not be available to the translators in a practical way. Last but not least, such meta-information is mostly useful for identifying UI items, but for the whole rest of the translation process (terminology management, style management etc) it is simply useless. Jean-Christophe Helary http://mac4translators.blogspot.com/
Re: [l10n-dev] PO and TMX contents do not match, help !
On 26 déc. 07, at 17:45, Yury Tarasievich wrote: Could being the operative word here. See, I don't understand where do you expect this info to actually come *from*. Somebody has to type in those thousands of meta-descriptors into the carrier file, after all. Yury, your original question was: Maybe I'm missing something, but how can the Sun's glossary/TMX or whatever be helpful without meta-information? No amount of toolchain change is going to address this by itself. The answer is simple. In the case of SunGloss, and for an OmegaT centered process, you can leave the meta-information that SUN provides in its data as comments in the glossary file that OmegaT uses. When I write you can I mean it is trivial and can be done in a Calc sheet for example. In the case of TMX/XLIFF, it can be done by properly using the relevant tags in the respective files. And that can be done with a script in the language of your choice. But for that, there is a need to have the _will_ to have a direct filter for the SDF format first. It might as easily be done with the extended SDF/FDS/whatever as with XLIFF, but resources ought to be dedicated beforehand. And so, in the case of hypothetical format switch resources ought to be dedicated twice. That's why I strongly doubt the format switch at this juncture would facilitate the filling of the meta-info slots. As Javier put it, SDF is _not_ a localization format. That is what you seem not to understand in what I wrote. We need a localization format (PO, XLIFF, key=value, anything) that matches the localization data SUN provides us with (TMX). This has nothing to do with developing or not developing the SDF format. Jean-Christophe Helary http://mac4translators.blogspot.com/
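Keeping the meta-information as glossary comments, as described above, amounts to writing a third tab-separated column. The Python sketch below assumes the glossary data has already been exported as (source term, target term, comment) triples; `to_glossary_tsv` is a hypothetical helper, and the sample terms and comments are invented rather than taken from SunGloss.

```python
def to_glossary_tsv(entries):
    """Format (source, target, comment) triples as tab-separated lines."""
    lines = []
    for source, target, comment in entries:
        # Collapse tabs and newlines so each field stays within its column.
        fields = [" ".join(f.split()) for f in (source, target, comment)]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"

# Invented sample entries; the third field carries the meta-information.
entries = [
    ("Gallery", "Galerie", "UI: menu item"),
    ("Save as", "Enregistrer sous", "UI: dialog title,\tkeep short"),
]
glossary = to_glossary_tsv(entries)
print(glossary)
```

The result can be saved as a UTF-8 text file and placed with the other glossary files of the translation project.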
Re: [l10n-dev] PO and TMX contents do not match, help !
On 26 déc. 07, at 17:51, Alessandro Cattelan wrote: Jean-Christophe Helary ha scritto: What are the practical benefits related to using Pootle ? Basically, I see two main benefits: - it lets you assign files to translators so that you know who's translating a given file; - it provides some statistics so that you know at a glance how many files or words need to be translated. Ok, so the problem is that the current PO files, as provided by SUN using the oo2po conversion, do not match the TMX contents so you can't work properly with them, right ? So, if we could have PO files that match the TMX contents, we could use Pootle to do the file management and a different tool to do the translation itself. Is that correct ? Jean-Christophe Helary http://mac4translators.blogspot.com/
Re: [l10n-dev] Thoughts on Localization
On 24 déc. 07, at 22:27, Javier SOLA wrote: Two ideas for the discussion, - SDF is not a localization format. Nobody localizes using SDF. It is just an intermediary format that has the available information, and which simplifies the steps of gathering the necessary information and putting it back in the source. I do use SDF because its contents match the TMX provided by SUN. And I am not the only one. - Localization formats that we are using are PO and XLIFF. I don't use (and don't advise using) the current implementation of the oo2po tool which produces the PO files a lot of people are using, because the contents produced do not match the contents of the TMX provided by SUN. And I am not the only one. The current PO files create a huge overhead for translators, who need to play with \ characters so that their work is properly validated. This comes from PO over-escaping strings that are already escaped in SDF. We are working on new SDF-XLIFF, XLIFF-SDF and XLIFFUPGRADE filters that we hope to finish soon. The filters will be integrated in the upcoming version 0.5 of the WordForge off-line localization editor. If what you do is compatible with the TMX contents that is great. Now if the final idea is to use an XLIFF workflow, the best would be to have SDF contain the original XML and _not_ escaped strings that have no meaning when used in processes that include XLIFF or TMX. XLIFF and TMX have all the necessary functions to protect the XML of the translatable strings. Jean-Christophe Helary http://mac4translators.blogspot.com/
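The over-escaping problem can be illustrated in a few lines of Python. This is a sketch under assumptions: `po_escape` and `po_unescape` are hypothetical helpers that handle only backslashes and double quotes, they do not reproduce the actual rules of oo2po, and the sample segment is invented.

```python
def po_escape(s):
    """Escape backslashes and double quotes, C-string style."""
    return s.replace("\\", "\\\\").replace('"', '\\"')

def po_unescape(s):
    """Undo po_escape; only \\\\ and \\" are handled in this sketch."""
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            out.append(s[i + 1])  # keep the escaped character only
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

# Invented segment whose tags are already escaped, as in an SDF file.
sdf_segment = r"Click \<emph\>OK\</emph\>"
tm_segment = sdf_segment               # the TM stores the SDF form
po_segment = po_escape(sdf_segment)    # PO conversion escapes it again

exact_match_before = po_segment == tm_segment
exact_match_after = po_unescape(po_segment) == tm_segment
print(po_segment, exact_match_before, exact_match_after)
```

As long as the PO text carries the second layer of backslashes, an exact lookup in the TM fails; it succeeds once that layer is removed, which is the work translators currently have to do by hand in the other direction.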
Re: [l10n-dev] Thoughts on Localization
On 24 déc. 07, at 21:34, Yury Tarasievich wrote: On Mon, 24 Dec 2007 11:29:59 +0200, Jean-Christophe Helary [EMAIL PROTECTED] wrote: Why has SUN moved from a workflow that ensures the most efficient use of previous translations to a workflow that does not ? Because it's not their priority? Because at the time this seemed (or actually was) the best available solution? Because the world-wide popularity of the OOO translating caught them unaware? Because the communities were unaware of the existence of the tools and were drawn by a PO centered workflow that is mainstream in other FOSS communities. Nothing else. How hard would it be to have a few Java programmers improve the current OLT filters so that SDF is supported there ? The OLT itself seems to be sort of put on ice, as it seems. Or so I gathered in Spring, when assessing the possibility to XLIFF-migrate the OOO translation I'm taking care of. OLT the editor does not need to be modified. Only the small utility that is the OLT Filters needs to add SDF support. And that would only be to provide yet another way to translate, with a professional tool. How hard would it be to give translators access to the full source of the help files for context ? Ie, what can be done in practical terms, besides for PO hacks, to improve the translators' work and the output quality ? I.e., to add meta-information, be it Nth extra field of SDF or whatever carrier format, which would be an enterprise all by itself. The PO hacks, ugly as they are, work, and translations are coming in. The new way to go, pretty as it may seem in theory, has yet to be implemented *and* to prove itself. What about *other* translating teams, after all? It is not pretty in theory, it already works. I've documented how to use SDF directly to get direct matches from the TMX in tools that are developed for translation work.
Here is the link in case you are interested: http://mac4translators.blogspot.com/2007/12/openofficeorg-localization-and-easy-way.html oo2po has failed to produce files that match the TMX data that SUN is providing the community with and even though PO based translations keep coming, using the current PO process _does not contribute to make the translation workflow easier_. But obviously, for communities that only know the current PO hacks such an assertion does not mean anything... Jean-Christophe Helary http://mac4translators.blogspot.com/ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [l10n-dev] Thoughts on Localization
On 24 Dec 07, at 23:13, Yury Tarasievich wrote:
Propose and implement what you wish, you still omit people in translating communities needing to re-learn etc.
Yes, re-learn to be more efficient. Anything wrong with that? See what is wrong with the current process and try to improve it. As I wrote before, using the TMX in OmegaT with the SDF files directly allowed me to have about 50% of the GUI strings almost automatically translated, 25% had very close matches and the remaining 25% were new strings. So I am saying that by spending 15 minutes to read what I wrote and another 15 minutes to read the software tutorial, one can save 75% of the time spent translating.
Take me for example: I get quite a fair re-use ratio with Kbabel, I don't feel comfortable with the feel of any free XLIFF-capable tools, and I have yet to see a good demonstration of the translation-workflow-related advantages of the new way to go. What I saw in my experience with OmegaT wasn't better, it was better some, worse some.
It was not better because until now there was no way to easily use the contents of the SDF file. It was not easier for me either, even though I use OmegaT professionally. It was not easier because the files that are served to us are not properly converted and don't match the TMX contents.
Can't recall, did I say I'm opting out of this discussion already?
But you never attempted to discuss in the first place. I am not being uselessly critical, I have proposed solutions that work and that allow translators and coordinators to be more efficient. And I know that because I have tried them, and I have completed and delivered translations with them. Have you even tried the methods before ranting? Of course not, otherwise you would have seen that what I wrote _did_ make sense.
Jean-Christophe Helary http://mac4translators.blogspot.com/
Re: [l10n-dev] Thoughts on Localization
On 25 Dec 07, at 05:12, Ain Vagula wrote:
I agree with you
I VERY agree with you
What is the point of VERY agreeing with a useless rant that does not propose anything but criticizes people who try to propose new and working solutions?
Charles - knowing something about the technical details of the translation process surely isn't mandatory for an NL project lead, but there's no need to talk about things you aren't familiar with. Every positive or negative opinion of yours can influence people because of your position as project lead. Please think twice next time. Citation: May this be read and used by as many localisers as possible!
Obviously, it was not the _contents_ of the article that prompted this discussion.
Fortunately I was away from my computer this evening when this message hit my mailbox, and also fortunately Pavel responded to this. More gently than I could ever do.
Ok, and what is your take on the fact that the TMX contents and the SDF contents match but not the PO? How do you think it is possible to improve that so that translators can make better use of previously translated strings? Because _that_ is what is being discussed...
Jean-Christophe Helary http://mac4translators.blogspot.com/
Re: [l10n-dev] Thoughts on Localization
(Boris or anybody else, I Cc to OLT's dev list because that is relevant here. How hard do you think it would be to have OLT's filters work with OOo's l10n format: .sdf?)
On 23 Dec 07, at 18:00, Yury Tarasievich wrote: On Sun, 23 Dec 2007 05:59:05 +0200, Jean-Christophe Helary [EMAIL PROTECTED] wrote:
XLIFF files directly generated from the XML for the help files and from the rest for the UI.
Here I should point out that precious few tools currently work with XLIFF 1.1 (even Sun's OLT didn't process 1.1 in Spring 2007). Anyway, such a change, while disruptive to many, would also be useless at least if extended meta-information coverage (string context etc.) does not come with it. So, possibly the real answer is to extend the coverage of meta-information, extending the SDF format appropriately.
Yuri, I don't want to sound rude but you have it all wrong. I don't care about the SDF format. What I need and what translators need is a tool chain that does not _break_ the data as it does now with oo2po. What translators need is tools that support the format they use with the least transformation possible. Open Language Tools' problem is not that it does not support XLIFF 1.1, its problem is that its filter does not support SDF... Besides, if OLT does not support XLIFF 1.1 why not feed it with XLIFF 1.0? Have you thought about that? All the source files are in plain XML, have you thought about the efficiency loss, in terms of checks, of converting an XML file to a non-XML format? Have you considered the efficiency gains of keeping a strictly XML-based tool chain for the localization? Pavel says that SDF is so easy to transform, so what if OOo l10n developers took some time to improve OLT's filters so that we can use them directly with SDF? _That_ would considerably improve the translation process. To the point that OLT could actually be used with the TMX by any community that wants to work with a professional-grade tool... Any takers?
-- Jean-Christophe Helary http://mac4translators.blogspot.com/
[l10n-dev] _Easy_ way to translate the SDF files with the TMX memories ...
- more than half of the remaining segments had a very close equivalent in the TMX
- the rest was about 60~70 segments _out of 400_
When the translation coordinator receives all the translated files, they are merged into the original SDF file and put on the issue tracker. Also, the current CVS version of OmegaT includes Hunspell. You can use OOo's dictionaries directly with it. You just need ant to build OmegaT. I hope this post will contribute to easing the OOo localization process! And I would like to thank Alex for the numerous test versions he produced before _I_ was satisfied with sdf2txt! Don't hesitate to ask questions if you have any!
Jean-Christophe Helary
==
sdf2txt.jar is a Java utility.
http://alex73.zaval.org/snapshots/OpenOffice/sdf2txt.jar
==
OmegaT is a Java Computer Aided Translation tool.
(Version 1.7.3) http://sourceforge.net/project/showfiles.php?group_id=68187&package_id=214253
(Version 1.8, CVS, with Hunspell) cvs -z3 -d:pserver:[EMAIL PROTECTED]:/cvsroot/omegat co -P omegat
To build, enter the /omegat/ folder and type: ant
The dictionary setup is relatively straightforward.
==
[l10n-dev] SDF converted to text for translation
Alex Buloichik has created a small command line utility to export all the source text contents to a plain text file. For example (from OpenOffice.org-SRC680_m234-POT.tar.gz):
==
accessibility source\helper\accessiblestrings.src 0 string RID_STR_ACC_NAME_BROWSEBUTTON 13691 en-US Browse 2002-02-02 02:02:02
avmedia source\framework\mediacontrol.src 0 string AVMEDIA_STR_OPEN 13691 en-US Open 2002-02-02 02:02:02
avmedia source\framework\mediacontrol.src 0 string AVMEDIA_STR_INSERT 13691 en-US Apply 2002-02-02 02:02:02
avmedia source\framework\mediacontrol.src 0 string AVMEDIA_STR_PLAY 13691 en-US Play 2002-02-02 02:02:02
avmedia source\framework\mediacontrol.src 0 string AVMEDIA_STR_PAUSE 13691 en-US Pause 2002-02-02 02:02:02
==
Is exported to:
==
Browse
Ouvrir
Apply
Play
Pause
==
It takes less than a second to export the full 70,000 strings. The text file can now be translated in any tool, not only in PO editors. Since the source is a text file equivalent to the contents of the sdf file, the TMX that Rafaella provides will match its contents much better than it would match a PO file. The ideal workflow would be the following:
1) export the translatable contents of the 2.4 strings sdf files
2) use the resulting file as source in OmegaT
3) use the TMX as reference TMX files
4) use SUN GLOSS as reference glossary
5) translate all that in OmegaT to benefit from the automatic TMX matching functions
6) import the translated contents into the original
7) deliver a valid SDF file.
Anybody willing to test that before the translation really starts?
Jean-Christophe Helary
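The export and re-import steps of the workflow above can be sketched in a few lines of Python. This is not the sdf2txt utility itself (which is a Java tool whose internals are not described here), only a minimal illustration. It assumes each SDF line holds 15 tab-separated columns with the language code in the 10th and the translatable text in the 11th; the function names `export_text`, `import_text` and `make_line` are hypothetical.

```python
# Assumption: each SDF line has 15 tab-separated columns, with the language
# code in column 10 and the translatable text in column 11 (1-based).
LANG_COL = 9   # 0-based index of the language code
TEXT_COL = 10  # 0-based index of the translatable text


def export_text(sdf_lines, lang="en-US"):
    """Return the text column of every SDF line written in `lang`."""
    texts = []
    for line in sdf_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) > TEXT_COL and fields[LANG_COL] == lang:
            texts.append(fields[TEXT_COL])
    return texts


def import_text(sdf_lines, translations, source_lang="en-US", target_lang="fr"):
    """Return target-language SDF lines built from the source lines.

    Translations are matched to source lines by position, so the translated
    file must keep exactly one line per exported string.
    """
    source = []
    for line in sdf_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) > TEXT_COL and fields[LANG_COL] == source_lang:
            source.append(fields)
    if len(source) != len(translations):
        raise ValueError("translation count does not match source count")
    out = []
    for fields, translated in zip(source, translations):
        new_fields = list(fields)
        new_fields[LANG_COL] = target_lang
        new_fields[TEXT_COL] = translated
        out.append("\t".join(new_fields))
    return out


def make_line(gid, text, lang="en-US"):
    """Build a sample line in the assumed 15-column layout."""
    return "\t".join([
        "avmedia", r"source\framework\mediacontrol.src", "0", "string",
        gid, "", "", "", "13691", lang, text, "", "", "", "2002-02-02 02:02:02",
    ])


sample = [make_line("AVMEDIA_STR_OPEN", "Open"),
          make_line("AVMEDIA_STR_PLAY", "Play")]
print(export_text(sample))
french = import_text(sample, ["Ouvrir", "Lire"])
print(french[0].split("\t")[LANG_COL], french[0].split("\t")[TEXT_COL])
```

Matching by position keeps the sketch short, but it also means the translated text file must not gain or lose lines; a real tool would probably want a stronger check than the simple count comparison used here.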
Re: [l10n-dev] TMX files (List of UI strings/translation)
On 28 Nov 07, at 19:55, Rafaella Braconi wrote:
Can anyone generate a csv file from these tmx? I need the list of User Interface translations to use as the glossary for OmegaT.
why don't you use the glossary available in Sun Gloss?
Is there a real equivalence between the UI files and Sun Gloss? I think the point is to get only the UI strings, not the rest of the glossary.
Jean-Christophe
Re: [l10n-dev] TMX files (List of UI strings/translation)
On 28 Nov 07, at 19:21, Reiko Saito wrote:
Hi JC, I understood your point. To increase the leverage, we may be able to lower the minimum match rate, or use this tmx just for reference and search for a certain string as in a file... what do you think?
It is possible to use the sdf directly, and not the PO file, as source. For that it is necessary to do a few things to the sdf file, as I described sometime in the summer if I remember well. Simply put, it amounts to this. Basically the sdf comes as pairs of lines:
string in English+meta data
string in target language+meta data
The format of each line is very close to CSV, so the idea is to:
1) convert each pair to a single line: string in English+meta data, followed by string in target language+meta data
2) import that into OOo and select the columns that correspond to the string in English
3) put that into a text file for use as source in OmegaT or anything else, with the TMX provided by Rafaella as reference.
All this is done with a text editor and a few regex search/replace operations. Once the translation is completed, it is pasted into the "string in target language" part of the CSV file, the file is converted back to the two-line format and the result is delivered.
Can anyone generate a csv file from these tmx?
It is very easy, but now that I am seeing the contents, I wonder if it is a good idea. Look at one tu:
<tu tuid="47197">
<tuv xml:lang="en-US"><seg>The query already exists. Do you want to delete it?</seg></tuv>
<tuv xml:lang="ja-JP"><seg>このクエリはすでに存在します。削除しますか。</seg></tuv>
</tu>
Obviously this is a full sentence and not a menu item. So maybe Rafaella's idea of using Sun Gloss for glossary reference is better after all? Anyway, just in case we need the conversion, it is not a very complex task to do by hand with a few regexes. I'll do it for the Japanese group if you want.
Jean-Christophe
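For readers who would rather script the TMX-to-glossary conversion than do it with regexes, here is a minimal Python sketch using only the standard library. It assumes a TMX with plain `<tu>/<tuv>/<seg>` elements; the function names and the optional `max_words` filter (a crude, hypothetical way to keep short UI items and skip full sentences like the one quoted above) are illustrations, not anything proposed in the thread.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_pairs(tmx_text, source_lang, target_lang, max_words=None):
    """Return (source, target) pairs found in a TMX document.

    If max_words is given, units whose source has more words are skipped.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                segs[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if source_lang not in segs or target_lang not in segs:
            continue
        source = segs[source_lang]
        if max_words is not None and len(source.split()) > max_words:
            continue
        pairs.append((source, segs[target_lang]))
    return pairs


def pairs_to_tsv(pairs):
    """Format pairs as tab-separated glossary lines (source, then target)."""
    return "\n".join(f"{source}\t{target}" for source, target in pairs)


sample_tmx = """<tmx version="1.4"><body>
<tu tuid="47197">
<tuv xml:lang="en-US"><seg>The query already exists. Do you want to delete it?</seg></tuv>
<tuv xml:lang="ja-JP"><seg>このクエリはすでに存在します。削除しますか。</seg></tuv>
</tu>
</body></tmx>"""

print(pairs_to_tsv(tmx_to_pairs(sample_tmx, "en-US", "ja-JP")))
print(tmx_to_pairs(sample_tmx, "en-US", "ja-JP", max_words=3))
```

A word-count cut-off is only a rough heuristic; whether the result is a usable glossary still depends on the contents of the TMX, which is exactly the doubt raised in the message.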
Re: [l10n-dev] List of UI strings/translation
On 27 Nov 07, at 22:49, Rafaella Braconi wrote: Hi Reiko, Reiko Saito wrote:
Hi Rafaella, Exported strings include English and translation, right?
yes, that's correct.
Rafaella, would it be possible to have them exported to whatever format you like before we get the files to translate? So that we can convert them to the required formats? I know that the Japanese project will make use of them, and personally, as a participant in the French translation, I'd love to have them there too. Is there a way to get that automatically or do we have to go through you?
Jean-Christophe
[l10n-dev] TMX files (List of UI strings/translation)
On 28 Nov 07, at 02:19, Rafaella Braconi wrote:
So far I have created tmx files and made them available at: http://wiki.services.openoffice.org/wiki/Translation_for_2.4#Translation_Memories But if for your work you would like to get sdf (at least as long as French is not available in Pootle) just let me know.
Very good! Thank you Rafaella. For other list members' information, the TMX files include the following languages:
TMX_2007-09-12_de.zip 12-Sep-2007 13:20 3.2M
TMX_2007-09-12_es.zip 12-Sep-2007 13:20 3.1M
TMX_2007-09-12_fr.zip 12-Sep-2007 13:20 3.1M
TMX_2007-09-12_hu.zip 12-Sep-2007 13:20 3.2M
TMX_2007-09-12_it.zip 12-Sep-2007 13:20 3.1M
TMX_2007-09-12_ja.zip 12-Sep-2007 13:20 3.3M
TMX_2007-09-12_ko.zip 12-Sep-2007 13:20 3.1M
TMX_2007-09-12_nl.zip 12-Sep-2007 13:20 3.1M
TMX_2007-09-12_pl.zip 12-Sep-2007 13:20 3.2M
TMX_2007-09-12_pt-BR.zip 12-Sep-2007 13:20 3.1M
TMX_2007-09-12_pt.zip 12-Sep-2007 13:20 3.1M
TMX_2007-09-12_ru.zip 12-Sep-2007 13:20 3.2M
TMX_2007-09-12_sv.zip 12-Sep-2007 13:20 3.0M
TMX_2007-09-12_zh-CN.zip 12-Sep-2007 13:20 3.1M
TMX_2007-09-12_zh-TW.zip 12-Sep-2007 13:20 3.1M
I've just checked the fr and ja packages and they include about 44,000 translation units for the Help and about 25,000 TUs for the UI; in our case such units are basically Help file paragraphs or UI items. They seem to be taken not directly from the XML files that constitute the Help files but from the post-processed sdf files. All this means:
1) The TMXs contain all the XML code escaped with \ as per the sdf file: they are not proper TMX level 2 files.
2) Since they conform to the sdf contents they can be used directly to translate it (either in OpenLanguageTools or OmegaT).
3) _But_ since the original XML code is also contained in the translation unit itself (instead of being encapsulated in TMX tags) there are chances that the matches will be influenced by the XML code instead of reflecting the translatable contents. Not only will that lower the frequency of relevant matches, it will also add to the burden of the translator, since it requires editing the escaped XML code to get a proper match (which would be automatic with proper encapsulation).
4) People who use sentence segmentation in their tools should disable it and work with paragraph segmentation, so as to get the best possible matches from the TMXs.
Ideally, it would be preferable to translate directly from the XML (not from sdf) and to have the XML code properly encapsulated within the TMX, to provide translators with the best matches possible and the easiest way to recycle the XML code. If XLIFF is considered as a preferred format in the future (to replace sdf), I think it would be important to take proper encapsulation of the original XML into account. I don't have much time right now, but if people are interested I could make a demonstration to show how much easier it would be for translators to have a proper localization format with proper TMX files. Anyway, the mere fact that real TMX files are available is a big plus compared to the times when none were available. Thank you very much Rafaella for your efforts. I hope we will be able to make good use of the data, as well as to propose better workflows in the future.
Jean-Christophe Helary
Re: [l10n-dev] List of UI strings/translation
On 26 Nov 07, at 23:41, Rafaella Braconi wrote:
Hi Jean-Christophe, an .sdf file can be opened and saved as a CSV document...
Good. As Reiko just wrote, such contents would be interesting only if they include the English and the translation. Can you confirm that this is the case?
Rafaella P.S. Nice to see you back on the list!
I thought my hibernation would last longer but global warming seems to affect Japan more than I expected ;)
Jean-Christophe
Re: [l10n-dev] List of UI strings/translation
On 26 Nov 07, at 15:17, Reiko Saito wrote:
Can the translator get the list of existing UI messages with translation? The list formatted as csv, like
---
Expand Tree: ツリーを展開
Contract Tree: ツリーを収縮
If those files are available, any Translation Memory Editor, e.g. OmegaT, can read the file and present them as the glossary in the tool.
Rafaella, I think you mentioned a while ago (for 2.3?) that it was not possible to have real TMX files of already existing translations. Is that still the case? When the contents are totally new, they are not as useful, but when there are incremental additions to old contents such reference files can be extremely useful and greatly ease the translation process. TMX is not required. As Reiko says, anything like CSV/TSV could later be converted to the required formats.
Jean-Christophe
Re: [l10n-dev] Re: [qa-dev] switching to XLIFF
On Aug 12, 2007, at 12:00 PM, Javier SOLA wrote:
OpenOffice has multiple variable formats, and it is nice that the program recognises them as units of text that need to be replicated exactly at the target. XLIFF uses for this the <mrk> in-line tag. The introduction of the tags must be done by the filters SDF to XLIFF.
Do you mean that the SDF-XLIFF filter will correctly encapsulate the XML code that SDF escapes? If yes, that is great news! In case that is what you intend to do, you should be aware that <mrk> is _not_ the tag to do that: http://www.oasis-open.org/committees/xliff/documents/xliff-specification.htm#mrk
<quote> Marker - The <mrk> element delimits a section of text that has special meaning, such as a terminological unit, a proper name, an item that should not be modified, etc. It can be used for various processing tasks. For example, to indicate to a Machine Translation tool proper names that should not be translated; for terminology verification, to mark suspect expressions after a grammar checking. The <mrk> element is usually not generated by the extraction tool and it is not part of the tags used to merge the XLIFF file back into its original format. </quote>
<mrk> has _nothing_ to do with encapsulation of non-translatable _code_, and as is indicated in the quote, it is _not_ generated by the extraction tool (or the filter) etc... If you want to encapsulate the SDF code properly you need to use: <bpt> and <ept> for code pairs, <it> for isolated code and <sub> for translatable subflows within the code. http://www.oasis-open.org/committees/xliff/documents/xliff-specification.htm#bpt
If your segmentation process (if you have any) puts <bpt>/<ept> series in different source segments then it is sometimes considered safer to use <ph> (place holder) series instead. http://www.oasis-open.org/committees/xliff/documents/xliff-specification.htm#ph
But <mrk> is certainly not the tag to use for non-translatable code.
Translation memory must learn to deal with tags.
Translation memories have not waited for the translate-toolkit to deal with tags. Most translation memory tools already deal with TMX level 2 to various degrees. So maybe you mean that the translate-toolkit must learn to deal with tags? Independently of which tools are being used, I am glad to hear agreement on the fact that the future of OOo localization is XLIFF. And it would be even better if the XLIFF could be provided directly from the original XML and not after a conversion to SDF, which renders the whole process uselessly complex.
Jean-Christophe Helary
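To make the difference between escaped code and encapsulated code concrete, here is a rough Python sketch of what an SDF-to-XLIFF filter could do with inline markup. It is only an illustration under stated assumptions: that SDF escapes `<`, `>` and `"` with a backslash, that paired tags become `<bpt>`/`<ept>` sharing an id, that self-closing tags become `<ph>`, and that unpaired tags fall back to `<it>`. The names `unescape_sdf` and `encapsulate` are hypothetical and no existing filter is claimed to work this way.

```python
import re
from xml.sax.saxutils import escape

# Matches an opening, closing or self-closing inline tag.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w:.-]*)([^<>]*?)(/?)>")

TEMPLATES = {
    "bpt": '<bpt id="{id}">{code}</bpt>',
    "ept": '<ept id="{id}">{code}</ept>',
    "ph": '<ph id="{id}">{code}</ph>',
    "it-open": '<it id="{id}" pos="open">{code}</it>',
    "it-close": '<it id="{id}" pos="close">{code}</it>',
}


def unescape_sdf(text):
    """Undo the backslash escaping assumed for markup inside SDF text."""
    return text.replace("\\<", "<").replace("\\>", ">").replace('\\"', '"')


def encapsulate(text):
    """Wrap inline tags in XLIFF-style inline elements, escaping the rest."""
    records = []
    open_stack = []  # indexes of opening tags still waiting for a close
    next_id = 1
    for match in TAG_RE.finditer(text):
        closing, name, _attrs, self_closing = match.groups()
        record = {"name": name, "code": escape(match.group(0)),
                  "start": match.start(), "end": match.end()}
        if closing:
            if open_stack and records[open_stack[-1]]["name"] == name:
                opener = records[open_stack.pop()]
                record.update(kind="ept", id=opener["id"])
            else:
                record.update(kind="it-close", id=next_id)
                next_id += 1
        elif self_closing:
            record.update(kind="ph", id=next_id)
            next_id += 1
        else:
            record.update(kind="bpt", id=next_id)
            next_id += 1
            open_stack.append(len(records))
        records.append(record)
    # Opening tags that never found their closing tag are isolated codes.
    for index in open_stack:
        records[index]["kind"] = "it-open"
    out = []
    position = 0
    for record in records:
        out.append(escape(text[position:record["start"]]))
        out.append(TEMPLATES[record["kind"]].format(id=record["id"],
                                                    code=record["code"]))
        position = record["end"]
    out.append(escape(text[position:]))
    return "".join(out)


print(encapsulate(unescape_sdf(r"Click \<emph\>OK\</emph\> & go\<br/\>")))
```

With the code wrapped this way, a CAT tool can match on the translatable words alone and carry the tags over, instead of treating the escaped markup as text to be compared and retyped.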
Re: [l10n-dev] Re: [qa-dev] switching to XLIFF
On Aug 11, 2007, at 8:02 PM, Clytie Siddall wrote:
So I'm not talking about converting between SDF, PO and XLIFF, or between any combination of the three. I'm talking about using XLIFF as the base translation format for OpenOffice.org.
This is what I suggested in a mail here at the beginning of July. But currently all the XLIFF conversions have to go through the SDF-PO thing first. The original help files are in XML, so converting them directly to XLIFF should be way easier (for the translator but also for the processors) than going through the current SDF-PO thing. People who still want to do PO will be able to do so with conversions from XLIFF.
Jean-Christophe
Re: [l10n-dev] Imagine :)
On 12 Jul 07, at 20:29, Jean-Christophe Helary wrote: On 12 Jul 07, at 17:36, Rafaella Braconi wrote:
However, from what I understand here, the issue you see is not necessarily Pootle but the format Pootle delivers, which is .po. As already said, Pootle will be able to deliver the content in xliff format in the near future. Would you still see a problem with this?
Yes, because the problem is not the delivery format, it is the fact that you have 2 conversions from the HTML to the final format and the conversion processes are not clean. Similarly, the TMX you produce are not real TMX (at least not the one you sent me). I am not arguing that UI files would benefit from such treatment. I am really focusing on the HTML documentation.
To make things even clearer, I am saying that using _any_ intermediary format for documentation is a waste of resources. If translators want to use intermediary formats to translate HTML in their favorite tool (be it PO, XLIFF or anything else) that is their business. Janice (NetBeans) confirmed to me that NB was considering a Pootle server exclusively for UI files (currently Java properties files), but in the end that would mean overhead anyway, since the current process takes the Java properties as they are for translation in OmegaT. In NB, the HTML documentation is available in packages corresponding to the modules, and the TMX (a real one...) makes it possible to automatically get only the updated segments. No need for a complex infrastructure to produce differentials of the files: all this is managed by the translation tool automatically, and _that_ allows the translator to have _much more_ leverage from the context and to benefit from a much greater choice of correspondences.
I suppose the overhead caused by the addition of an intermediary format for the UI files will be balanced by the management functions offered by the new system, but I wish we did not have to go through translating yet another intermediate format, for the simple reason that, judging from the existing conversion processes (I've tried only the translate-toolkit stuff and it was flawed enough to convince me _not_ to use its output), it is likely to break the existing TMX. If the management system were evolved enough to output the same Java properties files I am sure everybody would be happy. But, please, no more conversion than necessary.
To go back to the OOo processes, I have no doubt that a powerful management system available to the community is required. But in the end, why is there a need to produce .sdf files? Why can't we simply have HTML sets, like the NB project, that we'd translate with appropriately formed TMX files in appropriate tools? My understanding from when I worked with Sun Translation Editor (when we were delivered .xlz files and before STE was released as OLT) is that we had to use XLIFF _because_ the .sdf format was obscure. But in the end, the discussion we are having now (after many years of running in circles, apparently) revolves not around how to ease the translator's work but around how to ease the management. If the purpose of all this is to increase the translators' output quality, then it would be _much_ better to consider a similar system that uses the HTML sets directly. Because _that_ would allow the translator to spend much more time on checking the translation in commonly available tools (a web browser...). How do you do checks on PO/XLIFF/SDF without resorting to hacks? Keeping things simple _is_ the way to go.
Jean-Christophe Helary (fr team)
[l10n-dev] TMX/XLIFF output (Re: [l10n-dev] Imagine :))
On 13 Jul 07, at 04:45, Rafaella Braconi wrote:
No, but that means that correct TMX files are a possibility (even now). By the way I wonder why Rafaella told me creating TMXs of the state of the strings before the current updates was impossible?
to clarify: the only possibility I have is to provide you TMX files in which the translation exactly matches the English text now. If the English source has been changed I have the following situation: New English text - Old translation (matching previous text). In the database I have no possibility to provide you with files containing Old English text and Updated English text.
Don't you have a snapshot of the doc _before_ it is modified? I mean, I have the 2.2.1 help files on my machine, so I can use the XML files in, for example, sbasic.jar in the EN folder, align them with the same files in the FR folder and create a valid TMX of the state of the 2.2.1 version. This is what I suggest you keep somewhere, for each language pair (with EN as source). So you have a static set of TMX, archived by module (sbasic, swriter, etc.) for each language, available from the community web, and translators just get the TMX they need for their current assignment. Such files don't need to be dynamically generated: they are valid for the most recent stable release, and once the release is updated the files can be output for the translation of the next version. So, create the TMX _before_ you modify the database, _or_ from static files that exist anyway inside any copy of OOo. And create TMX level 2 files, with all the original XML encapsulated so as not to confuse CAT tools and translators.
Regarding the output of proper source files, now that we (I...) know that the original is in XML, it should be trivial to provide them either directly as XML sets (specifically _without_ outputting diffs), or as XML diffs, or as XLIFFs. You may have some technical requirements that have you produce SDF files, but those only add an extra layer of complexity to the translation process, and I am sure you could have a clean XML output that includes all the meta info contained in the SDF, so that the source file _is_ some kind of XML and not a hybrid that considers XML as text (which is the major source of confusion). If you have an XML workflow from the beginning, it should be much safer to keep it XML all the way, hence:
original = XML (the OOo dialect)
diffs = XML (currently SDF, so shift to a dialect that uses the SDF info as attributes in XML diff tags, for example)
source = XML (XLIFF)
reference = XML (TMX, taken from the original)
TMX is not supported by most PO editors anyway, so a clean TMX would mostly benefit people who use appropriate translation tools (free ones included). Regarding the XLIFF (or PO, depending on the communities, I gather) source output, each community (and even each contributor) could use the output that fits the tools in use. XLIFF should be 1.0 so as to ensure OLT can be used (sadly, OLT does not support more recent versions of XLIFF). And then you have a clean workflow that satisfies everybody, and the management (Pootle) system can be put on top of all that to provide communities with the best environment possible. And of course, this workflow is also valid for UI strings, since I suppose they can also be converted to XML (if they are not already). What about that?
Jean-Christophe Helary (fr team)
Re: [l10n-dev] OmegaT UTF-8 problem
On 13 Jul 07, at 10:42, ChengLin wrote:
Hi, we're trying to use OmegaT on Simplified Chinese Windows XP; it can't save to UTF-8, only to Chinese GBK. Could anyone help us?
Go to Options/File Filter, select the file format you are using and edit the output encoding.
Jean-Christophe
Re: [l10n-dev] Translating .sdf files directly with OmegaT
On 13 Jul 07, at 02:40, Alessandro Cattelan wrote:
Actually I haven't tried going through the procedure you described, I think I'll give it a try with the next batch of files. We'll have around 4,200 words to translate and as it is a reasonable volume, I think I'll have some time to spend in testing a new procedure. What I fear, though, is that OmegaT would become extremely slow processing a huge SDF file. If I have a bunch of PO files I can just import only a few of them into the OmT project at a time and that makes it possible to translate without too much CPU sweat :o). When I tried loading the whole OLH project on which we worked in June, my computer was almost collapsing: it took me over an hour just to load the project! I don't have a powerful machine (AMD Athlon XP, 1500 MHz, 700 MB RAM) but I think that if you have a big TM it is not wise to load a project with over a thousand segments.
You are definitely right here: the bigger the TMX, the more memory it takes. Which is the reason why I just suggested (in the Imagine thread) that we have TMX by modules. Also, you can assign OmegaT more memory than you actually have on your machine. I use OmegaT like this:
java -server -Xmx2048M -jar OmegaT.jar
The -server option makes it faster too. The sdf files we have are not that big though. So you have to be selective with the TMX you use.
Maybe we could split the SDF file into smaller ones, but I'm not sure that would work.
If you try my method, you can translate bit by bit. There are no problems with that. What matters is that the reverse conversion is properly made.
JC
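As a small illustration of the "split, translate bit by bit, then merge back" idea, here is a Python sketch that groups SDF lines by module and reassembles them. It assumes the module name is the first tab-separated column and that all lines of a module are contiguous in the file (otherwise the merged order differs from the original). The function names are hypothetical and this is not a tool mentioned in the thread.

```python
def split_by_module(sdf_lines):
    """Group SDF lines by their first column, keeping first-seen order."""
    chunks = {}
    for line in sdf_lines:
        module = line.split("\t", 1)[0]
        chunks.setdefault(module, []).append(line)
    return chunks


def merge_chunks(chunks):
    """Concatenate the per-module chunks back into one list of lines."""
    merged = []
    for lines in chunks.values():
        merged.extend(lines)
    return merged


lines = [
    "sbasic\tsource\\a.xhp\t0\thelp\tpar_1",
    "sbasic\tsource\\a.xhp\t0\thelp\tpar_2",
    "swriter\tsource\\b.xhp\t0\thelp\tpar_1",
]
chunks = split_by_module(lines)
print(list(chunks))
print(merge_chunks(chunks) == lines)
```

Checking that the merged result is identical to the input before any translation starts is a cheap way to confirm that the reverse conversion is safe for a given file.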
[l10n-dev] For the communities that want to try OmegaT...
The updated online manuals are indexed here: http://sourceforge.net/docman/display_doc.php?docid=61937&group_id=68187 They are included in the test version available here: http://sourceforge.net/project/showfiles.php?group_id=68187 (use the Other - Development | OmegaT 1.7.1 download) And don't forget to read the explanations I sent on the 7th. Besides that, it is possible to use OmegaT without hacking anything if you just have plain ODF or HTML source file sets for your OOo-related documentation. No need to convert anything to PO. Just follow the quick tutorial that displays at launch and you'll be translating in 15 minutes. With a tool that professional translators use every day...
Jean-Christophe Helary (fr)
Re: [l10n-dev] Translating .sdf files directly with OmegaT
On 11 juil. 07, at 15:29, Arthur Buijs wrote: The overhead of using po-files in the translation process is minimal (exept from the initial trying out). It is not when you have to modify the tagged links to fit the source. In OmegaT that is done automatically without you even noticing it. Also, all the emph tags, if they need to be displaced or edited, require more work in a text based editor than in OmegaT (if done the way I suggested). Of course, using the PO files in a PO editor or in OmegaT will not make much difference in terms of editing the matches. The problem _is_ which source file you choose to work with and what relation they have to the original format (here: HTML-SDF-PO, almost no relation anymore when you reach the PO stage.) So I am really talking about not using PO because _that_ requires to handle the files at text, while using the modified .sdf allows them to be handled as HTML (which does considerably reduce the amount of editing). Ofcourse this is only true if a useable tmx-file is available. My advise would be to find a better way to generate tmx-files and use po-files for the translation-task. The TMXs provided by Rafaella were similar to the ones provided by the translate-toolkit processes (oo2po - po2tmx) and neither corresponded to the source po file in terms of number of \ characters for the escape sequences. They corresponded to the original .sdf file, which is what originally prompted me to use the original .sdf file as source. The rest of the hack I proposed on the 7/7 comes from that. The general problem does not only come from the TMX, but from the fact that .sdf is already an intermediate format (that you then convert to yet another intermediate format - po). The original conversion requires escapes and _that_ is what requires the files to be handled as text when they could just as well be handled as pure and simple HTML which most translation tools support. The TMX problem is yet another problem. 
Here, we have the following structure for the TMXs: (new source segment) (old target translation, if present). A _real_ TMX should be: (old source segment) (old target translation). So the current process is very confusing and does not allow TMX-supporting tools (like OmegaT or even OLT) to fully leverage the contents of the source, which is the real function of the TMX file. Plus, the fact that the TMXs do not reflect the structure of the actual source file (PO) makes them yet another problem. Of course, I am commenting on the process only from the perspective of giving translation contributors access to a translation workflow that supports the use of computer-aided translation tools. Right now the process that is suggested by the file formats available for OOo's localization does not facilitate this at all. Another of SUN's projects, namely NetBeans, manages to fully leverage legacy translations thanks to the use of simple source file formats (the UI files are simple Java properties and the Help files are simply HTML), and the whole source files are matched to the legacy translations output to TMX for super easy translation (in OmegaT or any other TMX-supporting tool, even though OmegaT is the most used tool there). As long as OOo sticks to intermediate file formats (.sdf/.po/.xliff) with the current unstable conversion processes, hacks will be necessary to reach the same level of efficiency other communities have already reached. And _that_ is really too bad. Cheers, Jean-Christophe Helary (fr) - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
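To make the "real TMX" structure described above concrete, here is a minimal sketch in Python that writes translation units pairing an old source segment with its old target translation. It is only an illustration under stated assumptions: the function name `make_tmx`, the header values and the French string are made up for the example and do not come from Sun's or the translate-toolkit's actual output.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def make_tmx(pairs, srclang="en-US", tgtlang="fr"):
    """Build a minimal TMX tree from (old_source, old_target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Header values are placeholders chosen for this sketch.
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0",
        "segtype": "paragraph", "o-tmf": "sdf", "adminlang": "en-US",
        "srclang": srclang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for old_source, old_target in pairs:
        tu = ET.SubElement(body, "tu")
        # Both variants come from the SAME legacy release, so a match on the
        # source side really tells the tool how that text was translated.
        for lang, text in ((srclang, old_source), (tgtlang, old_target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


# Hypothetical legacy pair; the French text is invented for illustration.
legacy = [("Enter Master Password", "Saisir le mot de passe principal")]
tmx_xml = ET.tostring(make_tmx(legacy), encoding="unicode")
print(tmx_xml)
```

With units built this way, a new or changed source string simply finds no exact match, instead of being paired with an unrelated old translation.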
[l10n-dev] Imagine :)
I have no idea where the UI files come from and how they _must_ be processed before reaching the state of l10n source files. So, let me give a very simplified view of the Help files preparation for l10n as seen from a pure TMX + TMX-supporting tool point of view. Since I don't know what the internal processes really are, I can only guess and I may be mistaken.

• The original Help files are English HTML file sets.
• Each localization has a set of files that corresponds to the English HTML sets.
• The English and localized versions are sync'ed.

To create TMX files: use a process that aligns each block-level tag in the English set with the corresponding block-level tag in the localized set. That is called paragraph (or block) segmentation and that is what SUN does for NetBeans: no intermediary file format, no .sdf, no .po, nothing whatsoever between the Help sets and the TMX sets.

The newly updated English Help files come as sets of files, all HTML. The process to translate, after the original TMX conversion above (only _ONE_ conversion in the whole process), is the following:

• Load the source file sets and the TMX sets in the tool.
• The HTML tags are automatically handled by the tool.
• The already translated segments are automatically translated by the tool.
• The translator only needs to focus on what has been updated, using the whole translation memory as reference.
• Once the translation is done, the translator delivers the full set, which is integrated in the release after proofreading etc.

What is required from the source files provider side? Creating TMX from HTML paragraph sets. What is required from the translator? No conversion whatsoever, just work with the files and automatically update the translation with the legacy data.

Now, what do we have currently?

• The source files provider creates a differential of the new vs the old HTML set.
• It converts the result to an intermediate format (.sdf).
• It converts that result to yet another intermediate format for the translator (either .po or xliff).
• It matches the results of the diff strings to corresponding old localized strings, thus removing the real context of the old string.
• It creates a false TMX based on an already intermediate format, without hiding the internal codes (no TMX level 2, all the tag info is handled as text data...).
• The translator is left to use intermediate files that have been converted twice, removing most of the relation to the original format and adding the probability of having problems with the back conversion.
• The translator has to work with a false TMX that has none of the original context, thus producing false matches that need to be guessed backward, and that displays internal codes as text data.

Do you see where the overhead is? It is very possible that the UI files do require some sort of intermediate conversion to provide the translators with a manageable set of files, but as far as the Help files are concerned (and as far as I understand the process at hand) there is absolutely no need whatsoever to use an intermediate conversion, to remove the original context and to force the translator to use error-prone source files.

It is important to find ways to simplify the system so that more people can contribute and so that the source files provider has fewer tasks to handle, but clearly using a .po-based process to translate HTML files is going in totally the opposite direction. And translators are (sadly, without being conscious of it) suffering from that, which results in less time spent on checking one's translation and a general overhead for checkers and converters.

Don't get me wrong, I am not ranting or anything, I _am_ really trying to convince people here that things could (and should) be drastically simplified, and for people who have some time, I encourage you to see how NetBeans manages its localization process. Because we are losing a _huge_ amount of human resources in the current process.

Cheers, Jean-Christophe Helary (fr team)
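The paragraph (block) segmentation mentioned above can be sketched as follows. This is a simplified illustration, not the process SUN uses for NetBeans: it assumes the English and localized files are in sync, that block-level elements are not nested, and it keeps only the text of each block (inline tags are dropped). The names `BLOCK_TAGS`, `BlockCollector`, `blocks_of` and `align` are hypothetical.

```python
from html.parser import HTMLParser

# Assumed set of block-level tags for this sketch; real Help files may need more.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "li", "td"}


class BlockCollector(HTMLParser):
    """Collect the text of each block-level element, in document order."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = None  # text pieces of the block being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._current = []

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self._current is not None:
            self.blocks.append("".join(self._current).strip())
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


def blocks_of(html):
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    return parser.blocks


def align(english_html, localized_html):
    """Pair the n-th English block with the n-th localized block."""
    src, tgt = blocks_of(english_html), blocks_of(localized_html)
    if len(src) != len(tgt):
        raise ValueError("file sets are not in sync")
    return list(zip(src, tgt))


en = "<h1>Saving</h1><p>Click <b>Save</b>.</p>"
fr = "<h1>Enregistrement</h1><p>Cliquez sur <b>Enregistrer</b>.</p>"
for source, target in align(en, fr):
    print(source, "=>", target)
```

Each resulting pair would become one translation unit in the TMX; a production aligner would also keep the inline tags inside the segments.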
Re: [l10n-dev] Translating .sdf files directly with OmegaT
Ale, I was wondering whether you eventually considered this procedure. It works very well and considerably increases productivity thanks to OmegaT's HTML handling features. I think I'm going to investigate the possibility of having an .sdf filter for OmegaT rather than having to go through all the po loops, which really don't provide much more than yet another intermediate format that is inconvenient to translate anyway. JC On 7 juil. 07, at 00:41, Jean-Christophe Helary wrote: The reason why I tried to do that is because using the .po created with oo2po along with the TMX created with po2tmx does not work well. [snip]
[l10n-dev] Translating .sdf files directly with OmegaT
The reason I tried this is that using the .po created with oo2po along with the TMX created with po2tmx does not work well. po2tmx removes data from escape sequences, and that means more things to type in the OmegaT edit window.

So the idea was to treat the .sdf file as a pseudo-HTML file to benefit from a few automatic goodies offered by OmegaT: 1) tag reduction (so that one needs to type less when tags are inline) and 2) tag protection (for block tags like <ahelp ...>...</ahelp> when they open and close the segment). If the TMX could be hacked to show formatting tags similar to those of the modified source file, it would become trivial to edit the tags and reflect the new contents found in source.

Problem is, an .sdf file is not an HTML file: there is plenty of meta information and a lot of escaped <, > and others. Also, an .sdf file seems to be made up of 2-line blocks: the source line and the target line.

The first problem will be solved later. For now, to extract the translatable contents we need to change the 2-line blocks into 1-line blocks with source and target data next to each other. This is done using a regexp like the following (those are not exact, I do them from memory, plus they may change depending on the editor you choose):

search for: ^(.*)(en-US)(.*)\r^(.*)(fr)(.*)
replace with: \1\2\3\t\4\5\6

Now that your .sdf is linearized, change its name to .csv and open it in OpenOffice using tab as field separator and nothing as text delimiter. The tabs in the original .sdf create a number of columns, from which you just need to copy the column with the en-US translatable contents. Paste that into a text file that you rename to .html.

Now we need to convert this to pseudo-HTML, the idea being that OmegaT will smoothly handle all the ahelp etc. tags that will be found there. First of all, we need to understand that not all the < are tag-opening characters; a number of them are simply "less than" signs. So we grab those first:

search for: ([^\])<
replace with: \1&lt;

> are less of a problem but let's do them anyway:

search for: ([^\])>
replace with: \1&gt;

Now we can safely assume that all the remaining < or > are escaped with \, and to correct that (so that the unescaped tags can be recognized in OmegaT) do:

search for: \\<
replace with: <

search for: \\>
replace with: >

Last but not least, to ensure that OmegaT will consider each line as being a segment, we need to add the paragraph mark at the beginning of each line:

search for: ^
replace with: <p>

Save; the file should be ready to be processed.

Now we need to get matches from the TMX files that either we have created (oo2po -> po2tmx) or that Rafaella has provided us with. Problem is that the TMX files reflect the contents of the original .sdf, which we have just modified. In the TMX, we are likely to find an ahelp tag written as \<ahelp something\>, which will not be helpful since in OmegaT the ahelp tag will be displayed as <a0> and thus will not match the \<ahelp something\> string. So we need to hack the file so that it looks close enough to what the source expects...

In the TMX we want to reduce _all_ the escaped tags to a short expression that looks like <a> for a tag starting with a. So we would do something like (here again, not a 100% exact regexp):

search for: \\<(.)[^>]*>
replace with: &lt;\1&gt;

Same for tail tags:

search for: \\</(.)[^>]*>
replace with: &lt;/\1&gt;

If I remember well everything I did in the last few days, that is about it. Save the TMX, put it in /tm/, load the project and translate... You can also put the Sun glossaries in /glossary/ after a little bit of formatting. But that too is trivial.

When translation is done, it is important to verify the tags (Tools -> Validate tags): click on each segment where the tags don't match the source and correct the target. Then Project -> Create translated files. Get the translated .html file from /target/.

And now we need to back-process the whole thing to revert it to its original .sdf form.

1) remove all the <p> at the beginning of the lines
2) replace all the < with \<, all the > with \>, all the &lt; with < and the &gt; with >

This should be enough. Now copy the whole file and paste it in the target contents part of the still opened .csv file. The .csv file now contains the source part and the target part next to each other. Let's save this (be careful: tab as field separator and nothing as text delimiter). Open the result in the text editor.

The pattern we need to find to revert the 1-line blocks to 2-line blocks is something like: (something)(followed by lots of en-US stuff) a tab (the same something)(followed by lots of translated stuff):

search for: ^([^\t])(.*)\t\1(.*)$
replace with: \1\2\r\1\3

Make sure there are no mistakes (if there are any they are likely to appear right in the first lines). Now you should have your 2-line blocks. Rename the file to .sdf and here you are. This
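The linearize / de-linearize steps described in this message can be tried out with a few lines of Python. This is only a sketch under stated assumptions: the sample lines are a hypothetical, heavily simplified stand-in for real .sdf lines (which carry many more tab-separated fields), `\n` is used where the editor regexps use `\r`, and the first group of the reverse pattern is written as `[^\t]*` so that it captures the whole first field.

```python
import re

# Hypothetical, simplified .sdf-like block: key fields, language code, text.
sdf = (
    "helpcontent2\tfile.xhp\ten-US\tOpen the file\n"
    "helpcontent2\tfile.xhp\tfr\tOuvrir le fichier\n"
)

# Same idea as "^(.*)(en-US)(.*)\r^(.*)(fr)(.*)" in the message.
LINEARIZE = re.compile(r"^(.*)(en-US)(.*)\n(.*)(fr)(.*)$", re.MULTILINE)

# Reverse step: (something)(en-US stuff) TAB (same something)(translated stuff).
DELINEARIZE = re.compile(r"^([^\t]*)(.*)\t\1(.*)$", re.MULTILINE)


def linearize(text):
    """Join each source/target 2-line block into one tab-separated line."""
    return LINEARIZE.sub(r"\1\2\3\t\4\5\6", text)


def delinearize(text):
    """Split each linearized line back into its source and target lines."""
    return DELINEARIZE.sub(r"\1\2\n\1\3", text)


one_line = linearize(sdf)
print(one_line)
print(delinearize(one_line) == sdf)
```

The reverse pattern relies on both lines of a block starting with the same first field, which is why mistakes, if any, tend to show up immediately in the first lines.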
Re: [l10n-dev] Error with PO files form Pootle
Alessandro, I have found a relatively painless way to directly translate the .sdf files in OmegaT. I have to finish my part now so I'll document that later. JC On 4 juil. 07, at 00:20, Alessandro Cattelan wrote: Hi, it seems that the PO files in the Pootle server used for the OOo 2.3 L10N are not properly formed. The comments of some strings have been included in the msgid field. Can somebody explain how to handle these strings? Here's an example taken from the UUI folder of the OOo GUI:

#: masterpasscrtdlg.src#DLG_UUI_MASTERPASSWORD_CRT.modaldialog.text
msgid "_: masterpasscrtdlg.src#DLG_UUI_MASTERPASSWORD_CRT.modaldialog.text\n"
"Enter Master Password"
msgstr ""

#: masterpassworddlg.src#DLG_UUI_MASTERPASSWORD.FT_MASTERPASSWORD.fixedtext.text
msgid "_: masterpassworddlg.src#DLG_UUI_MASTERPASSWORD.FT_MASTERPASSWORD.fixedtext.text\n"
"Master password"
msgstr ""

Thanks, Ale.
Re: [l10n-dev] Error with PO files form Pootle
On 4 juil. 07, at 17:39, Arthur Buijs - ArtIeTee wrote: Alessandro Cattelan schreef: Jean-Christophe Helary ha scritto: Alessandro, I have found a relatively painless way to directly translate the .sdf files in OmegaT. I have to finish my part now so I'll document that later. That sounds very interesting! I'll be waiting for it. Indeed. I'll test your documentation as soon as it becomes available ;-) Ok, I'll just give you an outline :) because I _am_ behind schedule... 2 main ideas: 1) the translatable contents are actually surrounded by tabs, 2) the escaped sequences are for HTML-like code. From 1): opening the file in OOo after renaming it to .csv produces something very nice to the eye. From 2): removing the relevant \ produces strings that actually look _like_ HTML (all the \< are replaced by <, while all the < not preceded by \ are replaced by &lt;). And we need a 0): the .sdf is composed of groups of 2 lines; putting such a group on one line to have the .csv file look like 2 column sets (one for source, one for target) is trivial. Now, you copy-paste the column that contains the source contents to a text file, you add <p> at the beginning of each line, you rename the thing to .html and you load it into OmegaT. The TMX created with po2tmx must be treated so that the code inside the segments looks like the tags that will be produced from the source file (namely <e0>, <a0> etc...), so just replace all the \<emph\> etc. by &lt;e&gt;; that way you'll only have to add the numbers when the match is inserted. OmegaT is smart enough to handle source segments that look like <ahelp something very long>blabla</ahelp> and will only display blabla, so that you are sure the source tags are protected during the translation etc...
That is very rough and I have not yet back-converted the file (put the < and \ back where they belong), but when that is done, just paste the translated contents into your OOo Calc target contents column, save, put the groups back to 2 lines and deliver. It _does_ look like an awful hack (I'd say it is borderline, on the easy side of the line :) but it is way better than having to handle the po in source with the half-baked TMX that you get from the po2tmx conversion. At least OmegaT protects pretty much all the tags and you don't need to add them to the target segment, OmegaT does that nicely for you. Ok, back to my last 50 lines !!! JC
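The escape handling outlined above (and its back conversion) can be sketched in Python as a pair of functions. This is an illustration under assumptions, not a tested converter for real .sdf content: the function names are hypothetical, the sample string is made up, and a negative lookbehind is used in place of the `([^\])` group so that a bare `<` at the very start of a line is also caught.

```python
import re


def to_pseudo_html(line):
    """Turn one .sdf text field into a line of pseudo-HTML."""
    # 1) Unescaped < and > are literal characters, not tags: make them entities.
    line = re.sub(r"(?<!\\)<", "&lt;", line)
    line = re.sub(r"(?<!\\)>", "&gt;", line)
    # 2) The remaining ones are escaped tag delimiters: drop the backslash.
    line = line.replace("\\<", "<").replace("\\>", ">")
    # 3) Prefix a paragraph mark so that each line becomes one segment.
    return "<p>" + line


def from_pseudo_html(line):
    """Reverse the conversion after translation."""
    if line.startswith("<p>"):
        line = line[len("<p>"):]
    # Re-escape the tag delimiters first, then restore the literal characters.
    line = line.replace("<", "\\<").replace(">", "\\>")
    return line.replace("&lt;", "<").replace("&gt;", ">")


# Made-up sample in the style of the escaped help strings discussed here.
sample = r'\<ahelp hid=\".\"\>Use if a < b\</ahelp\>'
html_line = to_pseudo_html(sample)
print(html_line)
print(from_pseudo_html(html_line) == sample)
```

The order of the two replacements in `from_pseudo_html` matters: restoring `&lt;` before re-escaping would turn the literal "less than" signs into escaped tag delimiters.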
Re: [l10n-dev] OmegaT and PO files
On 19 juin 07, at 15:50, Alessandro Cattelan wrote: Now I have one more question to which I'm sure Jean-Christophe has the answer... ;o) When opening a project with OmegaT I thought that the text in msgstr in the PO file would show up in the target segment, but that is not the case. I think that checking the PO file is quite useful because of the comments and of the changed strings extracted from the Sun database, but keeping OmegaT and a text editor with the PO file open side by side on a 15" monitor is not very comfortable... Is there any way to make OmT show at least the msgstr content? Yes. It is what I have been demonstrating here by creating a TMX from the .sdf file. Basically, OmegaT's PO handling only consists (just like for monolingual files) in putting the contents of source in the target editing field, for editing. OmegaT is not able to know that a target already exists and to propose it for editing. What I have thus done is the following: 1) convert the .sdf file to .pot (oo2po -P etc.) to remove all the msgstr contents, 2) create a TMX from the pseudo-translated .sdf (oo2po and then po2tmx, cf. my comments on that process in a different thread), 3) put the .pot in /source/ and the tmx in /tm/, and OmegaT will automatically match the .pot strings to their pseudo-translated counterparts in the tmx, thus allowing you to have the msgstr contents in target. It is a little non-trivial, but remember that OmegaT is not made to work with bilingual localization files. It works with a monolingual file in source and bilingual TM files for reference. I think Rafaella should be able to provide you with proper TMX files that match the .sdf contents. Cheers, JC
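As a rough picture of step 3, the sketch below shows what exact matching of monolingual source strings against a TMX amounts to. It is not OmegaT's implementation (OmegaT also proposes fuzzy matches); the TMX content, the French string and the function names `load_memory` and `pretranslate` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical one-unit TMX; the French segment is invented for illustration.
TMX = """<tmx version="1.4">
<header creationtool="sketch" creationtoolversion="0" segtype="paragraph"
        o-tmf="po" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
<body>
<tu><tuv xml:lang="en-US"><seg>Master password</seg></tuv>
<tuv xml:lang="fr"><seg>Mot de passe principal</seg></tuv></tu>
</body></tmx>"""


def load_memory(tmx_text, srclang="en-US", tgtlang="fr"):
    """Map each source segment of the TMX to its target segment."""
    memory = {}
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        if srclang in segs and tgtlang in segs:
            memory[segs[srclang]] = segs[tgtlang]
    return memory


def pretranslate(msgids, memory):
    """Exact matches get the legacy target; everything else stays empty."""
    return {msgid: memory.get(msgid, "") for msgid in msgids}


pot_strings = ["Master password", "Enter Master Password"]
result = pretranslate(pot_strings, load_memory(TMX))
print(result)
```

The lookup only works if the TMX source text is character-for-character identical to the string in the source file, which is exactly why the missing escape characters discussed in this thread matter.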
Re: [l10n-dev] OmegaT and PO files
On 19 juin 07, at 18:17, Rafaella Braconi wrote: Hi Alessandro, Jean-Christophe, Alessandro Cattelan wrote: Jean-Christophe Helary ha scritto: I think Rafaella should be able to provide you with proper TMX files that match the .sdf contents. Rafaella has already provided us with the OLH TMX some time ago. Yes, I was also suggesting that Jean-Christophe use those tmx files but I think that he really wants to make sure to get the most updated tmx files... I am currently loading them and I don't see anything problematic. The only problem I see with the method you propose is that we would end up having two TMs. The TM I have is pretty big (over 12MB) and OmegaT takes a long time to analyse it. If I put another big TM in the tm folder I think it would end up being too slow. However, I'll have a look at that. I am not clear about this. Why would you end up having 2 TMs? Can't one simply use the most recent TMX files? Anyway, as far as OmegaT is concerned that does not matter. It is only the total number of segments to match that will lengthen the load process. I have just loaded the 8000 segments + 2 x 2000 segments TMXs matching the whole .sdf-pot file and it took less than 2 minutes on my machine (MacBook, dual-core 2 GHz / 2 GB). I also assigned 2 GB to OmegaT, in Java server mode (faster in general): java -server -Xmx2048M -jar OmegaT.jar JC
Re: [l10n-dev] escaping
OmegaT handles PO files pretty much as text files and thus does not care about \; for it, the \ is just another character. Hence, there is nothing that is generated by OmegaT in the screenshot I showed. The files are displayed as they are. Friedel, I am not arguing for or against a certain way to display the data, I am just saying that OmegaT does not do anything to the data and considers the PO escapes as a \ character. Unfortunately a PO file isn't just a text file. It is a file format that presents data in a specific way. Escaping the backslash (\) and the quotes (") is part of the format that we try to conform to. Which is very good, and OmegaT does not interfere with that. <big_snip> So, you see, the TMX does not exactly match the original .po file. Although it does match the .sdf, but this is irrelevant. When I created the TMX by using XLFEdit from Heartsome, I first took the converted po, converted it to XLIFF and then exported it as TMX, and the TMX contained the same number of escapes as the po. I would consider this behaviour by the Heartsome tool to be a bug, to be honest. Do they convert '<' to '&lt;'? Then they should also convert the rest. I would say this is part of the rules of data conversion between these formats. I believe our conversion conforms to the XLIFF representation guide for PO files: http://xliff-tools.freedesktop.org/snapshots/po-repr-guide/wd-xliff-profile-po.html#s.general_considerations.escapechars I think it follows logically that the same rules should apply for converting to TMX. I have no idea who is right and who is wrong. What I can say is that Heartsome is _very_ strong when it comes to respecting standards. Besides, the document you quote has contributions from Rodolfo Raya, who is also a developer at Heartsome and who himself is extremely picky when it comes to standards compliance.
In 3.4. Handling of Escape Sequences in Software Messages, the text says, regarding a fragment that includes escape sequences like we have here: "This fragment could be presented in XLIFF by preserving the escape sequences:" etc. Of course it proposes rules to handle special escape sequences as opposed to generic escape sequences, but there is seemingly nothing wrong with keeping all the escape sequences. What matters in the end is _not_ whether the PO has been through an XLIFF conversion process or not. What matters is that: 1) I have a source po with \\\this kind of things\\\ 2) my reference TMX should match that with \\\that kind of things\\\ because it is created from a similar po file 3) but for some reason it provides only \\this other kind of things\\ Let me repeat myself. I have no issue with your processes and with your level of compliance with the proposed standards. The only problem is that somewhere, the TMX conversion process loses data and that impairs my ability to get leverage from it. A somewhat separate issue for me is that the \< in the SDF file is also an escape of that format. In reality it refers to just a left angular bracket. The SDF format is however a bit strange in the way these are used, and we might not want to change the way we handle the SDF escaping while Pavel's POT files have a semi-official status. If we can agree on how we interpret the escaping in the SDF file and coordinate the change, we can probably make the lives of translators far easier by eliminating much of the escaping. I don't think the problem is in the oo2po process. Whatever the result, we are all starting from po anyway. What is at stake here is that if I take a po created from .sdf and I use po2tmx on that same file, the data that the TMX contains is different from the data in the po. JC
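The mismatch discussed in this message can be reproduced in a few lines. The sketch below is only an illustration of the mechanism, assuming a simplified PO escaping scheme; `po_escape` and `po_unescape` are hypothetical helpers written for the example, not the translate-toolkit's functions, and the sample string is made up.

```python
def po_escape(text):
    """Simplified PO escaping: backslashes first, then double quotes."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def po_unescape(text):
    """Simplified inverse: drop one level of backslash escaping."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append({"n": "\n", "t": "\t"}.get(text[i + 1], text[i + 1]))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Made-up string as it might appear in the .sdf, with its own escapes.
sdf_text = r'\<ahelp hid=\".\"\>'

# What ends up between the quotes of msgid: every \ doubled, \" becomes \\\".
po_text = po_escape(sdf_text)

# A tool that shows the PO as plain text displays po_text, while a converter
# that unescapes before writing the TMX stores something equal to sdf_text.
tmx_text = po_unescape(po_text)

print(po_text)
print(tmx_text)
print(po_text == tmx_text)
```

Under these assumptions the TMX segment matches the .sdf but not the raw text of the .po, which is consistent with the behaviour reported above; whether the unescaping or the raw display is the "right" behaviour is the question debated in the thread.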
Re: [l10n-dev] OmegaT and PO files
On 19 juin 07, at 17:12, Alessandro Cattelan wrote: I don't understand why you need to create .pot instead of .po files. I converted the sdf to po files and OmegaT just ignores the msgstr content, so what is the use of having a pot file with empty msgstr fields? Because I am pretty sure OmegaT would not overwrite the msgstr part, since it does not know about it. So this is likely to result in a buggy target file. But maybe I am just being too careful. JC
Re: [l10n-dev] OmegaT and PO files
On 19 juin 07, at 19:51, Rafaella Braconi wrote: yes, I was also suggesting jean-Christophe to use that tmx files but I think that he really want to make sure to get the most updated tmx files... I am currently loading them and I don't see anything problematic. Are you referring to the tmx TEST file I just provided? Does this really work? Yes, I am replying to you offlist with some specifics but basically they work well. JC - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [l10n-dev] Contents of the OOo 2.3 .sdf, problems with TMX conversion
Rafaella, Thank you very much for the comments. I was a little confused because it seems that for each different community project SUN manages, there is a different way to localize :) I would like to know if it is possible to provide us (or at least me...) with the .sdf strings _before_ the current modification so as to be able to create a correct TMX file. If you could create that TMX yourself and make it available it would be even better. That TMX would contain the state of the corpus before the modifications (2.2.1) and would allow translators who work with TMX-supporting tools (including Sun's own OLT, or OmegaT, to name only the free ones) to work efficiently with the current files. For your information, I decided to create a .pot file out of the .sdf so that I was sure there were no pseudo-translated strings in French, and I created a pseudo-tmx with the original contents that I use to match every source string with. This solution is better than hand-editing the whole file, but the whole thing would be even more efficient if instead of a pseudo-tmx I had the real thing based on the 2.2.1 contents. Do you think it is possible to get that from SUN? Regards, Jean-Christophe On 18 juin 07, at 17:03, Rafaella Braconi wrote: Hi Jean-Christophe, in the Q&A section you may find the answer to your question already: http://wiki.services.openoffice.org/wiki/Translation_for_2.3#Q_.26_A Also, please see my comments inline: Jean-Christophe Helary wrote: I realized a few days ago that the .sdf (at least for the fr project) for the coming 2.3 contains weird stuff without much of an explanation as to how to differentiate the different parts. 1) in some places the target part is made of what would be a fuzzy in PO, but without specific notification of the fuzzy character. What you see is the previous translation.
This means that in the meantime the English text has been updated and, since in most cases the old translation contains terminology which can be reused for updating the string, we decided to keep the previous translation as a sort of *suggestion* instead of overwriting it with the English text. 2) in some places it seemingly contains exact matches. Sometimes the English text has been updated in such a way that this is not translation relevant. For example a typo in the English text has been corrected. Since the authors may not necessarily know if a change is translation relevant or not, they flag the updated English text as updated and it gets extracted as *changed* strings when we prepare the files to send to translation. 3) in some other places it contains the source English string. This is when the English text is completely new. This means that this is the first time the string gets translated. In the case where the fuzzy is present, the reference links are sometimes totally different. Which means that besides the actual editing of the translation, it is also necessary to edit the links. Yes, in this case the translation needs to be updated including links, tags, variables etc. I wonder about the utility of such a mechanism, especially since there is no way to differentiate between the 3 patterns in the .sdf itself. The utility is that in many cases the previous translation contains terminology that can be reused to update the text. It seems to me it would have been faster to _not_ insert fuzzies at all and to provide a complete TMX of the existing OOo contents instead. They are not fuzzies. Right now, if one wants to create a TMX out of the .sdf files (either with the Translate toolkit or with the Heartsome translation suite, I suppose there are other ways though), it is impossible to have the source strings corresponding to the fuzzy target and thus the matching in a TMX-supporting CAT tool will not be of much use.
You cannot create TMX out of the sdf files provided because the translated strings contained in them are not final translations. Is there still a way to get SUN to provide the l10n teams with TMX of the existing contents, similarly to what we can get through the SunGloss system? We could provide you with an sdf file containing the final translations if that helps. Rafaella (FYI, the NetBeans team is provided with TMX and that greatly enhances the localization process.) Jean-Christophe Helary
Re: [l10n-dev] OmegaT or poEdit?
On 18 juin 07, at 22:22, Alessandro Cattelan wrote: Hi, starting from today I'll have some more free time to dedicate to OOo L10N so I'd like to start working on it. I'm wondering whether the Italian team should use OmegaT or poEdit to translate the OLH and possibly the GUI (using Pootle as a translation workflow manager). Petr, Rafaella, can I go ahead and use OmegaT? Ale, I noticed that the TMX I created with translate-toolkit from the pseudo-translated .sdf is not usable because for some reason the po2tmx script systematically removed one escape \ character from the original po file. I had to use a non-free tool to create the TMX, but if Rafaella and SUN can provide the teams with a TMX of the 2.2.1 strings then I personally think that OmegaT (because of the automatic matching) is the tool of choice for the people who are used to it. Besides the fact that you can leverage your old TMX with it too. JC
Re: [l10n-dev] OmegaT or poEdit?
Hi, starting from today I'll have some more free time to dedicate to OOo L10N so I'd like to start working on it. I'm wondering whether the Italian team should use OmegaT or poEdit to translate the OLH and possibly the GUI (using Pootle as a translation workflow manager). Petr, Rafaella, can I go ahead and use OmegaT? Ale, I noticed that the TMX I created with translate-toolkit from the pseudo-translated .sdf is not usable because for some reason the po2tmx script systematically removed one escape \ character from the original po file. Hi Jean-Christophe Please elaborate on the problem so that we can find out where the error comes in and fix it if necessary. You can reply here, in private mail or the translate-toolkit mailing list - as you prefer. Friedel, Thank you very much. To put it simply, I ran oo2po and then po2tmx on the .sdf file that makes up the current job. At first I did not notice anything but after a few segments, I found what I was lucky to capture in the screenshot I linked to the other day: http://www.eskimo.com/~helary/_files/session_omegat.png The segment with the green background is the oo2po file pretty much without modifications (notice the fact that all the \ are doubled, the \\ even come as \\\). The lower part shows you the po2tmx segment matching the current source: the contents should be identical but you'll see that there are systematically \ missing. I re-created the tmx with Heartsome's XLFEdit and got a file matching the source segment properly. The original .sdf does not contain the extra \ though, so I suppose they are put there in the oo2po process, which is fine, as long as they stay there all the way :) JC
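For anyone who wants to check their own files for the symptom described above, here is a minimal sketch in Python. It is not part of the translate-toolkit: the function names and the sample segments are made up for illustration, and it only assumes that a PO source string and the TMX source segment generated from it should contain the same number of backslashes.

```python
def count_backslashes(text):
    """Count the literal backslash characters in a segment."""
    return text.count("\\")


def find_escape_mismatches(pairs):
    """Compare (po_source, tmx_source) pairs that are expected to be identical.

    Returns a list of (index, backslashes_in_po, backslashes_in_tmx) for
    every pair where the two counts differ.
    """
    mismatches = []
    for index, (po_source, tmx_source) in enumerate(pairs):
        expected = count_backslashes(po_source)
        found = count_backslashes(tmx_source)
        if expected != found:
            mismatches.append((index, expected, found))
    return mismatches


# Hypothetical segments, for illustration only (not taken from the real files).
PO_SEGMENT = r"Choose \\<emph\\>Replace\\</emph\\>"
TMX_SEGMENT = r"Choose \<emph\>Replace\</emph\>"

if __name__ == "__main__":
    for index, expected, found in find_escape_mismatches([(PO_SEGMENT, TMX_SEGMENT)]):
        print("segment %d: %d backslashes in PO, %d in TMX" % (index, expected, found))
```

A count is a crude test, but it is enough to tell whether a conversion step drops escapes before the TMX is loaded into a CAT tool.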
Re: [l10n-dev] OmegaT or poEdit?
On 18 juin 07, at 22:51, Rafaella Braconi wrote: Hi Friedel, it would be really great to have that issue fixed. In that case we would be able to provide sdf files containing final translations (and not pseudo ones) which can be used to create tmx files. It is also possible to use Rainbow (from the Okapi framework - LGPL .NET 2.0) to get the proper TMX from this process. Just use oo2po to get a po file and Rainbow to convert that to TMX. JC Please let us know about the outcome. Rafaella F Wolff wrote: On Ma, 2007-06-18 at 22:40 +0900, Jean-Christophe Helary wrote: On 18 juin 07, at 22:22, Alessandro Cattelan wrote: Hi, starting from today I'll have some more free time to dedicate to OOo L10N so I'd like to start working on it. I'm wondering whether the Italian team should use OmegaT or poEdit to translate the OLH and possibly the GUI (using Pootle as a translation workflow manager). Petr, Rafaella, can I go ahead and use OmegaT? Ale, I noticed that the TMX I created with translate-toolkit from the pseudo-translated .sdf is not usable because for some reason the po2tmx script systematically removed one escape \ character from the original po file. Hi Jean-Christophe Please elaborate on the problem so that we can find out where the error comes in and fix it if necessary. You can reply here, in private mail or the translate-toolkit mailing list - as you prefer. Friedel
Re: [l10n-dev] OmegaT or poEdit?
On 19 juin 07, at 08:26, Arthur Buijs wrote: Hi, Rafaella Braconi wrote: For all the ones who are still looking for the answer to the question ... yes, OmegaT can definitely be used to translate the sdf files converted into po files. Many thanks to Alessandro, Jean-Christophe, Petr and all the ones who have worked to check into this and for sharing the information with the others. +1 Sharing this information was very helpful. http://wiki.services.openoffice.org/wiki/Nl.openoffice.org#Translation_for_OOo_2.3 Be aware that I was using Windows ;-) Comments greatly appreciated. I have just modified your text a little bit to clarify the project setup and the glossary export from SunGloss. JC ps: there is a user list on Yahoo where the support is quite good, but I can create a list on SourceForge for people who prefer to stay on free open ground. -- Regards/Groeten, Arthur Buijs Open software is a joy forever! http://nl.openoffice.org #nl.openoffice.org (irc.freenode.net)
[l10n-dev] Contents of the OOo 2.3 .sdf, problems with TMX conversion
I realized a few days ago that the .sdf (at least for the fr project) for the coming 2.3 contains weird stuff without much of an explanation as to how to differentiate the different parts. 1) in some places the target part is made of what would be a fuzzy in PO, but without specific notification of the fuzzy character 2) in some places it seemingly contains exact matches 3) in some other places it contains the source English string In the case where the fuzzy is present, the reference links are sometimes totally different. Which means that besides the actual editing of the translation, it is also necessary to edit the links. I wonder about the utility of such a mechanism especially since there is no way to differentiate between the 3 patterns in the .sdf itself. It seems to me it would have been faster to _not_ insert fuzzies at all and to provide a complete TMX of the existing OOo contents instead. Right now, if one wants to create a TMX out of the .sdf files (either with the Translate toolkit or with Heartsome translation suite, I suppose there are other ways though), it is impossible to have the source strings corresponding to the fuzzy target and thus the matching in a TMX supporting CAT tool will not be of much use. Is there still a way to get SUN to provide the l10n teams with TMX of the existing contents, similarly to what we can get through the SunGloss system ? (FYI, the NetBeans team is provided with TMX and that greatly enhances the localization process.) Jean-Christophe Helary
Re: [l10n-dev] TM, glossaries and OmegaT
On 16 juin 07, at 04:48, Alessandro Cattelan wrote: Hi, I ran a couple of tests to see whether OmegaT could be used to translate the OLH for OOo 2.3 and I forgot to post the results here as a follow-up to this discussion (I've sent them to Sun and to the Italian community only). I'm doing it now just in case someone is interested in all this. Apart from the test with OmegaT, I ran the same test with poEdit using the same files and procedures. At the end of this e-mail you'll find a report of what I've done, from converting the original SDF file to PO, translating PO files with OmegaT and poEdit and then back-converting the files to SDF. I'm not attaching all the files and directories used since it would be too heavy - you can download them from the following address: http://tinyurl.com/2s4zwu Ale.

Ale, I started my own OmegaT process yesterday and I roughly documented it on the fr-l10n list. Basically what I did was the following:

1) get the .sdf and convert it to one big .pot to make sure the automatically inserted translations were not present.
oo2po -P --language=fr --nonrecursiveinput HC2_93824_89_2007-06-05_33.sdf HC.pot

2) convert the .sdf to .po, convert that to .tmx, clean the TMX to remove parts where the original msgid and msgstr were identical (not necessary)
po2tmx --language=fr HC2_93824_89_2007-06-05_33.po HC.tmx

3) export the EN-FR glossary from SunGloss and keep source term/target term/target comment, all separated by tabs.

I loaded all this in a dedicated OmegaT project (.pot in /source/, .tmx in /tm/, glossary renamed with .utf8 in /glossary/). And what I get is the familiar OmegaT session illustrated here: http://www.eskimo.com/~helary/_files/session_omegat.png

For those unfamiliar with it, the top left part is the editor window (the bold part with the green background is the source segment, to be translated right below, between the segment and end-segment markers). The bottom left part is the translation memory matching window.
Since I use the original .sdf contents I always have at least a 100% match that I either use as is or edit (after rewriting in the edit field with Ctrl+R). I can select other matches (Ctrl+nb) for rewrite (Ctrl+R) or insertion at point (Ctrl+I). The right part is the glossary part. Items cannot be inserted automatically, they are only for reference. There are menus that are not displayed on the screenshot, from where one can create the target files (at any time during the translation) to check them, it is possible to modify the project segmentation at any time (regex based) etc. The only worry I have is that the target file will have problems with the back conversion to .sdf but your testing seems to prove that those can be relatively easily fixed... JC

## Converting PO into SDF ##

[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ ls
backconversion HC2_93824_89_2007-06-05_39.sdf OLH-OmT-Project
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ ls backconversion/
HC2_93824_89_2007-06-05_39.sdf po
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT$ cd backconversion/
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ ls
HC2_93824_89_2007-06-05_39.sdf po
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ po2oo -t HC2_93824_89_2007-06-05_39.sdf -l it po it_IT.sdf
/usr/lib/python2.5/site-packages/translate/storage/po.py:31: DeprecationWarning: The sre module is deprecated, please import re.
import sre
processing 35 files...
Error at po/helpcontent2/source/text/shared/01.po:: 0211.xhp#par_id2366100.help.text: escapes in original ('\n', '\emph\Replace', 'with\/emph\') don't match escapes in translation ('\emph\Replace', 'with\/emph\')
Error at po/helpcontent2/source/text/shared/01.po:: 0211.xhp#par_id9262672.help.text: escapes in original ('\n', '\emph\Search', 'for\/emph\') don't match escapes in translation ('\emph\Search', 'for\/emph\')
[###] 100%
[EMAIL PROTECTED]:~/Desktop/Test-L10N-OOo2.3/OmT/backconversion$ /opt/gsicheck_1.8.2/gsicheck it_IT.sdf
NO OUTPUT
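The optional TMX cleaning mentioned in step 2 of the message above (dropping the units whose source and target are identical) can be sketched in Python with the standard library only. This is an illustration under assumptions, not what was actually run: the function name and the sample TMX are hypothetical, and the code assumes a simple TMX where each tu holds plain-text seg elements.

```python
import xml.etree.ElementTree as ET


def drop_untranslated_units(tmx_text):
    """Remove every <tu> whose <seg> elements all carry the same text.

    Returns (cleaned_tmx_as_string, number_of_units_removed).
    """
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    removed = 0
    for tu in list(body.findall("tu")):
        # Flatten inline markup, if any, so that only the text is compared.
        segs = ["".join(seg.itertext()) for seg in tu.iter("seg")]
        if len(segs) >= 2 and len(set(segs)) == 1:
            body.remove(tu)
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Hypothetical two-unit TMX, for illustration only.
SAMPLE_TMX = (
    '<tmx version="1.4"><header/><body>'
    '<tu><tuv xml:lang="en"><seg>File</seg></tuv>'
    '<tuv xml:lang="fr"><seg>Fichier</seg></tuv></tu>'
    '<tu><tuv xml:lang="en"><seg>OK</seg></tuv>'
    '<tuv xml:lang="fr"><seg>OK</seg></tuv></tu>'
    '</body></tmx>'
)

if __name__ == "__main__":
    cleaned, removed = drop_untranslated_units(SAMPLE_TMX)
    print("removed %d unit(s)" % removed)
```

Note that a unit such as "OK" may be a legitimate translation that happens to match the source, which is one reason to treat this cleaning as optional.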
Re: [l10n-dev] TM, glossaries and OmegaT
Ale, If non-pootle users still want to use TMs it is possible to use OmegaT too. I should have thought about that a couple of weeks ago, before going around looking for info on PO editors and all the rest. I think I've missed the point at which OmegaT started supporting PO files... :o( Sincerely, I really wondered why you were not considering it when you started asking your questions about PO files here and there :) We are now working on the OLH translation with poEdit and most of the translators are complaining about the change: for a translator OmegaT is just way better than any PO editor. Especially if you work intensively with TMs. I've tried doing what Jean-Christophe suggested above and seen that it could work. The only issue is that some TM matches will make no sense because OmegaT takes the tags into consideration while computing the match percentage. For instance, for the following segment: Ok, the problem is that PO files are not supposed to contain XML strings :) Hence the suggestion that a little tweaking of the existing filters would provide better matches with the original .sdf files. But I've worked with OOo sdf-po converted files in the past and had no problem getting over this issue. I have not yet checked the current files' contents but if it is more about text than links then you'd rather use OmegaT with your TMX files. Given all this, I would say that OmegaT could be the solution here. At least for the Italian community which is used to this tool and appreciates its features. I'm going to give it a try. One of the things we'll have to pay attention to is whether the translated files are correct and can be imported painlessly into the Sun database as an .sdf file. I'll send Sun a few translated files to test this and will report back as soon as the check is done. This is my worry also. So you should give it a try first with a short file. One thing is not clear, though: why should I need to run msgcat?
Can't I just work with a bunch of separate po files and directories in a tree structure (basically what I get when I run oo2po on the .sdf file)? msgcat was suggested by the original PO filter developer. See if that works without it. JC
[l10n-dev] TM, glossaries and OmegaT
On 11 juin 07, at 22:07, Rafaella Braconi wrote: Hi Alexandro, for Russian, Italian and Khmer we are referring to the Pootle server hosted on the Sun virtuallab. Please see details at: http://wiki.services.openoffice.org/wiki/New_Translation_Process_%28Pootle_server%29 The 3 languages above are the only ones which will be using Pootle to translate the 2.3 version since we are in the initial/pilot phase. If everything goes well, or at least we sort out the issues and are able to fix them, all other languages and teams that want to be added to this tool are more than welcome to join. That would be for the 2.4 release.

If non-pootle users still want to use TMs it is possible to use OmegaT too. Sun can probably provide us with TMX files of previous translations in the relevant language pairs, the SunGloss contents can be exported as a glossary file and the source PO can be pretty much translated as is. The correct procedure would be:

0) create a project in OmegaT
1) correctly format the PO file with msgcat
2) put that file in /source/
3) make sure that your TMX has srclang set to your source language the way it was defined in the project settings
4) put the TMX file in /tm/
5) make sure the exported SunGloss is a tab separated list in at most 3 columns (1st col = source language, 2nd col = target language, 3rd col = comments)
6) put the glossary in /glossary/
7) open the project and translate
8) when the translation is completed, msgcat the file in /target/ and deliver

For info, OmegaT is a GPLed Java Computer Aided Translation tool developed for _translators_. It is specifically _not_ for geeks. Which means that it is relatively straightforward to use. I am pretty sure the filters can be tweaked to directly support the .sdf format but I leave that to others. I know some already use it here. NetBeans localizers also use it intensively.
And real translators too :) Jean-Christophe Helary (OOo-fr) http://sourceforge.net/projects/omegat/ CVS version: cvs -z3 -d:pserver:[EMAIL PROTECTED]:/cvsroot/omegat co -P omegat
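Step 5 of the procedure above (a tab separated glossary with at most 3 columns) is easy to get wrong when exporting by hand. Here is a small Python sketch that flags suspicious lines before the file is dropped into /glossary/. The function name and the sample lines are hypothetical; the only rule it encodes is the one stated in the message, source term, target term and an optional comment separated by tabs.

```python
def check_glossary_lines(lines, max_columns=3):
    """Return a list of (line_number, problem) for lines that break the layout.

    Expected layout: source term <TAB> target term [<TAB> comment].
    Blank lines are ignored.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text.strip():
            continue
        columns = text.split("\t")
        if len(columns) < 2:
            problems.append((number, "no tab separator"))
        elif len(columns) > max_columns:
            problems.append((number, "more than %d columns" % max_columns))
        elif not columns[0].strip() or not columns[1].strip():
            problems.append((number, "empty source or target term"))
    return problems


if __name__ == "__main__":
    # Hypothetical export, for illustration only.
    sample = ["file\tfichier\tmenu name", "edit fichier", "a\tb\tc\td"]
    for number, problem in check_glossary_lines(sample):
        print("line %d: %s" % (number, problem))
```

To check a real export, the lines can be read with open(path, encoding="utf-8"), since the glossary is expected to be UTF-8 when it is renamed to .utf8.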
Re: [l10n-dev] Pootle and terminology
On 11 juin 07, at 09:11, Alessandro Cattelan wrote: I'd been told before that it should be quite easy to convert a txt into PO but unfortunately I don't know how to do it. Basically what I have is a long list of terms and expressions in two tab-separated columns, one for the English version and one for the Italian translation. Something like this: fraction	frazione I understand that a PO file with these entries would look something like this: msgid "fraction" msgstr "frazione" Is that correct? I assume it would be quite easy to write a script for that, but I can't do it. Ale, No need for a script. Take the text editor you usually use and open your text file. 1) I assume that you understand regular expressions a little bit 2) and that the character between fraction and frazione in your text file is a tabulation You'd have to search for: ^([^\t]+)\t([^\t]+)$ and to replace by msgid "\1"\rmsgstr "\2"\r\r The regexp may be slightly incorrect and will certainly depend on the text editor you use but give the above thing a try and fine tune until you get the proper results. Cheers, JC
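For those who would rather not depend on a particular editor's regex dialect, the same search and replace can be sketched as a short Python script. This is only an illustration: the function names are made up, the pattern assumes exactly two non-empty tab-separated fields per line, and lines that do not match are skipped silently.

```python
import re

# Two non-empty fields separated by a single tab, as in "fraction<TAB>frazione".
ENTRY = re.compile(r"^([^\t]+)\t([^\t]+)$")


def escape_po(text):
    """Escape backslashes and double quotes for use inside a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def glossary_to_po(lines):
    """Turn tab-separated term pairs into msgid/msgstr entries.

    Entries are separated by a blank line; non-matching lines are skipped.
    """
    entries = []
    for line in lines:
        match = ENTRY.match(line.rstrip("\r\n"))
        if match is None:
            continue
        entries.append('msgid "%s"\nmsgstr "%s"\n'
                       % (escape_po(match.group(1)), escape_po(match.group(2))))
    return "\n".join(entries)


if __name__ == "__main__":
    print(glossary_to_po(["fraction\tfrazione"]))
```

The output has no PO header entry, so a tool that requires one may need it added by hand or with the gettext tools.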
Re: [l10n-dev] Po editor for OOo 2.3 translation
On 1 juin 07, at 03:14, Alessandro Cattelan wrote: We've been asked to use the PO format because in a past project converting to XLZ and back-converting to PO created quite a few problems. I would be much happier using some other tool such as Heartsome XLIFF Editor but I'd like to avoid producing a good quality yet useless translation. Ale, Sophie (French lead) told me that indeed, the PO files provided for the most recent l10n job were not of the best quality. This time, the French l10n team will get .sdf files that Sophie will convert to .PO using the translate-toolkit tools which will, supposedly, provide translators with workable files. If you manage to get the .sdf files I am sure there are ways to deal with them effortlessly with your editor of choice. JC
Re: [l10n-dev] Po editor for OOo 2.3 translation
On 31 mai 07, at 03:39, Alessandro Cattelan wrote: Hi, I'll be working on the Italian translation for OOo 2.3. For the GUI translation we'll be using Pootle whereas for the translation of the online help we'll be using a PO editor. Ale, Is this the file you mentioned on l4t ? If yes, I was not aware that the .sdf file converter produced broken PO. Maybe reporting that as a bug would be better than trying to find a PO editor that works with broken files :) And I forgot to mention that emacs has a PO mode, but I've not used it in a long time so I don't know if it's worth it. Regarding using the TMX: convert it to PO with a few regexes and use the gettext tools to incorporate it into your current PO file. That way you won't need a TMX fuzzy matcher. But I really think using PO dedicated tools for translation is a waste of resources. There are plenty of CAT tools that will leverage your TMX and parse your PO. But you need to get the PO fixed first, if possible. JC I don't have much experience with PO editors as I've only briefly tried software such as poEdit, Kbabel and Gtranslator. I'd like to know if you have any suggestion as to which PO editor to choose. It would be very important for me to be able to import or reuse a TMX file I have. Is there any tool that would let me do that with a PO editor? Thanks. -- Alessandro
[l10n-dev] New version of TMX: release for public comments.
I just found: TMX 2.0 released for public comment - March 28, 2007 TMX 2.0 has been released as a committee draft specification for public comment. Comments will be accepted through June 1, 2007 and should be sent to [EMAIL PROTECTED] The specification can be viewed online or downloaded as a zip archive. OSCAR is particularly interested in comments relating to implementation issues and especially welcomes feedback from tools developers and users of TMX. on the Lisa TMX page: http://www.lisa.org/standards/tmx/ Although the draft has been available for 3 weeks now, I don't remember seeing any announcement on any list of TMX users. The release is made for public comments, but I also could not find anywhere that the comments made in the last 3 weeks have been made public. Did I miss a link ? Jean-Christophe Helary OmegaT
Re: [l10n-dev] Which part of Help does what?
On 21 janv. 07, at 18:21, Pavel Janík wrote: Is it really necessary to subscribe to and read so many mailing lists, in order to do a reasonable job as a translator? You don't need to subscribe to the mailing list to be able to read it so your question doesn't make sense. It is not her question that does not make sense, but the original answer: subscribe there to get the info to your one and only question. I feel that you have never worked on a project as large as OOo. I don't want to talk for Clytie because she is big enough to do that herself, but your feeling is wrong. Or maybe you mean messy by large, in which case, you may be right. As I pointed out earlier today: plenty of redundant information all over the place, but to get _the_ tiny bit that one is missing, the suggestion is to subscribe to _yet_ another mailing list... Jean-Christophe Helary
Re: [l10n-dev] Ain's howto in the wiki
On 19 janv. 07, at 15:45, Jean-Christophe Helary wrote: why is the translate toolkit necessary ? Sorry to have brought the thread to a place far from where I expected it to go... Anyway, I installed the translate toolkit and read the documentation and I eventually figured out that what we need it for is basically: 1) convert the POT files to PO files 2) merge the previous compendium into the current file There is a whole paragraph about the specificity of the OOo localization file format, and all the rest of the document is about translation itself. But according to Damien in the other day's IRC: [21:58] damiend We (Sun) extract POTs AND PO from the sources daily So we don't really need to convert POT to PO anymore. Do we ? So the purpose of the translate toolkit is basically only to merge the compendium files. Am I misunderstanding something ? I understand that the current document is based on 2 year old data, so it may be that my questions are not relevant at all any more. But I'd like to contribute to this how-to, at least to separate it in two as I suggested the other day, so I welcome any comments. Jean-Christophe
Re: [l10n-dev] Re: Ain's howto in the wiki
These pages have unique information, duplicated information, up-to-date information, old information and irrelevant information ;) Someone who writes good English should merge this information and outline it as you described. In the wiki of course. Ok :) JC
Re: [l10n-dev] Ain's howto in the wiki
Ain, Thank you for the clarification. As I wrote to Clytie I did not mean to sound harsh. As I am far from understanding the actual job of a team leader (Sophie is my leader) I'll wait a bit before making proposals here. As I wrote in my first post, the French team received .xlz files created from the original .po. I did find that funny since OLT comes with a filter that is very simple to use and that allows translators to convert to .xlz if they need it. Besides, there was an encoding problem in our files so I had to convert the .xlz back to .po and to translate it in OmegaT etc. From there I had a discussion with Sophie about the source file format issue and she advised that I join the discussion here. JC On 19 janv. 07, at 16:46, Ain Vagula wrote: On 1/19/07, Jean-Christophe Helary [EMAIL PROTECTED] wrote: Clytie, Ain (?) I have 2 questions: why is the translate toolkit necessary ? And why aren't there any references to OLT ? The French community receives .xlz files and would be unable to follow the howto since it is exclusively based on po files (which is not a bad thing in the absolute, but that is a different issue). It seems to me the document addresses team leaders' needs and not really translators, who will certainly not have to deal with most of what is described in the document. You are right, it is about starting a new translation, for a prospective team lead. It was written 2 years ago and is in the state of an unfinished draft. As it is in the wiki, everyone can fix or complete it. Setting internal rules for a particular language is the language team's own business - formats, tools, way of communication, etc. Of course there are 3-4 common directions that should be described, someone has to do this but not me. ain
Re: [l10n-dev] Ain's howto in the wiki
On 19 janv. 07, at 17:25, Clytie Siddall wrote: I was on the IRC last night (brandelune) I wish there were tooltips or something so you could tell who people are. Well, on mine (xchat/osx) you just click on the person and the registered name appears. ps: OmegaT has a PO filter given to us last year by a Debian-fr activist... :) was wondering about using it with those dratted Help files. Have you seen them? XML and lots of repetition. I actually did the translation of a part of the recent package with OmegaT from the back-converted po files. I set 2 rules to isolate the \\ or whatever was encumbering the segments and I was done. I see that the original files here are sdf, which looks very easy to handle directly in OmegaT. The already translated parts could easily be converted to TMX and the untranslated parts could be parsed either with a dedicated filter (any Java geek here ?) or with regex based segmenting rules. I saw that most of the discussion yesterday revolved around what kind of VC system to use. I have never used Pootle so I guess I'll have to check how it works to be able to participate more here. It is interesting to see how a lot of FOSS projects have converging ideas. But it seems to me all this is still a little too geeky for the common translator to be able to join. Even OLT's learning curve is quite steep (plus .xlz is not a standard format...) Anyway, I have to get back to my "feed the kids" deadlines ! I'll be back ! :) JC
Re: [l10n-dev] Re: Ain's howto in the wiki
Ain, So the question you seem to have with xliff:

On 19 janv. 07, at 19:48, Ain Vagula wrote: By the way, I noticed that xliff files are stored in a packed format, xlz? Does it mean that when using xliff as the only intermediate version, without the po step, you'll not have any overview of changes in the version control system? It's only a question, not a statement - I don't know almost anything about the deeper mechanisms of version control.

Is basically: can we have the same level of vc information with xliff files as with po files ? (besides the fact that OLT's xlz is not representative of standard xliff files). I don't know what is technically necessary in the current back-end, but basically po and xliff are formats with the exact same purposes. I'd say the only difference is historical: po came with GNU and gettext while xliff came from the localization industry and XML and is also much younger but integrates much better in xml workflows. Sorry for misunderstanding what you meant by version control here. My idea is that whatever localization format you use, it will be easy to integrate it in the back-end, so we should not consider ourselves limited to only-po or only-whateverelse. But I fear that in the long run, sticking to po will not contribute to improving the whole system. To me it would make much better sense to use a vendor-neutral version control system where the controlled output file is the closest possible to the original file (be it sdf or anything else the process uses internally) and provide a number of end-user filters to ease the life of would-be translators: some would translate the raw files, some would convert that to po for use in their po tools, some would convert that to xliff etc... So that we'd have a team leader who handles all the commits etc and packages the data for the team according to the procedure the team has chosen.
Whatever form this vc system takes it should also be able to output combined updated packages on which to build reference glossaries (CSV or TBX for ex) and translation memories (CSV or TMX for ex) by automatically aligning the committed files. The few FOSS projects I participated in all have very good vc systems but very poor translation memory/glossary management, which means that the translator usually has to find references the hard way. The computer aided translation tools that exist on the market today (FOSS or not) don't seem to be fully used in most FOSS projects, which means that a lot of QA has to be done, and re-done and done yet again because translators can't fully leverage older translations. Jean-Christophe

VC is fine for some processes but I think it is a little too much for our purposes. As long as you have past documents stored as translation memories (TMX) you don't need to have VC at all anymore (at least if you mean VC the way I mean it). If your text has already been translated it will be there in the TMX and either you have a system to automatically update it or you do that manually. There are a number of issues related to TMX: do we store everything as sentences, or as paragraphs etc. And if we leave translators free to use the process they want, how do we guarantee that they deliver a TMX with the final document. That is where XLIFF comes in: as long as the delivered document is XLIFF it is trivial to extract a TMX from it for recycling in the next translation. etc. Sorry, I think as I type and maybe that was not the kind of answer you were looking for.

I mean version control as in CVS, SVN etc. We currently keep po-files in a CVS repository. When someone with write access commits something to CVS, the system automatically sends a notification with a full diff to the project's cvs mailing lists.
(po-format is very easily readable) So it is easy for all members to have an overview of what's happening, and it is also easy to write comments or questions about commits, reply to comments and proofread immediately after a commit. This is the way all free software projects where I participate function. ain
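The claim above that a TMX is trivial to extract from a delivered XLIFF file can be illustrated with a minimal Python sketch using only the standard library. It rests on several assumptions that are not in the messages: the input is plain XLIFF 1.2 (not OLT's packed .xlz), inline tags are flattened to text, units with an empty target are skipped, and the function name, header values and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "{urn:oasis:names:tc:xliff:document:1.2}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def xliff_to_tmx(xliff_text, source_lang, target_lang):
    """Build a TMX document (as a string) from translated XLIFF 1.2 units."""
    root = ET.fromstring(xliff_text)
    tmx = ET.Element("tmx", {"version": "1.4"})
    # Header attribute values below are placeholders chosen for this sketch.
    ET.SubElement(tmx, "header", {
        "creationtool": "xliff_to_tmx sketch",
        "creationtoolversion": "0",
        "segtype": "sentence",
        "o-tmf": "xliff",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for unit in root.iter(XLIFF_NS + "trans-unit"):
        source = unit.find(XLIFF_NS + "source")
        target = unit.find(XLIFF_NS + "target")
        if source is None or target is None:
            continue
        source_text = "".join(source.itertext())
        target_text = "".join(target.itertext())
        if not target_text.strip():
            continue  # not translated yet, nothing to recycle
        tu = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, source_text), (target_lang, target_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Hypothetical XLIFF document, for illustration only.
SAMPLE_XLIFF = (
    '<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">'
    '<file original="x.po" source-language="en" target-language="fr" datatype="po">'
    '<body>'
    '<trans-unit id="1"><source>File</source><target>Fichier</target></trans-unit>'
    '<trans-unit id="2"><source>Edit</source><target></target></trans-unit>'
    '</body></file></xliff>'
)

if __name__ == "__main__":
    print(xliff_to_tmx(SAMPLE_XLIFF, "en", "fr"))
```

Flattening inline tags loses information that a real CAT tool would want to keep, so this is a starting point for experiments rather than a replacement for the exporters that the tools themselves provide.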
Re: [l10n-dev] running OLT
You should send that to OLT's list. https://open-language-tools.dev.java.net/servlets/ProjectMailingListList;jsessionid=D680E07AC70B0C4E86FCF49D0E2D96EB JC Helary

On 19 janv. 07, at 14:35, Ain Vagula wrote: openSUSE 10.2, trying to start OLT:

- with java 1.4.2:
[EMAIL PROTECTED]:~/Open_Language_Tools/XLIFF_Translation_Editor/1.2.7 ./translation.sh
Using java: /usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre/bin/java
/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre/bin/java: error while loading shared libraries: /home/ain/Open_Language_Tools/XLIFF_Translation_Editor/1.2.7/spellchecker/lib/libgcc_s.so.1: ELF file data encoding not little-endian
Installation direcotry: /home/ain/Open_Language_Tools/XLIFF_Translation_Editor/1.2.7
Classpath: TransEditor.jar:i18n:classes/dom4j-161.jar:classes/fuzzytm.jar:classes/swing-layout-1.0.1.jar:classes/xerces2.jar:classes/XliffBackConverter.jar:classes/xmlParserAPIs.jar
/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre/bin/java: error while loading shared libraries: /home/ain/Open_Language_Tools/XLIFF_Translation_Editor/1.2.7/spellchecker/lib/libgcc_s.so.1: ELF file data encoding not little-endian

- with java 1.5.0:
[EMAIL PROTECTED]:~/Open_Language_Tools/XLIFF_Translation_Editor/1.2.7 ./translation.sh
Using java: /usr/lib/jvm/java-1.5.0-sun-1.5.0_update8/jre/bin/java
java version 1.5.0_08
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_08-b03)
Java HotSpot(TM) Client VM (build 1.5.0_08-b03, mixed mode, sharing)
Installation direcotry: /home/ain/Open_Language_Tools/XLIFF_Translation_Editor/1.2.7
Classpath: TransEditor.jar:i18n:classes/dom4j-161.jar:classes/fuzzytm.jar:classes/swing-layout-1.0.1.jar:classes/xerces2.jar:classes/XliffBackConverter.jar:classes/xmlParserAPIs.jar
19.01.2007 7:36:35 org.jvnet.olt.editor.translation.TransEditor run
SEVERE: Exception:java.lang.Error: can't load com.birosoft.liquid.LiquidLookAndFeel

ain
Re: [l10n-dev] Ain's howto in the wiki
Clytie, Ain (?) I have 2 questions: why is the translate toolkit necessary ? And why aren't there any references to OLT ? The French community receives .xlz files and would be unable to follow the howto since it is exclusively based on po files (which is not a bad thing in the absolute, but that is a different issue). It seems to me the document addresses team leader's needs and not really translators who will certainly not have to deal with most of what is described on the document. Also, I fear the described process is very likely to not attract people who are translators and who could contribute quality work because it is what they do for a living. It is way to geeky. I understand the fact that a lot of people involved in FOSS l10n are familiar with most of the concepts and tools presented there. But now that OOo has grown well beyond geek zone, it seems to me most of the document's contents are not (or should not be) relevant to what a potential translation contributor should really be familiar with. I suggest experienced team leaders edit the file to make a clear distinction between the leader's job and the translator's job. Jean-Christophe On 19 janv. 07, at 15:08, Clytie Siddall wrote: Hi everyone :) After last night's L10N IRC meeting, where we did mention the need for l10n howtos, especially for new translators, I have published Ain's howto in the wiki: http://wiki.services.openoffice.org/wiki/ NLC:New_Translators_Start_here I've added bits and pieces, but it's basically all Ain's work. :) I've linked it from the main NLP page. We don't seem to have a main L10N page in the wiki. I couldn't get footnotes working, so there are notes in parentheses. :( I hope this doc. is useful. It embodies a whole list of things I had to find out by accident, or by persistent questioning, during this first release (2.1). I couldn't find the link to Pavel's blog instructions on submitting files to him for build. Does someone have it, please? 
Please feel free to amend this doc or add to it. :)
Re: [l10n-dev] followup to issue 73501
On 18 Jan. 07, at 03:40, Marcin Miłkowski wrote: Andras Timar wrote: This is what happened with this OOo 2.2 update in case of some Sun languages (e.g. i73150). Translators who use OLT should share their experiences on this list. The main question is how good OLT is at ignoring tag changes. Does it offer good matches from the mini TM in case of tag changes? Yes, it does, but Transolution (a Python XLIFF translation memory tool) was even nicer, as it offered many ways to visualize tags on the screen (so that the view is not cluttered). You could try MemoQ (freeware and Hungarian, made by engineers from MorphoLogic, which is a great recommendation to me), and Across (from Nero). They are closed source but still free to use (MemoQ is even Linux-compatible, I guess). And there's OmegaT - tag handling is probably better now than before. It depends on what you mean by "before". The main improvement is that OmegaT's TMX files now respect tags and encapsulate them in the proper XML code. We have tested importing OmegaT's TMX files into SDLX, Trados, etc., and the results were quite positive. Besides, OmegaT does not apply penalties when the tags in the match differ from those in the source, so in a way it is nicer to the user. Plus, it is slightly more intuitive and faster than OLT (which I also use sometimes). But since OmegaT is not an XLIFF editor, it requires working on the source file directly, so the source file format must be supported (PO is one of the supported formats). I like OLT, as I was able to do some translation jobs that would otherwise have required the notorious TagEditor from Trados, yet it is developing very slowly, as the main developer at Sun, Tim Foster, is not working on it anymore. I could try to implement new features but... It's not high on my to-do list. Jean-Christophe Helary
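To make the point about tag penalties concrete, here is a minimal Python sketch. It is not the matching algorithm of OmegaT, OLT or any other tool mentioned in this thread: it only compares a new source segment with a TM entry after stripping inline tags, and optionally subtracts a fixed penalty when the tags differ. The tag pattern, the function names (strip_tags, tags_of, match_score) and the 0.05 penalty are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumption: inline tags look like <b>, </b> or <x1/>; real formats vary.
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*\s*/?>")


def strip_tags(segment):
    """Return the segment text with inline tags removed."""
    return TAG_RE.sub("", segment)


def tags_of(segment):
    """Return the inline tags of a segment in order of appearance."""
    return TAG_RE.findall(segment)


def match_score(new_source, tm_source, tag_penalty=0.0):
    """Similarity in [0, 1] computed on tag-free text.

    tag_penalty is a hypothetical fixed amount subtracted when the two
    segments do not carry the same tags; 0.0 means tags are ignored.
    """
    score = SequenceMatcher(
        None, strip_tags(new_source), strip_tags(tm_source)
    ).ratio()
    if tags_of(new_source) != tags_of(tm_source):
        score -= tag_penalty
    return max(score, 0.0)


tm_source = "Click <b>OK</b> to continue."
new_source = "Click <i>OK</i> to continue."

# Same text, different tags: compare the score without and with a penalty.
print(match_score(new_source, tm_source))
print(match_score(new_source, tm_source, tag_penalty=0.05))
```

With tag_penalty left at 0.0, a segment whose only change is a tag still counts as a full match, which is the behaviour described above as nicer to the user. A tool that penalizes tag differences would rank the same TM entry lower.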