Re: [l10n-dev] OmegaT UTF-8 problem
FYI, the OmegaT project is currently working on a Simplified Chinese version :) It should be ready in a few days!

JC

On 13 Jul 07, at 10:42, ChengLin wrote:
> Hi, we're trying to use OmegaT on Simplified Chinese Windows XP, but it can't save to UTF-8, only to Chinese GBK. Could anyone help us? Thanks!
>
> ---
> Cheng Lin
> 2007-07-13
Re: [l10n-dev] Translating .sdf files directly with OmegaT
On 13 Jul 07, at 02:40, Alessandro Cattelan wrote:
> Actually I haven't tried going through the procedure you described; I think I'll give it a try with the next batch of files. We'll have around 4,200 words to translate, and as it is a reasonable volume, I think I'll have some time to spend testing a new procedure. What I fear, though, is that OmegaT would become extremely slow processing a huge SDF file. If I have a bunch of PO files I can import only a few of them into the OmT project at a time, which makes it possible to translate without too much "CPU sweat" :o). When I tried loading the whole OLH project we worked on in June, my computer almost collapsed: it took me over an hour just to load the project! I don't have a powerful machine (AMD Athlon XP, 1500 MHz, 700 MB RAM), but I think that if you have a big TM it is not wise to load a project with over a thousand segments.

You are definitely right here: the bigger the TMX, the more memory it takes, which is the reason why I just suggested (in the "Imagine" thread) that we have TMX files by module. Also, you can assign OmegaT more memory than you actually have on your machine. I use OmegaT like this:

java -server -Xmx2048M -jar OmegaT.jar &

The -server option makes it faster too. The SDF files we have are not that big, though, so you have to be selective with the TMX you use.

> Maybe we could split the SDF file into smaller ones, but I'm not sure that would work.

If you try my method, you can translate bit by bit. There are no problems with that. What matters is that the reverse conversion is done properly.

JC
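Alessandro's idea of splitting the SDF file is easy to sketch, since GSI/SDF handoffs are plain tab-separated text with one string per line. The Python fragment below is only an illustration: the input file name is invented, and the assumption that the module/project name is the first tab-separated field should be checked against the actual handoff files before relying on it.

# Rough sketch: split one big GSI/SDF handoff file into one file per module.
# Assumption (to verify against a real handoff): lines are tab-separated and
# the first field is the project/module name. File names are hypothetical.
from collections import defaultdict

per_module = defaultdict(list)
with open("GSI_fr.sdf", encoding="utf-8") as sdf:
    for line in sdf:
        module = line.split("\t", 1)[0]
        per_module[module].append(line)

for module, lines in per_module.items():
    with open(f"GSI_fr_{module}.sdf", "w", encoding="utf-8") as out:
        out.writelines(lines)

Each per-module file stays a plain SDF fragment, so the pieces could in principle be checked with gsicheck and concatenated back together after translation.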
Re: [l10n-dev] OmegaT UTF-8 problem
On 13 Jul 07, at 10:42, ChengLin wrote:
> Hi, we're trying to use OmegaT on Simplified Chinese Windows XP, but it can't save to UTF-8, only to Chinese GBK. Could anyone help us?

Go to Options/File Filters, select the file format you are using, and edit the output encoding.

Jean-Christophe
[l10n-dev] TMX/XLIFF output (Re: [l10n-dev] Imagine :))
On 13 Jul 07, at 04:45, Rafaella Braconi wrote:
>> No, but that means that correct TMX files are a possibility (even now). By the way, I wonder why Rafaella told me that creating TMXs of the state of the strings before the current updates was impossible?
>
> To clarify: the only possibility I have is to provide you with TMX files in which the translation exactly matches the English text as it is now. If the English source has been changed, I have the following situation: new English text - old translation (matching the previous text). In the database I have no possibility of providing you with files containing both the old English text and the updated English text.

Don't you have a snapshot of the docs _before_ they are modified? I mean, I have the 2.2.1 help files on my machine, so I can take the XML files in, for example, sbasic.jar in the EN folder, align them with the same files in the FR folder, and create a valid TMX of the state of the 2.2.1 version. This is what I suggest you keep somewhere, for each language pair (with EN as source). You would then have a static set of TMX files, archived by module (sbasic, swriter, etc.) for each language, available from the community web, and translators would just get the TMX they need for their current assignment. Such files don't need to be dynamically generated; they are valid for the most recent stable release, and once the release is updated the files can be output again for the translation of the next version.

So, create the TMX _before_ you modify the database, _or_ from the static files that exist anyway inside any copy of OOo. And create TMX level 2 files, with all the original XML encapsulated, so as not to confuse CAT tools and translators.

Regarding the output of "proper" source files: now that we (I...) know that the original is XML, it should be trivial to provide them either directly as XML sets (specifically _without_ outputting diffs), or as XML diffs, or as XLIFF. You may have technical requirements that make you produce SDF files, but those only add an extra layer of complexity to the translation process, and I am sure you could have a clean XML output that includes all the meta information the SDF contains, so that the source file _is_ some kind of XML and not a hybrid that treats XML as text (which is the major source of confusion). If you have an XML workflow from the beginning, it should be much safer to keep it XML all the way, hence:

original = XML (the OOo dialect)
diffs = XML (currently SDF, so shift to a dialect that carries the SDF info as attributes in XML "diff" tags, for example)
source = XML (XLIFF)
reference = XML (TMX, taken from the original)

TMX is not supported by most PO editors anyway, so a clean TMX would mostly benefit people who use appropriate translation tools (free ones included). Regarding the XLIFF (or PO, depending on the community, I gather) source output, each community (and even each contributor) could use the output that fits the tools in use. XLIFF should be 1.0 so as to ensure OLT can be used (OLT sadly does not support more recent versions of XLIFF). And then you have a clean workflow that satisfies everybody, and the management (Pootle) system can be put on top of all that to provide communities with the best environment possible. And of course, this workflow is also valid for UI strings, since I suppose they can also be converted to XML (if they are not already).

What about that?

Jean-Christophe Helary (fr team)
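To make the "TMX level 2" request concrete, here is a minimal sketch of what such a translation unit could look like, written as a small Python script so the output can be generated and inspected. The element and attribute names follow the TMX 1.4 specification, but the segment text, language pair, and output file name are invented for illustration and are not taken from any actual OOo handoff. The point is that the original help-XML markup (here an <emph> element) travels inside <bpt>/<ept> inline elements that a CAT tool can protect, instead of being dumped into the segment as plain text.

# Minimal sketch of a TMX 1.4 "level 2" translation unit: inline markup from
# the original XML is wrapped in <bpt>/<ept> elements so CAT tools can protect
# it, rather than exposing it as translatable text inside the segment.
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang

def add_tuv(tu, lang, before, tagged, after):
    # One <tuv> whose <seg> is: text, a protected <emph>...</emph> pair, text.
    seg = ET.SubElement(ET.SubElement(tu, "tuv", {XML_LANG: lang}), "seg")
    seg.text = before
    bpt = ET.SubElement(seg, "bpt", i="1")
    bpt.text, bpt.tail = "<emph>", tagged      # written out as &lt;emph&gt;
    ept = ET.SubElement(seg, "ept", i="1")
    ept.text, ept.tail = "</emph>", after

tmx = ET.Element("tmx", version="1.4")
ET.SubElement(tmx, "header", {
    "creationtool": "align-sketch", "creationtoolversion": "0.1",
    "segtype": "paragraph", "o-tmf": "xml", "adminlang": "en",
    "srclang": "en-US", "datatype": "xml",
})
tu = ET.SubElement(ET.SubElement(tmx, "body"), "tu")
add_tuv(tu, "en-US", "Choose ", "File - Open", ".")          # invented segment
add_tuv(tu, "fr-FR", "Choisissez ", "Fichier - Ouvrir", ".")

ET.ElementTree(tmx).write("sbasic_2.2.1_en-US_fr-FR.tmx",
                          encoding="utf-8", xml_declaration=True)

A tool that reads level 2 TMX sees the tag pair as a single protected unit and matches on the text around it, which is what lets translators reuse the 2.2.1 translations without ever handling raw markup as translatable text.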
[l10n-dev] OmegaT UTF-8 problem
Hi,

We're trying to use OmegaT on Simplified Chinese Windows XP, but it can't save to UTF-8, only to Chinese GBK. Could anyone help us? Thanks!

---
Cheng Lin
2007-07-13
Re: [l10n-dev] Imagine :)
Jean-Christophe Helary wrote:
> On 13 Jul 07, at 00:07, Uwe Fischer wrote:
>> Jean-Christophe Helary wrote:
>>> ... To make things even clearer, I am saying that using _any_ intermediary format for documentation is a waste of resources. ...
>> let me add some words to your message:
>
> Uwe, thank you so much for your mail!
>
>> The application help files are XML files. See http://documentation.openoffice.org/online_help/techdetails.html for details. The Help Viewer converts the XML files to HTML when they are displayed. Using XML, it should be even easier to use a straightforward translation process without intermediate files.
>
> That is very good to know. There are already free generic XML filters that produce valid XLIFF: the Okapi framework, for example, developed by Yves Savourel, who is also the editor of the XLIFF 1.0 spec. Okapi is developed in .NET 2.0, but I keep asking Yves to make it compatible with Mono so that it can be used in other environments. As a side note, OmegaT's XLIFF filter was made specifically to support Okapi's output.
>
>> Today we have a 1:1 correspondence of paragraphs as the smallest units to be translated. Each paragraph has an ID number to ensure the correct mapping of translated text. This means that no localization with additional or removed parts of text is possible. Not 21st-century technology, in my opinion.
>
> No, but that means that correct TMX files are a possibility (even now). By the way, I wonder why Rafaella told me that creating TMXs of the state of the strings before the current updates was impossible?

To clarify: the only possibility I have is to provide you with TMX files in which the translation exactly matches the English text as it is now. If the English source has been changed, I have the following situation: new English text - old translation (matching the previous text). In the database I have no possibility of providing you with files containing both the old English text and the updated English text.

Rafaella
Re: [l10n-dev] Translating .sdf files directly with OmegaT
Jean-Christophe Helary wrote:
> Ale,
>
> I was wondering if you had eventually considered this procedure. It works very well and considerably increases productivity thanks to OmegaT's HTML handling features. I think I'm going to investigate the possibility of having an .sdf filter for OmegaT rather than having to go through all the PO loops, which really don't provide much more than yet another intermediate format that is inconvenient to translate anyway.
>
> JC

Hi JC,

sorry for the late reply. Actually I haven't tried going through the procedure you described; I think I'll give it a try with the next batch of files. We'll have around 4,200 words to translate, and as it is a reasonable volume, I think I'll have some time to spend testing a new procedure. What I fear, though, is that OmegaT would become extremely slow processing a huge SDF file. If I have a bunch of PO files I can import only a few of them into the OmT project at a time, which makes it possible to translate without too much "CPU sweat" :o). When I tried loading the whole OLH project we worked on in June, my computer almost collapsed: it took me over an hour just to load the project! I don't have a powerful machine (AMD Athlon XP, 1500 MHz, 700 MB RAM), but I think that if you have a big TM it is not wise to load a project with over a thousand segments.

Maybe we could split the SDF file into smaller ones, but I'm not sure that would work.

Ale.
Re: [l10n-dev] Imagine :)
On 13 Jul 07, at 00:07, Uwe Fischer wrote:
> Jean-Christophe Helary wrote:
>> ... To make things even clearer, I am saying that using _any_ intermediary format for documentation is a waste of resources. ...
> let me add some words to your message:

Uwe, thank you so much for your mail!

> The application help files are XML files. See http://documentation.openoffice.org/online_help/techdetails.html for details. The Help Viewer converts the XML files to HTML when they are displayed. Using XML, it should be even easier to use a straightforward translation process without intermediate files.

That is very good to know. There are already free generic XML filters that produce valid XLIFF: the Okapi framework, for example, developed by Yves Savourel, who is also the editor of the XLIFF 1.0 spec. Okapi is developed in .NET 2.0, but I keep asking Yves to make it compatible with Mono so that it can be used in other environments. As a side note, OmegaT's XLIFF filter was made specifically to support Okapi's output.

> Today we have a 1:1 correspondence of paragraphs as the smallest units to be translated. Each paragraph has an ID number to ensure the correct mapping of translated text. This means that no localization with additional or removed parts of text is possible. Not 21st-century technology, in my opinion.

No, but that means that correct TMX files are a possibility (even now). By the way, I wonder why Rafaella told me that creating TMXs of the state of the strings before the current updates was impossible?

Also, even with your static paragraph-based linking, it is still possible to add "locale-specific" info within the translated paragraph. I don't see this as an issue. You could require added information to be tagged with a special inline tag for easy recognition on your side.

> We want to add a link from every help page to a corresponding Wiki page, where every user can add comments (or more). This will need some effort to re-sync the source files in CVS with the user contents from the Wiki. In all languages. Good ideas are welcome.

I am not a developer, so I quit using Wikis to update the OmegaT documentation. But if you have a system that correctly converts the MediaWiki (?) content to your XML, which I did not, then the only problem I see is merging the comments in languages other than English to get a homogeneous source file set... and that is probably not trivial :) You may want to distinguish non-locale-specific contents (comments on the translation itself that will eventually impact the original, and that may need to be translated into English) from locale-specific comments (which only impact that localization; there is no need to modify the sources, only that locale is concerned).

Way past bedtime!

Cheers,
Jean-Christophe Helary (fr team)
Re: [l10n-dev] Imagine :)
Hi,

Jean-Christophe Helary wrote:
> ... To make things even clearer, I am saying that using _any_ intermediary format for documentation is a waste of resources. ...

let me add some words to your message:

The application help files are XML files. See http://documentation.openoffice.org/online_help/techdetails.html for details. The Help Viewer converts the XML files to HTML when they are displayed. Using XML, it should be even easier to use a straightforward translation process without intermediate files.

Today we have a 1:1 correspondence of paragraphs as the smallest units to be translated. Each paragraph has an ID number to ensure the correct mapping of translated text. This means that no localization with additional or removed parts of text is possible. Not 21st-century technology, in my opinion.

We want to add a link from every help page to a corresponding Wiki page, where every user can add comments (or more). This will need some effort to re-sync the source files in CVS with the user contents from the Wiki. In all languages. Good ideas are welcome.

Uwe
--
[EMAIL PROTECTED] - Technical Writer
StarOffice - Sun Microsystems, Inc. - Hamburg, Germany
http://www.sun.com/staroffice
http://documentation.openoffice.org/online_help/index.html
http://wiki.services.openoffice.org/wiki/Category:OnlineHelp
http://blogs.sun.com/oootnt
[l10n-dev] Translation Schedule for 2.3 - 2nd Handoff
Dear All,

I hope you had a couple of days to recharge your batteries :-) . We are preparing the files to translate for the second - and last - handoff for the 2.3 release. The deadline is July 26th, which is also the deadline for all translations to be sent back to Sun. We will use the same issues as listed at http://wiki.services.openoffice.org/wiki/Translation_for_2.3 to deliver the SDF files to translate for the following languages: French, Italian, German, Swedish, Spanish, Dutch, Brazilian, Russian, Simplified Chinese.

The volume of this second handoff is:
GUI: approx. 2,000 words
Online Help: approx. 4,200 words

Before delivering the files on July 26th, please make sure to run the *latest* gsicheck, which Gregor has just uploaded at http://ooo.services.openoffice.org/gsicheck/ . In case of errors you'll get an error text file that tells you exactly which line contains which error that needs to be corrected. Such errors can be fixed directly in the SDF file using a text editor.

Thanks again for all your contributions, and do not forget to send me your name, address, the language you contributed to, and your preferred platform to get some giveaways.

Regards,
Rafaella
Re: [l10n-dev] Imagine :)
On 12 Jul 07, at 20:29, Jean-Christophe Helary wrote:
> On 12 Jul 07, at 17:36, Rafaella Braconi wrote:
>> However, from what I understand here, the issue you see is not necessarily Pootle but the format Pootle delivers, which is .po. As already said, Pootle will be able to deliver the content in XLIFF format in the near future. Would you still see a problem with this?
>
> Yes, because the problem is not the delivery format; it is the fact that you have two conversions from the HTML to the final format, and the conversion processes are not clean. Similarly, the TMX files you produce are not real TMX (at least not the one you sent me). I am not arguing that UI files would benefit from such treatment. I am really focusing on the HTML documentation.

To make things even clearer, I am saying that using _any_ intermediary format for documentation is a waste of resources. If translators want to use intermediary formats to translate HTML in their favorite tool (be it PO, XLIFF or anything else), that is their business.

Janice (NetBeans) confirmed to me that NB was considering a Pootle server exclusively for UI files (currently Java properties files), but in the end that would mean overhead anyway, since the current process takes the Java properties as they are for translation in OmegaT. In NB, the HTML documentation is available in packages corresponding to the modules, and the TMX (a real one...) makes it possible to automatically get only the updated segments. There is no need for a complex infrastructure to produce differentials of the files; all this is managed by the translation tool automatically, and _that_ allows the translator to get _much more_ leverage from the context and to benefit from a much greater choice of correspondences.

I suppose the overhead caused by the addition of an intermediary format for the UI files will be balanced by the management functions offered by the new system, but I wish we did not have to go through translating yet another intermediate format, for the simple reason that the existing conversion processes (I've tried only the translate-toolkit stuff, and it was flawed enough to convince me _not_ to use its output) are likely to break the existing TMX. If the management system were evolved enough to output the same Java properties files, I am sure everybody would be happy. But, please, no more conversions than necessary.

To go back to the OOo processes, I have no doubt that a powerful management system available to the community is required. But in the end, why is there a need to produce .sdf files? Why can't we simply have HTML sets, like the NB project, that we'd translate with appropriately formed TMX files in appropriate tools? My understanding from when I worked with the Sun Translation Editor (when we were delivered .xlz files and before STE was released as OLT) is that we had to use XLIFF _because_ the .sdf format was obscure. But in the end, the discussion we are having now (after many years of running in circles, apparently) revolves not around how to ease the translator's work but around how to ease the management. If the purpose of all this is to increase the translators' output quality, then it would be _much_ better to consider a similar system that uses the HTML sets directly, because _that_ would allow the translator to spend much more time checking the translation in commonly available tools (a web browser...). How do you do checks on PO/XLIFF/SDF without resorting to hacks? Keeping things simple _is_ the way to go.
Jean-Christophe Helary (fr team)
Re: [l10n-dev] Imagine :)
On 12 Jul 07, at 17:36, Rafaella Braconi wrote:
> However, from what I understand here, the issue you see is not necessarily Pootle but the format Pootle delivers, which is .po. As already said, Pootle will be able to deliver the content in XLIFF format in the near future. Would you still see a problem with this?

Yes, because the problem is not the delivery format; it is the fact that you have two conversions from the HTML to the final format, and the conversion processes are not clean. Similarly, the TMX files you produce are not real TMX (at least not the one you sent me). I am not arguing that UI files would benefit from such treatment. I am really focusing on the HTML documentation.

Jean-Christophe Helary
Re: [l10n-dev] new version of gsicheck available for download
Thank you, Gregor, for providing a new, updated version of gsicheck! Can you all make sure you use the latest version before delivering SDF files back to Sun?

Thanks,
Rafaella

Gregor Hartmann wrote:
> Hi,
>
> I have now uploaded archives for Windows, Linux, Solaris SPARC and Solaris Intel. You can download them at http://ooo.services.openoffice.org/gsicheck/
>
> Always get the newest version (the one with the highest version number). The other ones are just there for reference. I will update them as new versions become available. To use them, unpack them in a separate directory. Calling gsicheck without parameters will give you a short help; for regular use, call gsicheck as described in the wiki page: http://wiki.services.openoffice.org/wiki/Gsicheck
>
> Regards
> Gregor
Re: [l10n-dev] Imagine :)
F Wolff wrote:
> On Thursday, 12-07-2007, at 10:36 [timezone +0200], Rafaella Braconi wrote:
>> Hi Jean-Christophe, thank you once again for sharing your thoughts and experience. I am trying to reproduce and clarify with other engineers what you say here below. However, from what I understand here, the issue you see is not necessarily Pootle but the format Pootle delivers, which is .po. As already said, Pootle will be able to deliver the content in XLIFF format in the near future. Would you still see a problem with this?
>>
>> Regards,
>> Rafaella
>
> Pootle has had XLIFF functionality since version 1.0. Hopefully we can upgrade the version on the server soon.
>
> F

That's really great news! Thank you for sharing this with us.

Rafaella
Re: [l10n-dev] Imagine :)
On Thursday, 12-07-2007, at 10:36 [timezone +0200], Rafaella Braconi wrote:
> Hi Jean-Christophe,
>
> thank you once again for sharing your thoughts and experience.
>
> I am trying to reproduce and clarify with other engineers what you say here below. However, from what I understand here, the issue you see is not necessarily Pootle but the format Pootle delivers, which is .po. As already said, Pootle will be able to deliver the content in XLIFF format in the near future. Would you still see a problem with this?
>
> Regards,
> Rafaella

Pootle has had XLIFF functionality since version 1.0. Hopefully we can upgrade the version on the server soon.

F
Re: [l10n-dev] Imagine :)
Hi Jean-Christophe,

thank you once again for sharing your thoughts and experience. I am trying to reproduce and clarify with other engineers what you say here below. However, from what I understand here, the issue you see is not necessarily Pootle but the format Pootle delivers, which is .po. As already said, Pootle will be able to deliver the content in XLIFF format in the near future. Would you still see a problem with this?

Regards,
Rafaella

Jean-Christophe Helary wrote:
> I have no idea where the UI files come from and how they _must_ be processed before reaching the state of l10n source files. So, let me give a very simplified view of the Help files preparation for l10n, as seen from a "pure" TMX + TMX-supporting tool point of view. Since I don't know what the internal processes really are, I can only guess, and I may be mistaken.
>
> • The original Help files are English HTML file sets.
> • Each localization has a set of files that corresponds to the English HTML sets.
> • The English and localized versions are synced.
>
> To create TMX files: use a process that aligns each block-level tag in the English set to the corresponding block-level tag in the localized set. That is called paragraph (or block) segmentation, and that is what Sun does for NetBeans: no intermediary file format, no .sdf, no .po, no whatever between the Help sets and the TMX sets.
>
> The newly updated English Help files come as sets of files, all HTML. The process to translate, after the original TMX conversion above (only _ONE_ conversion in the whole process), is the following:
>
> • Load the source file sets and the TMX sets in the tool.
> • The HTML tags are automatically handled by the tool.
> • The already translated segments are automatically translated by the tool.
> • The translator only needs to focus on what has been updated, using the whole translation memory as reference.
> • Once the translation is done, the translator delivers the full set, which is integrated into the release after proofreading etc.
>
> What is required from the source files provider side? Creating TMX from HTML paragraph sets. What is required from the translator? No conversion whatsoever; just work with the files and automatically update the translation with the legacy data.
>
> Now, what do we have currently?
>
> • The source files provider creates a differential of the new vs. the old HTML set.
> • It converts the result to an intermediate format (.sdf).
> • It converts that result to yet another intermediate format for the translator (either .po or XLIFF).
> • It matches the diffed strings to the corresponding old localized strings, thus removing the real context of the old strings.
> • It creates a false TMX based on an already intermediate format, without hiding the internal codes (no TMX level 2; all the tag info is handled as text data...).
> • The translator is left to use intermediate files that have been converted twice, removing most of the relation to the original format and adding the probability of having problems with the back conversion. He has to work with a false TMX that has none of the original context, thus producing false matches that need to be guessed backward, and that displays internal codes as text data.
>
> Do you see where the overhead is?
> It is very possible that the UI files do require some sort of intermediate conversion to provide the translators with a manageable set of files, but as far as the Help files are concerned (and as far as I understand the process at hand) there is absolutely no need whatsoever to use an intermediate conversion, to remove the original context, and to force the translator to use error-prone source files.
>
> It is important to find ways to simplify the system so that more people can contribute and so that the source files provider has fewer tasks to handle, but clearly, using a .po-based process to translate HTML files goes totally the opposite way. And translators are (sadly, without being conscious of it) suffering from that, which results in less time spent checking one's translation and a general overhead for checkers and converters.
>
> Don't get me wrong, I am not ranting or anything. I _am_ really trying to convince people here that things could (and should) be drastically simplified, and for people who have some time, I encourage you to see how NetBeans manages its localization process. Because we are losing a _huge_ amount of human resources in the current process.
>
> Cheers,
> Jean-Christophe Helary (fr team)
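The block-level alignment described in the quoted message (pair each English help paragraph with the corresponding paragraph of the localized file, and write the pairs out as a TMX) can be sketched roughly as follows. This is a minimal, standard-library Python illustration, not Sun's actual NetBeans tooling: the file names and language codes are hypothetical, the pairing naively assumes both files contain the same block elements in the same order, and it produces a plain level 1 TMX (inline tags are dropped) rather than the level 2 output a real conversion would aim for.

# Rough sketch: align the block-level elements of an English HTML help file
# with those of its localized counterpart and write the pairs out as TMX.
from html.parser import HTMLParser
from xml.sax.saxutils import escape

BLOCK_TAGS = {"p", "h1", "h2", "h3", "li", "td"}  # treated as "paragraphs"

class BlockExtractor(HTMLParser):
    # Collects the text content of each block-level element, in document order.
    def __init__(self):
        super().__init__()
        self.blocks, self._depth, self._buf = [], 0, []
    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._depth += 1
    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self._depth:
            self._depth -= 1
            if self._depth == 0:
                text = " ".join("".join(self._buf).split())
                if text:
                    self.blocks.append(text)
                self._buf = []
    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)

def blocks(path):
    parser = BlockExtractor()
    with open(path, encoding="utf-8") as f:
        parser.feed(f.read())
    return parser.blocks

# Hypothetical input: the same help page in English and in French.
en = blocks("sbasic/en-US/sbasic.html")
fr = blocks("sbasic/fr-FR/sbasic.html")

# Naive 1:1 pairing; it only holds because the provider keeps the paragraphs
# in sync by ID, exactly as described earlier in the thread.
units = []
for source, target in zip(en, fr):
    units.append(
        "  <tu>\n"
        f'   <tuv xml:lang="en-US"><seg>{escape(source)}</seg></tuv>\n'
        f'   <tuv xml:lang="fr-FR"><seg>{escape(target)}</seg></tuv>\n'
        "  </tu>"
    )

with open("sbasic_en-US_fr-FR.tmx", "w", encoding="utf-8") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n<tmx version="1.4">\n'
              ' <header creationtool="align-sketch" creationtoolversion="0.1"\n'
              '  segtype="paragraph" o-tmf="html" adminlang="en"\n'
              '  srclang="en-US" datatype="html"/>\n <body>\n'
              + "\n".join(units) + "\n </body>\n</tmx>\n")

The resulting TMX can then be dropped into the /tm folder of an OmegaT project together with the updated English HTML set, so that unchanged paragraphs are matched automatically and only the updated ones need the translator's attention.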