Re: [l10n-dev] escaping

2007-06-19 Thread Jean-Christophe Helary

> > OmegaT handles PO files pretty much as text files and thus does not
> > care about "\", for it, the "\" is just another character. Hence,
> > there is nothing that is generated by OmegaT in the screenshot I
> > showed. The files are displayed as they are.


Friedel,

I am not arguing for or against a certain way to display the data. I
am just saying that OmegaT does not do anything to the data, and that
it treats the backslash in a PO escape as a plain "\" character.


> Unfortunately a PO file isn't just a text file. It is a file format that
> presents data in a specific way. To escape the backslash
> (\) and the quotes (") is part of the format that we try to conform to.


Which is very good and OmegaT does not interfere with that.




> > So, you see, the TMX does not exactly match the original .po file.
> > Although it does match the .sdf, but this is irrelevant.
> >
> > When I created the TMX by using XLFEdit from Heartsome, I first took
> > the converted po, converted it to XLIFF and then exported it as TMX
> > and the TMX contained the same number of escapes as the po.


> I would consider this behaviour by the Heartsome tool to be a bug, to be
> honest. Do they convert '<' to '&lt;'? Then they should also convert
> the rest. I would say this is part of the rules of data conversion
> between these formats.
>
> I believe our conversion conforms to the XLIFF representation guide for
> PO files:
> http://xliff-tools.freedesktop.org/snapshots/po-repr-guide/wd-xliff-profile-po.html#s.general_considerations.escapechars
>
> I think it follows logically that the same rules should apply for
> converting to TMX.


I have no idea who is right and who is wrong. What I can say is that
Heartsome is _very_ strong when it comes to respecting standards.
Besides, the document you quote has contributions from Rodolfo Raya,
who is also a developer at Heartsome and who himself is extremely picky
when it comes to standards compliance.


In "3.4.Handling of Escape Sequences in Software Messages", the text  
says, regarding a fragment that includes escape sequences like we  
have here: "This fragment could be presented in XLIFF by preserving  
the escape sequences:"


etc. Of course it proposes rules to handle special escape sequences
as opposed to generic escape sequences, but there seems to be nothing
wrong with keeping all the escape sequences.


What matters in the end is _not_ whether the PO has been through an
XLIFF conversion process.


What matters is that:

1) I have a source po with \\\
2) my reference TMX should match that with > because it is created from a similar po file

3) but for some reason it provides only \\

Let me repeat myself. I have no issue with your processes and with  
your level of compliance with the proposed standards.


The only problem is that somewhere, the TMX conversion process loses
data and that impairs my ability to get leverage from it.


> A somewhat separate issue for me is that the \< in the SDF file is also
> an escape of that format. In reality it refers to just a left angular
> bracket. The SDF format is however a bit strange in the way these are
> used, and we might not want to change the way we handle the SDF escaping
> while Pavel's POT files have a semi-official status. If we can agree how
> we interpret the escaping in the SDF file and coordinate the change, we
> can probably make the lives of translators far easier by eliminating
> much of the escaping.


I don't think the problem is in the oo2po process. Whatever the
result, we are all starting from po anyway.


What is at stake here is that if I take a po created from .sdf and I  
use po2tmx on that same file, the data that the TMX contains is  
different from the data in the po.
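
The mismatch described above can be reproduced in a few lines. This is a minimal sketch, assuming a tool that compares the raw PO text with the TMX segment text literally, as a text-based editor would. The two strings are shortened from the example in this thread, and po_escape is a hypothetical helper written for this illustration; it is not part of po2tmx.

```python
def po_escape(text: str) -> str:
    """Re-apply PO string escaping to unescaped text (hypothetical helper)."""
    # The backslash must be handled first so that the backslashes
    # introduced by the later replacements are not doubled again.
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )


# What a text-based tool sees in the .po file (escapes left as they are).
raw_po_source = r'select the \"Standard\" module, and then click \\New\\.'

# What the TMX segment holds once the PO escapes have been decoded.
tmx_segment = r'select the "Standard" module, and then click \New\.'

# A literal comparison fails, so the TMX gives no exact match.
print(raw_po_source == tmx_segment)

# Re-escaping the TMX text makes it line up with the raw PO text again.
print(raw_po_source == po_escape(tmx_segment))
```

Under these assumptions the information is the same on both sides; only the representation differs, which is enough to defeat exact matching in a tool that does not decode PO escapes.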


JC




Re: [l10n-dev] escaping

2007-06-19 Thread F Wolff
On Tue, 2007-06-19 at 00:44 +0900, Jean-Christophe Helary wrote:
> On 18 June 07, at 23:28, F Wolff wrote:

...

> >
> > So this is the PO file, am I correct? Does OmegaT handle the escapes in
> > the PO file? Does the actual PO file (opened in a normal text editor)
> > have \\\" or something else? This would of course mean \" since both
> > the backslash and the double quote must be escaped in PO files (along
> > with newlines and tabs). In TMX they are of course put in unescaped as
> > \" since the escaping is not necessary for TMX.
> >
> > Would this explain what you are seeing?
> 
> OmegaT handles PO files pretty much as text files and thus does not  
> care about "\", for it, the "\" is just another character. Hence,  
> there is nothing that is generated by OmegaT in the screenshot I  
> showed. The files are displayed as they are.
> 

Unfortunately a PO file isn't just a text file. It is a file format that
presents data in a specific way. To escape the backslash
(\) and the quotes (") is part of the format that we try to conform to.
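
As a rough illustration of the escaping rules mentioned here (backslash, double quote, newline and tab), the following is a minimal Python sketch. The names po_escape and po_unescape are hypothetical helpers written for this example; they are not the Translate Toolkit's actual implementation, and they ignore the less common PO escapes.

```python
# Hypothetical helpers for illustration only.
_PO_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t"}
_PO_UNESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}


def po_escape(text: str) -> str:
    """Encode logical text for use inside a PO string literal."""
    return "".join(_PO_ESCAPES.get(ch, ch) for ch in text)


def po_unescape(raw: str) -> str:
    """Decode the body of a PO string literal back to logical text."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Escapes this sketch does not know are kept unchanged.
            out.append(_PO_UNESCAPES.get(nxt, ch + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Raw text as it appears between the quotes in a .po file.
raw = r'click \\New\\ in the \"Standard\" module'
logical = po_unescape(raw)
print(logical)
print(po_escape(logical) == raw)
```

With these helpers, the three backslashes followed by a quote discussed earlier in the thread decode to one backslash followed by a quote, which is the text a format-aware converter would carry over.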

> To make sure I am not wrong, let me reproduce the process here with  
> an example string:
> 
> 1) the .sdf I have contains:
> 
> > helpcontent2source\text\sbasic\shared\01\0613.xhp   0   
> > help 
> > par_id3149124   20  0   en-US   To create a new 
> > macro, select the  
> > "Standard" module in the \Macro from\ list, and then  
> > click \New\. 2007-04-11 
> > 15:55:00.0
> > helpcontent2source\text\sbasic\shared\01\0613.xhp   0   
> > help 
> > par_id3149124   20  0   fr  Pour créer une 
> > nouvelle macro, sélectionnez  
> > le module "Standard" dans la liste
> 
> (lines 3 and 4 of HC2_93824_89_2007-06-05_33.sdf)
> 
> When I use oo2po (oo2po --language=fr --nonrecursiveinput  
> HC2_93824_89_2007-06-05_33.sdf HC.po), I get the following strings:
> 
> > #: 0613.xhp#par_id3149124.20.help.text
> > msgid ""
> > "To create a new macro, select the \"Standard\" module in the \\Macro "
> > "from\\ list, and then click \\New\\. "
> > msgstr ""
> > "Pour créer une nouvelle macro, sélectionnez le module \"Standard\" dans la "
> > "liste \\Macro de\\ et cliquez sur \\ > \>Nouveau\\. "
> > "Vous pouvez également créer un nouveau module. Pour ce faire, s"
> > "électionnez-le dans la liste \\Macro de\\ et cliquez sur "
> > "\\Nouveau\\."
> 
> You can see that a number of characters have been escaped.
> 
> 
> Now, when I create a TMX from this file (even though I know this file  
> is a pseudo translation) ($ po2tmx --language=fr HC.po HC.tmx), I get:
> 
> > 
> > To create a new macro, select the "Standard" module  
> > in the \Macro from\ list, and then click  
> > \New\. 
> > 
> > 
> > Pour créer une nouvelle macro, sélectionnez le module  
> > "Standard" dans la liste \Macro de\ > \> et cliquez sur \Nouveau\. Vous  
> > pouvez également créer un nouveau module. Pour ce faire,  
> > sélectionnez-le dans la liste \Macro de\  
> > et cliquez sur \Nouveau\.
> > 
> 
> 
> So, you see, the TMX does not exactly match the original .po file.  
> Although it does match the .sdf, but this is irrelevant.
> 
> When I created the TMX by using XLFEdit from Heartsome, I first took
> the converted po, converted it to XLIFF and then exported it as TMX
> and the TMX contained the same number of escapes as the po.

I would consider this behaviour by the Heartsome tool to be a bug, to be
honest. Do they convert '<' to '&lt;'? Then they should also convert
the rest. I would say this is part of the rules of data conversion
between these formats.

I believe our conversion conforms to the XLIFF representation guide for
PO files:
http://xliff-tools.freedesktop.org/snapshots/po-repr-guide/wd-xliff-profile-po.html#s.general_considerations.escapechars

I think it follows logically that the same rules should apply for
converting to TMX.
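
To illustrate why the same text looks different in the two formats, this sketch uses Python's standard xml.sax.saxutils module to apply the XML-level escaping that a TMX segment needs. The sample string and its tag name are made up for the example, and the text is assumed to have had its PO escapes decoded already.

```python
from xml.sax.saxutils import escape, unescape

# Illustrative logical text (PO escapes already decoded). The backslashes
# stand for the SDF-style escapes discussed in this thread.
logical = r'click \<b\>New\</b\> & "OK"'

# XML escaping replaces &, < and >. The backslash and the double quote
# have no special meaning in XML element content and are left alone.
xml_text = escape(logical)
print(xml_text)

# An XML parser reverses this, so no information is lost at this layer.
print(unescape(xml_text) == logical)
```

The point of the sketch is that each format has its own escaping layer: PO escapes are removed when leaving PO, and XML escapes are added when entering TMX.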

> 
> > Well, not when converted to an XML based type, I would say. In the same
> > way a left angular bracket (<) can be put normally (unescaped) in a PO
> > file, but in TMX it would have to go in as &lt;
> 
> Now, whatever is required or not in an XML document is not relevant  
> here. What I need is that created TMX contents match exactly my  
> source content otherwise I am going to edit each and every segment to  
> add escapes so that my target matches my source... Which is defeating  
> the point of using a TMX file. If the .po file contains 3 "\" and if  
> I created a TMX with a .po that has 3 "\" I want the TMX to contain  
> the 3 "\". Otherwise it is not useful at all anymore.
> 
> JC

With the &lt; I was just trying to explain why things might differ
between two different data formats with an example that is perhaps
slightly more well known because of its use in HTML.


A somewhat separate issue for me is