On Tue, 2007-06-19 at 00:44 +0900, Jean-Christophe Helary wrote:
> On 18 June 07, at 23:28, F Wolff wrote:
...
> >
> > So this is the PO file, am I correct? Does OmegaT handle the
> > escapes in
> > the PO file? Does the actual PO file (opened in a normal text editor)
> > have \\\" or something else? This would of course mean \" since both
> > the backslash and the double quote must be escaped in PO files (along
> > with newlines and tabs). In TMX they are of course put in unescaped as
> > \" since the escaping is not necessary for TMX.
> >
> > Would this explain what you are seeing?
>
> OmegaT handles PO files pretty much as text files and thus does not
> care about "\"; for it, the "\" is just another character. Hence,
> there is nothing that is generated by OmegaT in the screenshot I
> showed. The files are displayed as they are.
>
Unfortunately a PO file isn't just a text file. It is a file format that
presents data in a specific way. Escaping the backslash (\) and the
double quote (") is part of the format that we try to conform to.
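
To make the escaping rule concrete, here is a minimal Python sketch of
what I mean. The names po_escape and po_unescape are made up for this
mail and are not the toolkit's actual API, and the sketch only covers the
backslash, the double quote, newlines and tabs:

```python
# Hypothetical helpers for illustration only; not the toolkit's real code.

def po_escape(text: str) -> str:
    """Escape plain text for use inside a quoted PO msgid/msgstr line."""
    return (
        text.replace("\\", "\\\\")  # backslash first, or we double-escape
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\t", "\\t")
    )


_UNESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}


def po_unescape(text: str) -> str:
    """Reverse po_escape: turn PO escape sequences back into plain text."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # Unknown sequences are kept as they are (an assumption).
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

So a single backslash in the real text shows up as two in the PO file,
and a double quote shows up as backslash plus double quote. Reading the
PO file as plain text shows the escaped form, not the real text.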
> To make sure I am not wrong, let me reproduce the process here with
> an example string:
>
> 1) the .sdf I have contains:
>
> > helpcontent2source\text\sbasic\shared\01\0613.xhp 0
> > help
> > par_id3149124 20 0 en-US To create a new
> > macro, select the
> > "Standard" module in the \Macro from\ list, and then
> > click \New\. 2007-04-11
> > 15:55:00.0
> > helpcontent2source\text\sbasic\shared\01\0613.xhp 0
> > help
> > par_id3149124 20 0 fr Pour créer une
> > nouvelle macro, sélectionnez
> > le module "Standard" dans la liste
>
> (lines 3 and 4 of HC2_93824_89_2007-06-05_33.sdf)
>
> When I use oo2po (oo2po --language=fr --nonrecursiveinput
> HC2_93824_89_2007-06-05_33.sdf HC.po), I get the following strings:
>
> > #: 0613.xhp#par_id3149124.20.help.text
> > msgid ""
> > "To create a new macro, select the \"Standard\" module in the \
> > \Macro "
> > "from\\ list, and then click \\New\\. "
> > msgstr ""
> > "Pour créer une nouvelle macro, sélectionnez le module \"Standard\"
> > dans la "
> > "liste \\Macro de\\ et cliquez sur \\ > \>Nouveau\\. "
> > "Vous pouvez également créer un nouveau module. Pour ce faire, s"
> > "électionnez-le dans la liste \\Macro de\\ et
> > cliquez sur "
> > "\\Nouveau\\."
>
> You can see that a number of characters have been escaped.
>
>
> Now, when I create a TMX from this file (even though I know this file
> is a pseudo translation) ($ po2tmx --language=fr HC.po HC.tmx), I get:
>
> >
> > To create a new macro, select the "Standard" module
> > in the \Macro from\ list, and then click
> > \New\.
> >
> >
> > Pour créer une nouvelle macro, sélectionnez le module
> > "Standard" dans la liste \Macro de\ > \> et cliquez sur \Nouveau\. Vous
> > pouvez également créer un nouveau module. Pour ce faire,
> > sélectionnez-le dans la liste \Macro de\
> > et cliquez sur \Nouveau\.
> >
>
>
> So, you see, the TMX does not exactly match the original .po file.
> It does match the .sdf, but this is irrelevant.
>
> When I created the TMX by using XLFEdit from Heartsome, I first took
> the converted po, converted it to XLIFF and then exported it as TMX
> and the TMX contained the same number of escapes as the po.
I would consider this behaviour by the Heartsome tool to be a bug, to be
honest. Do they convert '<' to '&lt;'? Then they should also convert
the rest. I would say this is part of the rules of data conversion
between these formats.
I believe our conversion conforms to the XLIFF representation guide for
PO files:
http://xliff-tools.freedesktop.org/snapshots/po-repr-guide/wd-xliff-profile-po.html#s.general_considerations.escapechars
I think it follows logically that the same rules should apply for
converting to TMX.
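
As a rough sketch of the rule I have in mind (again, po_text_to_tmx_text
is a hypothetical name for this mail, not what po2tmx actually calls
internally): first undo the PO escaping to get the real text, then apply
whatever escaping the target format needs, which for XML means the
angle brackets and the ampersand.

```python
import re
from xml.sax.saxutils import escape as xml_escape

# PO escape sequences handled in this sketch (an intentionally short list).
_PO_UNESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}


def po_text_to_tmx_text(po_text: str) -> str:
    """Convert the escaped body of a PO string to XML-ready segment text."""
    # Step 1: remove the PO-level escaping (\\ -> \, \" -> ", and so on).
    plain = re.sub(
        r"\\(.)",
        lambda m: _PO_UNESCAPES.get(m.group(1), m.group(0)),
        po_text,
    )
    # Step 2: apply XML-level escaping (&, < and > become entities).
    return xml_escape(plain)
```

With this, the two backslashes around "New" in the PO file become one
backslash each in the TMX, the escaped quotes become plain quotes, and a
plain '<' becomes an entity. Each format gets its own escaping, and the
underlying text stays the same.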
>
> > Well, not when converted to an XML based type, I would say. In the
> > same
> > way a left angular bracket (<) can be put normally (unescaped) in a PO
> > file, but in TMX it would have to go in as &lt;
>
> Now, whatever is required or not in an XML document is not relevant
> here. What I need is that created TMX contents match exactly my
> source content otherwise I am going to edit each and every segment to
> add escapes so that my target matches my source... Which is defeating
> the point of using a TMX file. If the .po file contains 3 "\" and if
> I created a TMX with a .po that has 3 "\" I want the TMX to contain
> the 3 "\". Otherwise it is not useful at all anymore.
>
> JC
With the &lt; I was just trying to explain why things might differ
between two different data formats with an example that is perhaps
slightly more well known because of its use in HTML.
A somewhat separate issue for me is