(I dropped Lucas, as this is getting off-topic with respect to the FTBS.
Osamu and I are subscribed to the bug.)

Hi Danai,

On Thu, Nov 09, 2006 at 03:34:15AM +0100, Danai SAE-HAN wrote:
> Okay, I checked out the SGML-generated .tex file, and it seems very
> much that Perl or something else misinterprets some of the CJK
> characters.
> 
> The command I used to create the .tex file was:
> debiandoc2latex -l zh_TW.Big5 reference.zh-tw.sgml
> 
> Examples of these misinterpreted characters:
> 
> in titledoc.sgml: 程 -> 琵{
> 
> in preface.sgml:  開 -> 跚}
>                   閱 -> 閱textbackslash{}
>                   誤 -> 蓋textasciitilde{}

Misinterprets? As far as I remember, this was intentional! Here is an
extract from the fixlatex script, which is applied after the SGML/Perl magic:

("zh_TW")
        perl -p \
-e 's/([\x80-\xff])\\textbackslash\{\}/$1\\/g;' \
-e 's/([\x80-\xff])\\textasciitilde\{\}/$1\~/g;' \
-e 's/([\x80-\xff])\\textasciicircum\{\}/$1\^/g;' \
-e 's/([\x80-\xff])\\\}/$1\}/g;' \
-e 's/([\x80-\xff])\\\{/$1\{/g;' \
-e 's/([\x80-\xff])\\\_/$1_/g;' <$1 |
        bg5conv >$2

That's the only code I'm aware of that substitutes characters. I have
to confess that the code is cryptic and I do not fully understand it.
But it was provided by a Chinese person and introduced to support zh_TW
documents, and in the past the build did not work without it.
The funny thing is that I also looked into the bg5conv code and noticed
that it does very similar things (only a few substitutions). I once
tried to merge the two, so that the substitutions could be applied together
to pieces of text inside the LaTeX processor (rather than afterwards, for
speed and simplicity), but ...
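
My best guess at what those six substitutions undo, written as a small
Python sketch rather than in Perl. The assumption (not verified against
the formatter code) is that special characters are escaped byte by byte,
so the second byte of a Big5 character gets escaped whenever it happens
to be one of \ ~ ^ { } _. The function naive_escape is a made-up
stand-in for that step; undo_after_high_byte mirrors the zh_TW branch
quoted above.

```python
import re

# LaTeX-special ASCII bytes and the escapes that the fixlatex excerpt
# looks for. Each of these bytes can also be the second byte of a
# Big5 character.
ESCAPES = {
    b"\\": b"\\textbackslash{}",
    b"~": b"\\textasciitilde{}",
    b"^": b"\\textasciicircum{}",
    b"{": b"\\{",
    b"}": b"\\}",
    b"_": b"\\_",
}


def naive_escape(data: bytes) -> bytes:
    """Hypothetical formatter step: escape specials with no Big5 awareness."""
    return re.sub(rb"[\\~^{}_]", lambda m: ESCAPES[m.group(0)], data)


def undo_after_high_byte(data: bytes) -> bytes:
    """Undo an escape only where it directly follows a byte >= 0x80."""
    for plain, escaped in ESCAPES.items():
        pattern = rb"([\x80-\xff])" + re.escape(escaped)
        data = re.sub(pattern, lambda m, p=plain: m.group(1) + p, data)
    return data


if __name__ == "__main__":
    # The four characters from the report, encoded as Big5.
    raw = "程開閱誤".encode("big5")
    broken = naive_escape(raw)
    print(raw)
    print(broken)
    print(undo_after_high_byte(broken) == raw)
```

If this reading is right, the substitutions do not damage the text but
repair it, and the "wrong" characters are what the escaped byte
sequence looks like when it is displayed as Big5.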

Please note that debiandoc-sgml is (more or less) language independent. The
general code is in tools/lib/Format/LaTeX.pm, which embeds a few (really, only
a few) language-specific settings such as the language name required for babel
(if used at all). I also added the general 'before begin document',
'after begin document' and 'before end document' hooks, which are used
by Asian languages to embed CJK. Example from
tools/lib/Locale/zh_TW.Big5/LaTeX:

 'pdfhyperref' => 'CJKbookmarks',
 'before begin document' => '\\usepackage{CJK}',
 'after begin document' => '\\begin{CJK}{Bg5}{kai}
       \\renewcommand{\\vpageref}[1]{\(第 \\pageref{#1} 頁\)}',
 'before end document' => '\\clearpage
   \\end{CJK}'

That's the complete magic.
 
> And perhaps a few other characters.  I find it strange that not all
> characters suffer from this problem, but only some.  Perhaps Perl
> doesn't have a complete map of the Big5 encoding?
> 
> A quick fix would perhaps be to put everything in a sed script.

I would prefer the opposite: deleting the above substitutions. Can you
please check these? Are they still necessary?
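
As for why only some characters are hit: presumably only those whose
second Big5 byte is one of \ ~ ^ { } _ are affected. A minimal Python
sketch to list them in a piece of text (the helper name is made up, and
it relies on Python's own "big5" codec, which may differ in details from
what bg5conv expects):

```python
# Bytes that are special to LaTeX and that the zh_TW substitutions handle.
SPECIAL_TRAIL_BYTES = b"\\~^{}_"


def affected_characters(text):
    """Return distinct characters whose Big5 second byte is LaTeX-special."""
    found = []
    for ch in text:
        try:
            raw = ch.encode("big5")
        except UnicodeEncodeError:
            continue  # not representable in Big5, skip it
        # Only two-byte characters count; plain ASCII specials are skipped.
        if len(raw) == 2 and raw[1] in SPECIAL_TRAIL_BYTES and ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    print(affected_characters("程開閱誤"))
```

Running this over the text of the zh-tw SGML files would show how many
characters depend on the substitutions.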

BTW: Are you Chinese? Or do you also have to guess what's right and
what's not?

What about the other languages? Japanese doesn't substitute strings but
failed with the same error that [EMAIL PROTECTED] was not defined ...
 
> Because of the many \text* commands it's best to keep using "bg5latex"
> instead of just "latex", which apparently doesn't work.  Fixing the
> FTBS is much more important now ATM, so let's stick to bg5latex.

But it worked in the past. Do you still think it is necessary to use
bg5latex? This would just make more of the code language-specific.
 
> And on a sidenote, in zh-tw/install.sgml "Windwos 95/98/Me" should
> become "Windows 95/98/Me" of course.

I will fix it, thanks.

I appreciate your hints, please continue,
 
Jens
