Re: [Cjk] CJKvert and horizontal dash transformation?

Gernot Hassenpflug Tue, 24 Jul 2007 02:00:42 -0700

Werner LEMBERG <[EMAIL PROTECTED]> writes:
/../

Thank you for the answers. I managed to find my way around better
with the help of my rather old edition of Japanese Information
Processing, the Unicode mapping files on ftp.unicode.org, and the
CJK documentation and source code.


>> Question 3: in essence, what I am now faced with is
>>  updating/expanding the unicode subfonts with the help of the files
>>  you advised me to look at.
>
> Not the subfonts itself, but the entries in the .fdx files so that the
> lines refer to the correct subfont and glyph index positions.

OK, here is what I have done up to now:

1. I used c42min.fd and JISdnp.enc to work out the original JIS code
   point from the glyph position in the DNP symbol subfont. I know
   the JIS characters are encoded in EUC (or DNP), not in JIS
   encoding, but I don't know exactly which EUC. BTW, EUC is also
   known as UJIS, that is, Unixized JIS.

2. I have not yet worked out exactly which EUC is used, so for now I
   assumed the complete two-byte EUC format, since the first byte
   values seem to match the example you gave me (A1A1, assumed to be
   unbreaking space). According to my old reference, this form is
   not commonly encountered. How times have changed :-)

3. Now I realize that subtracting 160 from the second byte, with the
   first byte fixed at A1, is in fact the EUC <-> KUTEN conversion!
   That is, whatever the DNP coding might be elsewhere (I did not
   check), the glyph positions appear to be the KUTEN index encoding
   for at least the sy subfont. That is very helpful for lookup in
   tables! Subtracting 160 from each byte of A1A1 gives 01 01, which
   is row 1 (KU) and symbol 01 (TEN) in the KUTEN index system. Going
   from KUTEN to JIS, on the other hand, involves adding decimal 32
   to each byte.

4. Thus armed, I wrote a set of shell scripts that take the c42min.fd
   file as input and output the JIS, KUTEN, and EUC code points in
   decimal and hexadecimal, and also the mapping to UTF-8.  The
   JIS0208.txt file from ftp.unicode.org has a JIS encoding column
   but no EUC column. Tips on how to do this more easily would be
   much appreciated.

5. Next, I set the subfont name for Unicode, using the information in
   the Unicode.sfd file. The subfont name is in hexadecimal; however,
   the second (low) byte of the Unicode code point needs to be
   converted to decimal for the .fdx file, so:

%% attempt to make a unicode c70min.fdx file

\def\fileversion{4.6.0}
\def\filedate{2005/08/11}
\ProvidesFile{c70min.fdx}[\filedate\space\fileversion]


\CJKvdef{rotate}{}
\CJKvdef{offset}{.5em}

%% key: series/shape/subfont name (HEX)/glyph index (DEC)
\CJKvdef{m/n/30/1}{\raise .55em \hbox to 1em {\kern -.6em \CJKsymbol{1}\hss}}
\CJKvdef{m/n/30/2}{\raise .55em \hbox to 1em {\kern -.6em \CJKsymbol{2}\hss}}
\CJKvdef{m/n/ff/12}{\raise .55em \hbox to 1em {\kern -.6em \CJKsymbol{12}\hss}}
\CJKvdef{m/n/ff/14}{\raise .55em \hbox to 1em {\kern -.6em \CJKsymbol{14}\hss}}
\CJKvdef{m/n/30/252}{\CJKsymbolsimple{252}}
\CJKvdef{m/n/30/28}{\CJKsymbolsimple{28}}
\CJKvdef{m/n/20/38}{\CJKsymbolsimple{38}}
\CJKvdef{m/n/20/37}{\CJKsymbolsimple{37}}
\CJKvdef{m/n/ff/8}{\CJKsymbolsimple{8}}
\CJKvdef{m/n/ff/9}{\CJKsymbolsimple{9}}
\CJKvdef{m/n/30/20}{\CJKsymbolsimple{20}}
\CJKvdef{m/n/30/21}{\CJKsymbolsimple{21}}
\CJKvdef{m/n/ff/59}{\CJKsymbolsimple{59}}
\CJKvdef{m/n/ff/61}{\CJKsymbolsimple{61}}
\CJKvdef{m/n/ff/91}{\CJKsymbolsimple{91}}
\CJKvdef{m/n/ff/93}{\CJKsymbolsimple{93}}
\CJKvdef{m/n/30/8}{\CJKsymbolsimple{8}}
\CJKvdef{m/n/30/9}{\CJKsymbolsimple{9}}
\CJKvdef{m/n/30/10}{\CJKsymbolsimple{10}}
\CJKvdef{m/n/30/11}{\CJKsymbolsimple{11}}
\CJKvdef{m/n/30/12}{\CJKsymbolsimple{12}}
\CJKvdef{m/n/30/13}{\CJKsymbolsimple{13}}
\CJKvdef{m/n/30/14}{\CJKsymbolsimple{14}}
\CJKvdef{m/n/30/15}{\CJKsymbolsimple{15}}

\CJKvlet{bx/n/30/1}  {m/n/30/1}
\CJKvlet{bx/n/30/2}  {m/n/30/2}
\CJKvlet{bx/n/ff/12} {m/n/ff/12}
\CJKvlet{bx/n/ff/14} {m/n/ff/14}
\CJKvlet{bx/n/30/252}{m/n/30/252}
\CJKvlet{bx/n/30/28} {m/n/30/28}
\CJKvlet{bx/n/20/38} {m/n/20/38}
\CJKvlet{bx/n/20/37} {m/n/20/37}
\CJKvlet{bx/n/ff/8}  {m/n/ff/8}
\CJKvlet{bx/n/ff/9}  {m/n/ff/9}
\CJKvlet{bx/n/30/20} {m/n/30/20}
\CJKvlet{bx/n/30/21} {m/n/30/21}
\CJKvlet{bx/n/ff/59} {m/n/ff/59}
\CJKvlet{bx/n/ff/61} {m/n/ff/61}
\CJKvlet{bx/n/ff/91} {m/n/ff/91}
\CJKvlet{bx/n/ff/93} {m/n/ff/93}
\CJKvlet{bx/n/30/8}  {m/n/30/8}
\CJKvlet{bx/n/30/9}  {m/n/30/9}
\CJKvlet{bx/n/30/10} {m/n/30/10}
\CJKvlet{bx/n/30/11} {m/n/30/11}
\CJKvlet{bx/n/30/12} {m/n/30/12}
\CJKvlet{bx/n/30/13} {m/n/30/13}
\CJKvlet{bx/n/30/14} {m/n/30/14}
\CJKvlet{bx/n/30/15} {m/n/30/15}

%% original contents of the c42min.fdx file
%% \CJKvdef{m/n/sy/2}{\raise .55em \hbox to 1em {\kern -.6em \CJKsymbol{2}\hss}}
%% \CJKvdef{m/n/sy/3}{\raise .55em \hbox to 1em {\kern -.6em \CJKsymbol{3}\hss}}
%% \CJKvdef{m/n/sy/4}{\raise .55em \hbox to 1em {\kern -.6em \CJKsymbol{4}\hss}}
%% \CJKvdef{m/n/sy/5}{\raise .55em \hbox to 1em {\kern -.6em \CJKsymbol{5}\hss}}
%% \CJKvdef{m/n/sy/28}{\CJKsymbolsimple{28}}
%% \CJKvdef{m/n/sy/33}{\CJKsymbolsimple{33}}
%% \CJKvdef{m/n/sy/36}{\CJKsymbolsimple{36}}
%% \CJKvdef{m/n/sy/37}{\CJKsymbolsimple{37}}
%% \CJKvdef{m/n/sy/42}{\CJKsymbolsimple{42}}
%% \CJKvdef{m/n/sy/43}{\CJKsymbolsimple{43}}
%% \CJKvdef{m/n/sy/44}{\CJKsymbolsimple{44}}
%% \CJKvdef{m/n/sy/45}{\CJKsymbolsimple{45}}
%% \CJKvdef{m/n/sy/46}{\CJKsymbolsimple{46}}
%% \CJKvdef{m/n/sy/47}{\CJKsymbolsimple{47}}
%% \CJKvdef{m/n/sy/48}{\CJKsymbolsimple{48}}
%% \CJKvdef{m/n/sy/49}{\CJKsymbolsimple{49}}
%% \CJKvdef{m/n/sy/50}{\CJKsymbolsimple{50}}
%% \CJKvdef{m/n/sy/51}{\CJKsymbolsimple{51}}
%% \CJKvdef{m/n/sy/52}{\CJKsymbolsimple{52}}
%% \CJKvdef{m/n/sy/53}{\CJKsymbolsimple{53}}
%% \CJKvdef{m/n/sy/54}{\CJKsymbolsimple{54}}
%% \CJKvdef{m/n/sy/55}{\CJKsymbolsimple{55}}
%% \CJKvdef{m/n/sy/56}{\CJKsymbolsimple{56}}
%% \CJKvdef{m/n/sy/57}{\CJKsymbolsimple{57}}

%% \CJKvlet{bx/n/sy/2}{m/n/sy/2}
%% \CJKvlet{bx/n/sy/3}{m/n/sy/3}
%% \CJKvlet{bx/n/sy/4}{m/n/sy/4}
%% \CJKvlet{bx/n/sy/5}{m/n/sy/5}
%% \CJKvlet{bx/n/sy/28}{m/n/sy/28}
%% \CJKvlet{bx/n/sy/33}{m/n/sy/33}
%% \CJKvlet{bx/n/sy/36}{m/n/sy/36}
%% \CJKvlet{bx/n/sy/37}{m/n/sy/37}
%% \CJKvlet{bx/n/sy/42}{m/n/sy/42}
%% \CJKvlet{bx/n/sy/43}{m/n/sy/43}
%% \CJKvlet{bx/n/sy/44}{m/n/sy/44}
%% \CJKvlet{bx/n/sy/45}{m/n/sy/45}
%% \CJKvlet{bx/n/sy/46}{m/n/sy/46}
%% \CJKvlet{bx/n/sy/47}{m/n/sy/47}
%% \CJKvlet{bx/n/sy/48}{m/n/sy/48}
%% \CJKvlet{bx/n/sy/49}{m/n/sy/49}
%% \CJKvlet{bx/n/sy/50}{m/n/sy/50}
%% \CJKvlet{bx/n/sy/51}{m/n/sy/51}
%% \CJKvlet{bx/n/sy/52}{m/n/sy/52}
%% \CJKvlet{bx/n/sy/53}{m/n/sy/53}
%% \CJKvlet{bx/n/sy/54}{m/n/sy/54}
%% \CJKvlet{bx/n/sy/55}{m/n/sy/55}
%% \CJKvlet{bx/n/sy/56}{m/n/sy/56}
%% \CJKvlet{bx/n/sy/57}{m/n/sy/57}

\endinput
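
The hex/decimal split in the entry names above can be sketched in a
few lines of Python. This is only an illustration, assuming (as the
entries above suggest) that the subfont name is the high byte of the
code point in lowercase hexadecimal and the glyph index is the low
byte in decimal. The function names fdx_key and fdx_simple_entry are
made up for this sketch and are not part of the CJK package.

```python
def fdx_key(codepoint, series="m", shape="n"):
    """Build a 'series/shape/subfont/index' key for a BMP code point.

    Assumption: subfont name = high byte in lowercase hex,
    glyph index = low byte in decimal.
    """
    if not 0 <= codepoint <= 0xFFFF:
        raise ValueError("this sketch only handles BMP code points")
    high, low = divmod(codepoint, 256)
    return f"{series}/{shape}/{high:02x}/{low}"


def fdx_simple_entry(codepoint):
    """Format a \\CJKvdef line that uses \\CJKsymbolsimple."""
    low = codepoint % 256
    return f"\\CJKvdef{{{fdx_key(codepoint)}}}{{\\CJKsymbolsimple{{{low}}}}}"


if __name__ == "__main__":
    # U+30FC is the long vowel mark tested in this post.
    print(fdx_key(0x30FC))           # m/n/30/252
    print(fdx_simple_entry(0x30FC))  # \CJKvdef{m/n/30/252}{\CJKsymbolsimple{252}}
```

Feeding it the code points found for each sy glyph position would
reproduce the key part of the entries; the raised and kerned entries
still need their box commands written by hand.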

6. I tested this using the long hyphen (character 30/252), and it
   works, so I am convinced from a practical standpoint. However,
   looking at CJKvert.sty I am confused: \CJKsymbolsimple takes only
   one argument, and I do not understand how it differentiates
   between subfonts from line to line. I don't notice anything in
   the style file, and from makeuniwada.pl I only note that all the
   Unicode subfonts are already happily created. Any comments?

7. If the above is right, it can be added to the gothic and maru .fdx
   files as well, right?

8. Next, I would like to help in creating support for half-width
   katakana in UTF-8 encoding too. Is this at all feasible at present?
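
The byte arithmetic from steps 3 and 4 can be sketched in Python
rather than shell. This is a rough illustration, not the scripts
mentioned above: it assumes the complete two-byte EUC form (each
byte is the KUTEN value plus 160, and the JIS byte is the KUTEN
value plus 32), and it assumes that data lines in JIS0208.txt hold
three hexadecimal columns (Shift-JIS, JIS, Unicode) followed by a
"#" comment. All function names are invented for the sketch.

```python
def euc_to_kuten(euc):
    """Split a two-byte EUC value such as 0xA1A1 into (ku, ten)."""
    b1, b2 = divmod(euc, 256)
    return b1 - 0xA0, b2 - 0xA0      # subtract 160 from each byte


def kuten_to_jis(ku, ten):
    """Add 32 to each of ku and ten to get the two-byte JIS value."""
    return ((ku + 0x20) << 8) | (ten + 0x20)


def jis_to_euc(jis):
    """Add 128 to each JIS byte, giving the EUC column the file lacks."""
    return jis + 0x8080


def parse_mapping_line(line):
    """Return (jis, unicode) from one data line, or None otherwise.

    Assumed layout: '0xSJIS <tab> 0xJIS <tab> 0xUNICODE <tab> # name'.
    """
    fields = line.split("#", 1)[0].split()
    if len(fields) < 3:
        return None                  # comment or blank line
    _sjis, jis, ucs = (int(f, 16) for f in fields[:3])
    return jis, ucs


if __name__ == "__main__":
    ku, ten = euc_to_kuten(0xA1A1)
    print(ku, ten, hex(kuten_to_jis(ku, ten)))   # 1 1 0x2121
    print(hex(jis_to_euc(0x213C)))               # 0xa1bc
```

With a table built from parse_mapping_line, a glyph position in the
sy subfont (taken as TEN in row 1) can be looked up directly as a
JIS value and mapped to its Unicode code point.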

Any advice much appreciated.

Regards,
     Gernot
-- 
開心 - 好運気 (Kai Xin - Hao Yun Qi)
Be happy and joyful - and share that joy with others


