> Since GB18030 is a superset of GBK and an official Chinese PC
> compatibility standard, why don't we have a GB18030 encoding
> scheme in CJK? If we had it, we could use it easily with:
> \begin{CJK}{GB18030}{song} or anything alike.

I've just had another look at GB 18030, and I'm more convinced than
ever that it is not possible to support GB 18030 in an efficient
way.

Its design, meant to ensure compatibility with GB 2312 and GBK, is
extremely ugly.  It splits the Unicode range into zillions of small
blocks: Take the whole Unicode range and remove the code points
covered by GBK.  All remaining characters are mapped to four-byte
values like this:

       0x81308130 - 0x81308139
       0x81308230 - 0x81308239
       ...
       0x8130FE30 - 0x8130FE39

       0x81318130 - 0x81318139
       ...
       ...
       0xFE39FE30 - 0xFE39FE39
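
To make the layout above concrete, here is a small Python sketch of the
arithmetic behind it.  It only models the four-byte code space (first
and third byte 0x81-0xFE, second and fourth byte 0x30-0x39) as a linear
index; it says nothing about which Unicode character sits at which
index.  The function names are made up for this illustration and are
not part of CJK or of any library.

```python
# Sketch only: models the GB 18030 four-byte code space as a linear index.
# Function names are hypothetical, chosen for this illustration.

FOUR_BYTE_SLOTS = 126 * 10 * 126 * 10  # number of possible four-byte codes


def index_to_four_bytes(index: int) -> bytes:
    """Return the four-byte sequence at position `index` (0-based)."""
    if not 0 <= index < FOUR_BYTE_SLOTS:
        raise ValueError("index outside the four-byte code space")
    index, b4 = divmod(index, 10)    # fourth byte runs 0x30..0x39
    index, b3 = divmod(index, 126)   # third byte runs 0x81..0xFE
    b1, b2 = divmod(index, 10)       # second byte 0x30..0x39, first 0x81..0xFE
    return bytes([0x81 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


def four_bytes_to_index(seq: bytes) -> int:
    """Inverse of index_to_four_bytes."""
    b1, b2, b3, b4 = seq
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


if __name__ == "__main__":
    # First, last, and block-boundary positions of the listing above.
    for i in (0, 9, 10, 1260, FOUR_BYTE_SLOTS - 1):
        print(i, "0x" + index_to_four_bytes(i).hex().upper())
```

Every tenth index starts a new block of ten codes, which is exactly
the block structure shown in the listing.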

Each of these blocks contains 10 characters.  Here are some details
of what the mapping looks like:

       U+00AF     0x81308534     MACRON
       U+00B0     0xA1E3         DEGREE SIGN
       U+00B1     0xA1C0         PLUS-MINUS SIGN
       U+00B2     0x81308535     SUPERSCRIPT TWO
       U+00B3     0x81308536     SUPERSCRIPT THREE
       U+00B4     0x81308537     ACUTE ACCENT
       U+00B5     0x81308538     MICRO SIGN
       U+00B6     0x81308539     PILCROW SIGN
       U+00B7     0xA1A4         MIDDLE DOT
       U+00B8     0x81308630     CEDILLA
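
The rows above can be cross-checked with any converter that knows
GB 18030.  The following sketch uses the `gb18030` codec that ships
with Python; I assume here that its mapping agrees with the table for
this range.  The helper name `table_row` is made up for this example.

```python
# Sketch only: print the GB 18030 encoding of U+00AF..U+00B8 using
# Python's built-in "gb18030" codec.  `table_row` is a hypothetical helper.
import unicodedata


def table_row(cp: int) -> str:
    """Format one row: code point, GB 18030 bytes in hex, character name."""
    ch = chr(cp)
    code = ch.encode("gb18030").hex().upper()
    return f"U+{cp:04X}     0x{code:<8}     {unicodedata.name(ch)}"


if __name__ == "__main__":
    for cp in range(0x00AF, 0x00B9):
        print(table_row(cp))
```

Two-byte results are the characters already present in GBK; the
four-byte results fill the gaps between them.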

It can be clearly seen how much the data is scattered.  As a
consequence, it's virtually impossible to define a good encoding scheme
for GB 18030 which I could use within the CJK package.

My conclusion: No support for GB 18030.  Just use iconv or a similar
conversion tool to convert your files to UTF-8.
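
If iconv is not at hand, the same conversion can be sketched in a few
lines of Python, relying on its built-in `gb18030` codec.  The function
names and the file names in the comment are placeholders for this
example, not anything provided by CJK.

```python
# Sketch only: re-encode GB 18030 data as UTF-8 with Python's built-in
# codecs.  Function and file names are hypothetical.
from pathlib import Path


def gb18030_to_utf8(data: bytes) -> bytes:
    """Decode GB 18030 bytes and re-encode them as UTF-8."""
    return data.decode("gb18030").encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read `src` as GB 18030 and write it to `dst` as UTF-8."""
    Path(dst).write_bytes(gb18030_to_utf8(Path(src).read_bytes()))


if __name__ == "__main__":
    # Example call with placeholder file names:
    #   convert_file("input-gb18030.tex", "output-utf8.tex")
    sample = bytes.fromhex("81308534A1E3")  # MACRON, DEGREE SIGN
    print(gb18030_to_utf8(sample).decode("utf-8"))
```

The converted file can then be processed with CJK's UTF-8 support
instead of a dedicated GB 18030 encoding.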


    Werner

_______________________________________________
Cjk maillist  -  [email protected]
http://lists.ffii.org/mailman/listinfo/cjk