> Since GB18030 is a superset of GBK and an official Chinese PC
> compatibility standard, why don't we have a GB18030 encoding
> scheme in CJK? If we had one, we could use it easily with
> \begin{CJK}{GB18030}{song} or anything alike.
I've just taken another look at GB 18030, and I'm more convinced than
ever that supporting GB 18030 in an efficient way is not possible.
Its design, meant to ensure compatibility with GB 2312 and GBK, is
extremely ugly. It splits the Unicode range into zillions of small blocks:
take the whole Unicode range and remove the code points covered by GBK.
All characters that remain are mapped to four-byte values like this:
0x81308130 - 0x81308139
0x81308230 - 0x81308239
...
0x8130FE30 - 0x8130FE39
0x81318130 - 0x81318139
...
...
0xFE39FE30 - 0xFE39FE39
Each of these blocks contains 10 characters. Here are some details of
what the mapping looks like:
U+00AF  0x81308534  MACRON
U+00B0  0xA1E3      DEGREE SIGN
U+00B1  0xA1C0      PLUS-MINUS SIGN
U+00B2  0x81308535  SUPERSCRIPT TWO
U+00B3  0x81308536  SUPERSCRIPT THREE
U+00B4  0x81308537  ACUTE ACCENT
U+00B5  0x81308538  MICRO SIGN
U+00B6  0x81308539  PILCROW SIGN
U+00B7  0xA1A4      MIDDLE DOT
U+00B8  0x81308630  CEDILLA
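
For anyone who wants to reproduce the excerpt above, here is a small sketch. It assumes Python's built-in "gb18030" codec is an acceptable stand-in for the official mapping table; the helper name gb18030_hex is made up for this example.

```python
import unicodedata


def gb18030_hex(cp: int) -> str:
    """Return the GB 18030 byte sequence of one code point as a 0x... string.

    Assumption: Python's built-in "gb18030" codec matches the mapping
    table quoted in this mail for the code points shown.
    """
    return "0x" + chr(cp).encode("gb18030").hex().upper()


# Print the same slice of the mapping as in the table: U+00AF .. U+00B8.
for cp in range(0x00AF, 0x00B9):
    print(f"U+{cp:04X}  {gb18030_hex(cp):<10}  {unicodedata.name(chr(cp))}")
```

The two-byte results are the characters inherited from GBK; everything else in this range falls into the four-byte area.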
It is easy to see how widely the data is scattered. As a
consequence, it's virtually impossible to define a good encoding scheme
for GB 18030 that I could use within the CJK package.
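
To get a feeling for the scattering, one can count how many separate runs of consecutive Unicode code points end up in the four-byte area. This is only a rough sketch: it again relies on Python's built-in "gb18030" codec, it looks at the BMP only, and the function name four_byte_runs is hypothetical.

```python
def four_byte_runs(first: int = 0x80, last: int = 0xFFFF):
    """Collect runs of consecutive code points that GB 18030 encodes in 4 bytes.

    Returns a list of (start, end) tuples, both ends inclusive.
    Assumption: Python's "gb18030" codec is used as the mapping source.
    """
    runs = []
    start = None
    for cp in range(first, last + 1):
        try:
            four = len(chr(cp).encode("gb18030")) == 4
        except UnicodeEncodeError:
            # Lone surrogates (U+D800..U+DFFF) cannot be encoded; they end a run.
            four = False
        if four and start is None:
            start = cp                      # a new four-byte run begins
        elif not four and start is not None:
            runs.append((start, cp - 1))    # the current run just ended
            start = None
    if start is not None:
        runs.append((start, last))          # close a run that reaches the end
    return runs


runs = four_byte_runs()
print(len(runs), "separate four-byte runs in the BMP")
print("first few:", [(hex(a), hex(b)) for a, b in runs[:5]])
```

Every boundary between two runs is a place where the linear four-byte numbering and the Unicode order drift apart by one more step, which is why a simple offset-based scheme does not work.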
My conclusion: no support for GB 18030. Just use iconv or a similar
conversion tool to convert your files to UTF-8.
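
If iconv is not at hand, the same conversion can be sketched in a few lines of Python. This assumes the input really is GB 18030 and that Python's built-in codec is good enough for it; the function names and the file names in the comment are placeholders.

```python
def gb18030_to_utf8(data: bytes) -> bytes:
    """Re-encode a GB 18030 byte string as UTF-8.

    Raises UnicodeDecodeError if the input is not valid GB 18030.
    """
    return data.decode("gb18030").encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read src as GB 18030 and write it to dst as UTF-8 (hypothetical helper)."""
    with open(src, "rb") as f:
        converted = gb18030_to_utf8(f.read())
    with open(dst, "wb") as f:
        f.write(converted)


# Example call with placeholder file names:
# convert_file("input-gb18030.tex", "output-utf8.tex")
```

The converted file can then be processed with \begin{CJK}{UTF8}{song}.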
Werner