Hi all,

This may be of interest to some people.  It's an English-language summary
of GB18030 by Dirk Meyer of Adobe.

Looks like it basically makes GBK catch up with Unicode 3.0 by adding a
4-byte extension.


Thomas Chan
[EMAIL PROTECTED]


---------- Forwarded message ----------
Date: Fri, 13 Oct 2000 09:57:00 -0800 (GMT-0800)
From: Markus Scherer <[EMAIL PROTECTED]>
To: Unicode List <[email protected]>
Subject: GB18030 summary and issues

Dear Uni-encoders and -decoders,

Dirk Meyer from Adobe has put together an extensive summary of the Chinese GB 
18030 encoding standard that was published on 2000-mar-17. Ken Lunde and I 
assisted Dirk with reviews and comments.

The summary is on the web site of Ken's famous CJKV book "with the fish":
ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf

To summarize the summary, we now have an English text describing the new 
encoding in detail. There are a few apparent errors, typos, and 
inconsistencies in the Chinese standard text that need to be resolved.

For implementers, there is enough information in the summary to describe the 
encoding structure and to prepare an implementation.

What is still missing - aside from the resolution of the issues mentioned here 
- is a precise table for mapping at least the one-byte and two-byte portions 
of GB 18030 to and from Unicode.
In theory, it should be almost the same as GBK, but to be sure, we need 
precise, complete, and machine-readable mappings.
Given the one-byte and two-byte portions and the description in the standard 
and in the summary, the four-byte portion can be derived with a little bit of 
Perl or similar.
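
A minimal sketch of that derivation, in Python rather than Perl. It assumes
the four-byte structure of lead bytes 0x81-0xFE alternating with digit bytes
0x30-0x39, and that four-byte codes are handed out in ascending order to the
BMP code points that the one-byte and two-byte portions leave uncovered,
skipping surrogates. The function names and the `covered` set are
hypothetical; a real run would fill `covered` from the actual mapping tables.

```python
LEAD_MIN, LEAD_MAX = 0x81, 0xFE    # bytes 1 and 3: 126 possible values
DIGIT_MIN, DIGIT_MAX = 0x30, 0x39  # bytes 2 and 4: 10 possible values
LEAD_COUNT = LEAD_MAX - LEAD_MIN + 1
MAX_INDEX = LEAD_COUNT * 10 * LEAD_COUNT * 10


def four_byte_to_linear(code):
    """Turn a four-byte sequence into its position in the four-byte space."""
    b1, b2, b3, b4 = code
    if not (LEAD_MIN <= b1 <= LEAD_MAX and DIGIT_MIN <= b2 <= DIGIT_MAX
            and LEAD_MIN <= b3 <= LEAD_MAX and DIGIT_MIN <= b4 <= DIGIT_MAX):
        raise ValueError("not a well-formed four-byte sequence")
    return (((b1 - LEAD_MIN) * 10 + (b2 - DIGIT_MIN)) * LEAD_COUNT
            + (b3 - LEAD_MIN)) * 10 + (b4 - DIGIT_MIN)


def linear_to_four_byte(index):
    """Inverse of four_byte_to_linear."""
    if not 0 <= index < MAX_INDEX:
        raise ValueError("index outside the four-byte space")
    index, d4 = divmod(index, 10)
    index, d3 = divmod(index, LEAD_COUNT)
    d1, d2 = divmod(index, 10)
    return bytes([LEAD_MIN + d1, DIGIT_MIN + d2, LEAD_MIN + d3, DIGIT_MIN + d4])


def derive_bmp_four_byte(covered):
    """Assign four-byte codes, in order, to BMP code points not in `covered`.

    `covered` is the set of code points at U+0080 and above that the
    two-byte portion already maps (assumed input, not real table data).
    """
    mapping = {}
    index = 0
    for cp in range(0x80, 0x10000):
        if cp in covered or 0xD800 <= cp <= 0xDFFF:
            continue
        mapping[cp] = linear_to_four_byte(index)
        index += 1
    return mapping


if __name__ == "__main__":
    # Toy input only: pretend the two-byte portion covers U+00A4 and U+00A7.
    toy = derive_bmp_four_byte({0xA4, 0xA7})
    print(toy[0x80].hex(), toy[0xA5].hex())
```

With real one-byte and two-byte tables in hand, the same loop would yield
the four-byte BMP assignments, which could then be checked against the
standard.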

Anyone who needs to implement or know about GB 18030 should probably read this 
text.

If you can contribute precise mapping tables or help resolve the open 
issues, please do so.


Best regards,

markus



