On Sun, 6 Jul 2003, Yong Li wrote:

Hi Rigel,
Thanks for your kind comments. I fully agree with you on most, if not all, points.

> 1. To my knowledge the gb18030.2000-0 and gb18030.2000-1 encodings are
> invented by Sun and used in their Solaris 9. The only application on Linux

As I wrote on bugzilla (I guess you wrote your reply before I added my latest comment to XF86 bugzilla), I was led astray by the presence of a gb18030.2000-1.enc file on my RH 8.0. I couldn't connect to the XF86 CVS and assumed that the file was what XF86 has. It turned out that the file was RH-specific and had not been committed to XF86.

> that supports them is Mozilla (maybe Java1.4 as well?) at the request of
> Sun (see mozilla bug 72525).

Mozilla's GB18030Font1 encoder (Unicode -> gb18030.2000-1) does not cover some (usually) 'single-width' characters such as the Euro sign and Latin-1 characters (see http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvcn/nsUnicodeToGBK.cpp#180 ). I believe this exclusion was done on purpose to avoid rendering those characters with 'double-width' Chinese glyphs (there's another, built-in, protection against this in Mozilla code, though). This is another point where I was misled. I should have tracked down the bug(s) for which this encoder was added (as you have done).

> 4. The gb18030.2000-1.enc.gz file included in RedHat 9 is totally wired.
> I can not figure out what it is.

RH8's gb18030.2000-1.enc file (I don't know whether it's the same as RH9's) appears to represent a straight identity mapping for a subset of BMP characters. I didn't bother to figure out exactly what subset is covered. (All the Chinese characters and characters for Chinese minority scripts - Yi, Mongolian, etc. - are included.)

> IMHO, if you want to extend the system to add
> such as gb18030.2000-2, it's probably a good idea to consult with Sun just
> so that it will be compatible with any potential Sun's own extension.

With my misunderstanding about what Roland suggested in XF86 bugzilla cleared up, there's no need for that.
I raised the possibility of gb18030.2000-2 because I mistakenly thought that attachment 348 represents a new font encoding distinct from the existing gb18030.2000-1 (which I thought had been well-established). If they're different and cover disjoint sets of characters, they need to have different names. As I wrote above, what Roland suggested has been used by Solaris and Mozilla (and very likely by Turbo Linux), while what I thought was well-established turned out to be RH-specific.

> Personally though I don't think the new font encoding is needed, as we are
> rapidly moving away from the core font technologies (at least in the
> XFree86 world). For any application that does support non-BMP characters,
> most likely it already uses Xft/fontconfig anyway.

Absolutely. I have no intention of extending the life of the 10+ year old, not-so-flexible XLFD-based font selection mechanism. The introduction of Xft/fontconfig is one of the best things that has happened to X11 (although fontconfig is not just for X11).

> seems to be the requirement of GB18030 conformance test. The Standard
> however have defined all the mappings between GB18030 and every code point
> in UTF-16 space. It's unclear (to me at least) what exactly consist of
> legal GB18030 codes. The attachment 348 seems included every BMP code
> point that is not in gb18030.2000-0. I think sometimes it's useful to
> know whether a code is a non-existent character or a legal code but not
> exist in a certain font. So I suggest to remove the unassigned BMP code
> points from that file.

Hmm, that's an interesting point. I guess GB18030 is supposed to have exactly the same repertoire as ISO 10646. To keep the encoding file in sync with Unicode/10646 without playing a catch-up game every time 10646/Unicode assigns new characters, it's better to cover all the code points that are legal in GB18030, assigned or not. As you wrote, fewer and fewer people will bother with X11 core fonts as time goes by.
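To make the difference between "a legal GB18030 code" and "an assigned character" concrete, here is a rough Python sketch. It assumes Python's built-in gb18030 codec and its unicodedata module; "assigned" here follows whatever Unicode version the interpreter ships, not the repertoire GB18030-2000 was drafted against. The helper names classify and gb18030_bytes are made up for illustration and have nothing to do with the XFree86 encoding files.

```python
import unicodedata


def classify(cp):
    """Coarse status of a code point according to Python's Unicode database."""
    category = unicodedata.category(chr(cp))
    if category == "Cs":
        return "surrogate"
    if category == "Cn":
        return "unassigned"
    if category == "Co":
        return "private-use"
    return "assigned"


def gb18030_bytes(cp):
    """GB18030 byte sequence for a code point, or None if the codec rejects it."""
    try:
        return chr(cp).encode("gb18030")
    except UnicodeEncodeError:
        return None


def report(code_points):
    """One line per code point: its status and its GB18030 bytes (if any)."""
    lines = []
    for cp in code_points:
        encoded = gb18030_bytes(cp)
        hex_bytes = encoded.hex() if encoded is not None else "-"
        lines.append(f"U+{cp:04X} {classify(cp)} {hex_bytes}")
    return lines


if __name__ == "__main__":
    # A CJK ideograph, a BMP noncharacter, and the first non-BMP code point.
    for line in report([0x4E00, 0xFFFF, 0x10000]):
        print(line)
```

An encoding file that lists every legal code would keep everything gb18030_bytes accepts; one trimmed as Yong Li suggests would keep only the code points that classify reports as assigned.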
> Also the "STARTMAPPING cmap 3 4" entry at the end
> should be removed because it's obviously not an identical mapping.

Yup. Perhaps Roland just copied it from gb18030.2000-0 or gbk-0.enc.

> 3. The gb18030.2000-0 file is probably not needed. Yes, it's true that the
> two-byte codes in GB18030 are slightly different than GBK. There are 80 also
> code points, that are mapped to PUA in GBK, got official assignments in later
> Unicode standards and GB18030 adopted the new mappings. However that doesn't
> mean gb18030.2000-0 uses the new mappings because Sun could opt to keep backward
> compatibility with GBK fonts by making gb18030.2000-0 and gbk same. Judging by
> the comments posted on Mozilla bugzilla by engineers from Sun it is probably
> indeed the case (see e.g. bug 72525 and 81200). It would be nice if someone
> from Sun could confirm this.

Yes, that would be nice. Actually, that is what's done in RH 8.0 (the gbk-0.enc file has a line making it an alias of gb18030.2000-0). Even if the two are not made compatible, the 80 characters with different code point assignments could just as well be handled 'algorithmically' rather than by adding a new encoding file.

Jungshik