Hi,

On Tue, 20 Dec 2005 14:26:08 +0800
Chia-I Wu <[EMAIL PROTECTED]> wrote:
>> My anxiety is that: if we write a documentation including PUA
>> charcode today, and read it after the official inclusion of the
>> characters... we cannot search a string without the extra mapping
>> table of PUA code and Unicode codepoint. And, we need a switch
>I think it can be a feature of a software for the HK users. It's like,
>for example, I have a document with mixed traditional and simplified
>Chinese. When I search for the character U+967D ("Sun" in traditional
>Chinese) , I would like to see that U+9633 ("Sun" in simplified Chinese)
>is also searched. The mappings are too complicated and too specific to
>be included in a base library. (Or maybe not?)

Thank you for giving a concrete example. Please let me ask about the
case of Hanzi migrating from PUA code points to official Unicode code
points.

Although CJK people know that U+967D and U+9633 differ only in their
presentation forms and have the same meaning, Unicode defines them as
different characters. So the base layer of Unicode text handling should
treat them as different Hanzi. In the PUA case, however, the
relationship between a Hanzi at a PUA code point and the same Hanzi at
its newly defined code point (in a revised Unicode that includes the new
Hanzi) is not clear.

Excuse me for explaining at some length; I think it helps non-CJK people
join the discussion.

--

GB18030-2000 has several punctuation marks in vertical forms, and it
provides a mapping from GB18030-2000 code points to Unicode 3.0 code
points. Some of these vertical glyphs were already included in Unicode's
official CJK Compatibility Forms area (U+FE30 - U+FE4F), but others were
not, so GB18030-2000 maps them to Unicode 3.0 PUA code points. For
example, GB+A6D9 was mapped to U+E78D.

Since Unicode 4.1, the remaining vertical glyphs of GB18030-2000 are
included in Unicode's official Vertical Forms area (U+FE10 - U+FE1F).
For example, GB+A6D9 is now mapped to U+FE10.

Now I have a question. Today we have Unicode 4.1.0. Is the character at
PUA code point U+E78D the same as U+FE10, or are they different? Should
U+E78D now be treated as unmapped, or should it be kept for backward
compatibility? Or should the decision be made by font selection only?
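To make the trade-off concrete: a base layer that compares code points literally will never match U+E78D against U+FE10, nor U+967D against U+9633. An application that wants those matches needs its own folding table, i.e. the "extra mapping table" mentioned above. Below is a minimal sketch in Python, assuming a hypothetical table FOLD_TABLE that holds only the two pairs discussed in this mail; a real GB18030 table would have to list every PUA code point that moved to an official code point, and FOLD_TABLE, fold() and folded_find() are illustrative names, not an existing API.

```python
# Hypothetical application-level folding table. It is NOT part of any
# standard or library; it only lists the two pairs discussed in this mail.
FOLD_TABLE = str.maketrans({
    "\uE78D": "\uFE10",  # GB+A6D9: GB18030-2000 PUA -> Unicode 4.1 vertical form
    "\u967D": "\u9633",  # traditional "Sun" -> simplified "Sun"
})


def fold(text: str) -> str:
    """Replace each character listed in FOLD_TABLE; others pass through."""
    return text.translate(FOLD_TABLE)


def folded_find(haystack: str, needle: str) -> int:
    """Search after folding both strings; return the index or -1.

    Every mapping is one character to one character, so an index into
    the folded haystack is also a valid index into the original.
    """
    return fold(haystack).find(fold(needle))


if __name__ == "__main__":
    doc = "\u967D\uFE10"                      # traditional "Sun" + official vertical comma
    print(doc.find("\uE78D"))                 # literal search misses the PUA form: -1
    print(folded_find(doc, "\uE78D"))         # folded search finds it at index 1
    print(folded_find(doc, "\u9633"))         # simplified query matches index 0
```

Keeping the table outside the base comparison matches the view quoted above: the base layer still treats the code points as distinct, and only the searching application decides which characters to fold together.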
--
Regards,
mpsuzuki
_______________________________________________
gtk-i18n-list mailing list
gtk-i18n-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-i18n-list