I am conveying some background information from John Jenkins (who is on the IRG), slightly condensed:
==========

1) Presence in the G source is *not* an indication of being either "Chinese" or "simplified Chinese." The G source is the source which was used to include all of the KangXi, for example, which is hardly "simplified Chinese." There are also some traditional characters used for Cantonese, and Korean characters in there. The IRG sources are indications of who *asked* for a character to be included, and *not* an indication of "what kind of character" is involved. Excluding G-source-only characters on the presumption that they're SC would be a mistake.

2) There is still the assumption being made that one can look at a character and say, "Ah, yes, this is SC" or "Ah, yes, this is TC." It is impossible to separate Chinese from Japanese and Korean cleanly since they use the same characters. Also, given the significant percentage of characters which are *both* traditional forms in their own right *and* simplifications of other characters, the whole process is extremely problematic. You *cannot* extrapolate from the 10646 data, the IRG data, or presence in charset mappings whether a particular ideograph is SC or TC. You must have knowledge of the individual characters.

Mark
—————
Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα
— Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Adam M. Costello" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, February 04, 2002 21:33
Subject: Re: [idn] Re: peanut gallery

> I wrote:
>
> > Here's a more precise version of the proposal: Prohibit a Han code
> > point iff it has a kTraditionalVariant and has only "G" sources
> > (China/Singapore). Would that do what I intend?
>
> "Mark Davis (jtcsv)" <[EMAIL PROTECTED]> replied:
>
> > No, that wouldn't do as you intend. The kTraditionalVariant is not a
> > normative field... while it has improved over time, I would not put
> > any real weight on it without a thorough review.
>
> Okay then, how about this: Prohibit a Han code point iff it has only
> "G" sources (China/Singapore).
>
> That might prohibit a few characters unnecessarily, but it will make
> sure that Taiwan, Japan, and Korea are able to use all their characters,
> and will leave the maximum flexibility for China & Singapore to define
> how to fold the remaining characters if they decide that's what they
> want to do.
>
> I wouldn't recommend this course, but if most of the Chinese community
> wanted to do this, I don't see why the rest of us should object.
>
> AMC
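For concreteness, the "G sources only" rule in the quoted proposal can be sketched as below. This is a minimal illustration, not anyone's actual implementation: it assumes the tab-separated layout of the Unihan database files (one `U+XXXX<TAB>field<TAB>value` record per line) and treats every field named `kIRG_*Source` as an IRG source. The function names `irg_sources` and `g_only` are hypothetical. As point 1 above stresses, the output only tells you who *asked* for a character, not whether it is simplified.

```python
# Hypothetical sketch of the proposed rule: flag a Han code point iff its
# only IRG source is "G". Assumes Unihan-style tab-separated records.
from collections import defaultdict

PREFIX, SUFFIX = "kIRG_", "Source"


def irg_sources(lines):
    """Map each code point to the set of IRG source tags it carries."""
    sources = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code_point, field, _value = line.split("\t", 2)
        if field.startswith(PREFIX) and field.endswith(SUFFIX):
            # e.g. "kIRG_GSource" -> "G", "kIRG_KPSource" -> "KP"
            sources[code_point].add(field[len(PREFIX):-len(SUFFIX)])
    return sources


def g_only(lines):
    """Code points whose only IRG source is G (the proposal's criterion)."""
    return sorted(cp for cp, tags in irg_sources(lines).items() if tags == {"G"})
```

Usage would be something like `g_only(open(path, encoding="utf-8"))`, where `path` points at a local copy of the Unihan IRG source data (the file name is left as an assumption). Note that the rule looks only at which source tags are present, which is exactly why it cannot distinguish SC from TC.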
