Re: Simplified Chinese radical set in Unihan
On Dec 16, 2004, at 3:20 PM, Tom Emerson wrote:

Ah, I don't have my copy of the Comprehensive ABC here at home with me.

If you have Wenlin, you have it in electronic form. Wenlin does the typesetting (and sub-licensing) for ABC, and the ABC data is accessible from within the Wenlin app.

But on the subject of a Simplified Chinese radical set for Unihan: Please see the new field kHDZRadBreak coming in the Unihan 4.1 beta. This field shows a way to add additional radical info to Unihan. That is, for a lexical kSource in Unihan, one can associate kSource mappings with radical transitions. The Hanyu Da Zidian radical set is in fact a simplification of Kang Xi, though not one using simplified characters. When lexical mappings for a good simplified PRC lexicon are included in Unihan, a similar table can be built.

We've got mapping and pinyin data for all of Xiandai Hanyu Cidian, accepted by UTC for inclusion in future Unihan. This will hopefully be added to Unihan in the coming year (pending final proofing).
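A radical-break table like the one described for kHDZRadBreak records where each radical's section begins in a lexicon's ordering. Given that, a character's radical can be recovered from its position alone with a binary search. The sketch below is a minimal illustration that assumes positions have already been reduced to sortable integers. RADICAL_BREAKS and radical_for_position are hypothetical, the positions are placeholder values, and the tuple layout is not the actual Unihan field syntax.

```python
from bisect import bisect_right

# Hypothetical break table: (first position of the radical's section, radical number),
# sorted by position. Placeholder values, not real Hanyu Da Zidian data.
RADICAL_BREAKS = [
    (1, 1),
    (120, 2),
    (180, 3),
    (260, 4),
]

def radical_for_position(position: int, breaks=RADICAL_BREAKS) -> int:
    """Return the radical whose section contains `position` in the lexicon's order."""
    starts = [start for start, _ in breaks]
    # Index of the last break that starts at or before `position`.
    i = bisect_right(starts, position) - 1
    if i < 0:
        raise ValueError(f"position {position} precedes the first radical break")
    return breaks[i][1]

print(radical_for_position(150))  # prints 2
```

Storing only the transitions keeps the table as small as the radical set itself, rather than one entry per character.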
Re: Simplified Chinese radical set in Unihan
John H. Jenkins writes:

As you say, the main problem is that there are so many different possible sets. Some will be proprietary, which would limit their [...]

IMHO the CASS radical set would be the one to use for this. It has the support of a major national body and is used in several dictionaries, including Xiandai Hanyu Cidian and John DeFrancis' ABC Dictionary.

I expect too that generating a CASS radical index could be done automatically using the kRSKangXi field in Unihan and the mapping table between CASS and KangXi radical numbers in DeFrancis - kRSCASS would make sense.

Are all of the CASS radicals themselves encoded in Unicode?

-tree
-- 
Tom Emerson
Basis Technology Corp.
Software Architect
http://www.basistech.com
Beware the lollipop of mediocrity: lick it once and you suck forever
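Here is a minimal sketch of the automatic pass described above. It assumes Unihan's tab-separated "U+XXXX, field, value" line layout and a kRSKangXi value of the form radical.strokes. KANGXI_TO_CASS, parse_rs_kangxi, build_cass_index and the sample lines are all hypothetical, and the CASS numbers are placeholders rather than values from DeFrancis. A character goes to a review list if its KangXi radical has no CASS counterpart or has more than one candidate, since that is where hand-correction would be needed.

```python
# Hypothetical KangXi -> CASS table. The CASS numbers are placeholders, not
# values from DeFrancis; a list with more than one entry models a KangXi
# radical that splits across several CASS radicals.
KANGXI_TO_CASS: dict[int, list[int]] = {
    85: [32, 79],  # ambiguous: needs hand-correction
    120: [51],     # direct mapping
}

def parse_rs_kangxi(lines):
    """Yield (code point, KangXi radical, residual strokes) from kRSKangXi lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t")
        if field != "kRSKangXi":
            continue
        # Assumes "radical.strokes"; if several space-separated values appear, use the first.
        radical, strokes = value.split()[0].split(".")
        yield codepoint, int(radical), int(strokes)

def build_cass_index(lines):
    """Map code points to (CASS radical, residual strokes); collect the rest for review."""
    index, needs_review = {}, []
    for codepoint, kx_radical, strokes in parse_rs_kangxi(lines):
        candidates = KANGXI_TO_CASS.get(kx_radical, [])
        if len(candidates) == 1:
            # Residual stroke count is carried over unchanged (an assumption).
            index[codepoint] = (candidates[0], strokes)
        else:
            needs_review.append(codepoint)  # unmapped or ambiguous
    return index, needs_review

# Illustrative Unihan-style input lines.
SAMPLE = [
    "# illustrative Unihan-style lines",
    "U+6C5F\tkRSKangXi\t85.3",
    "U+7EA2\tkRSKangXi\t120.3",
    "U+4E00\tkRSKangXi\t1.0",
    "U+4E00\tkTotalStrokes\t1",
]

index, needs_review = build_cass_index(SAMPLE)
print(index)         # {'U+7EA2': (51, 3)}
print(needs_review)  # ['U+6C5F', 'U+4E00']
```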
Re: Simplified Chinese radical set in Unihan
IMHO the CASS radical set would be the one to use for this. It has the support of a major national body and is used in several dictionaries, including Xiandai Hanyu Cidian and John DeFrancis' ABC Dictionary.

That's what I was thinking, although the latest ABC dictionary, the Comprehensive, no longer uses the CASS table but KangXi with extensions to handle simplified characters. The order of the CASS table used in ABC also differs from the most recent editions of the Xinhua Zidian and Xiandai Hanyu Cidian. However, I think once characters are mapped to the 189 CASS radicals, switching around the order would be easy.

I expect too that generating a CASS radical index could be done automatically using the kRSKangXi field in Unihan and the mapping table between CASS and KangXi radical numbers in DeFrancis - kRSCASS would make sense.

I've been working on a program like you describe. I think it will handle the majority of characters, but in a significant percentage of cases the mapping is not direct and will need to be hand-corrected.

Are all of the CASS radicals themselves encoded in Unicode?

I'm not sure and I'm not an expert, but from my experience it seems so.

Erik
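The claim that reordering is easy once characters carry a CASS radical can be expressed as a permutation between two orders. This sketch assumes each order is available as a list of the same radical forms. reorder_map is a hypothetical helper, and the five-radical orders are toy data, not the actual ABC or Xinhua Zidian sequences. The validation checks encode the precondition that both orders must list exactly the same radicals.

```python
def reorder_map(source_order, target_order):
    """Map each radical's 1-based number in source_order to its number in target_order."""
    if len(set(source_order)) != len(source_order):
        raise ValueError("duplicate radical in source order")
    if len(target_order) != len(source_order) or set(target_order) != set(source_order):
        raise ValueError("the two orders do not list the same radicals")
    target_number = {radical: n for n, radical in enumerate(target_order, start=1)}
    return {n: target_number[radical] for n, radical in enumerate(source_order, start=1)}

# Toy five-radical orders; real CASS orders would each list all 189 radicals.
SOURCE_ORDER = ["一", "丨", "丿", "丶", "乙"]
TARGET_ORDER = ["一", "丿", "丨", "乙", "丶"]

mapping = reorder_map(SOURCE_ORDER, TARGET_ORDER)
print(mapping)  # {1: 1, 2: 3, 3: 2, 4: 5, 5: 4}
```

An index keyed by (radical number, residual strokes) can then be renumbered into the other order with a single pass, e.g. {cp: (mapping[r], s) for cp, (r, s) in index.items()}.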
Re: Simplified Chinese radical set in Unihan
As you say, the main problem is that there are so many different possible sets. Some will be proprietary, which would limit their usefulness, although there would, I believe, otherwise be no objection to their inclusion. If you can come up with a reasonably standard set and reasonably consistent data across several dictionaries referencing it, I'm sure there'd be no objection to including it.

On Dec 16, 2004, at 2:19 PM, Erik Peterson wrote:

Hello,

I've found many uses for the Unihan data file over the past few years. It's a great source of information. One potential addition that I've wanted is a field listing the simplified Chinese radical for at least the simplified Chinese characters, like what exists for the Xinhua Zidian (Xinhua Dictionary) and other mainland Chinese dictionaries. I was wondering if this has been discussed before?

Some potential difficulties I could see include the fact that mainland dictionaries use a variety of different radical schemes. The most standard one that I can find is the Chinese Academy of Social Sciences (CASS) set with 189 different radicals. Even for dictionaries that use this set, the ordering is often different. Could the radical set also be proprietary in some way?

Anyway, I was curious. I've been working on something like this myself that I could also contribute when it's farther along.

Regards,
Erik Peterson
Re: Simplified Chinese radical set in Unihan
Erik Peterson writes:

That's what I was thinking, although the latest ABC dictionary, the Comprehensive, no longer uses the CASS table but KangXi with extensions to handle simplified characters. The order of the CASS table used in ABC also differs from the most recent editions of the Xinhua Zidian and Xiandai Hanyu Cidian. However, I think once characters are mapped to the 189 CASS radicals, switching around the order would be easy.

Ah, I don't have my copy of the Comprehensive ABC here at home with me. CASS re-ordering won't matter as long as one order is picked (Xinhua Zidian would probably make sense), used consistently, and documented. It is also easy enough to provide mappings between the different orders, as long as all 189 are there.

I've been working on a program like you describe. I think it will handle the majority of characters, but in a significant percentage of cases the mapping is not direct and will need to be hand-corrected.

Cool. Hand-correction is expected. I can't wait to see the data you generate. It would probably make sense to move this to the unihan@ mailing list, though.

I'm not sure and I'm not an expert, but from my experience it seems so.

It will be easy enough to check. This is great stuff.

-tree
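The encoding question can be checked mechanically once each CASS radical form has a candidate code point. This is a minimal sketch that uses Python's unicodedata module. It assumes such a candidate table exists; CANDIDATES and unassigned are hypothetical. The result depends on the Unicode version bundled with the interpreter. The check only confirms that a code point is assigned; it does not confirm that the code point is the right glyph form for the radical.

```python
import unicodedata

def unassigned(candidates):
    """Return radical numbers whose candidate code point is unassigned (category Cn)."""
    return [number for number, cp in candidates.items()
            if unicodedata.category(chr(cp)) == "Cn"]

# Hypothetical radical-number -> code point candidates. U+0378 is a deliberately
# unassigned code point included only to show a failing entry.
CANDIDATES = {1: 0x4E00, 2: 0x4E28, 3: 0x0378}

print(unicodedata.unidata_version)  # Unicode version this check reflects
print(unassigned(CANDIDATES))       # [3]
```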