Re: Simplified Chinese radical set in Unihan

2004-12-19 Thread Richard Cook
On Dec 16, 2004, at 3:20 PM, Tom Emerson wrote:
Ah, I don't have my copy of the Comprehensive ABC here at home with me.
If you have Wenlin, you have it in electronic form. Wenlin does the 
typesetting (and sub-licensing) for ABC, and the ABC data is accessible 
from within the Wenlin app.

But on the subject of a Simplified Chinese radical set for Unihan:
Please see the new field kHDZRadBreak coming in the Unihan 4.1 beta. 
This field shows a way to add additional radical info to Unihan. That 
is, for a lexical kSource in Unihan, one can associate kSource mappings 
with radical transitions. The Hanyu Da Zidian radical set is in fact a 
simplification of Kang Xi, though not one using simplified characters. 
When lexical mappings for a good simplified PRC lexicon are included in 
Unihan, a similar table can be built. We've got mapping and pinyin data 
for all of Xiandai Hanyu Cidian, accepted by UTC for inclusion in 
future Unihan. This will hopefully be added to Unihan in the coming 
year (pending final proofing).
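A rough sketch of how a radical-break table like this could be used: sort the breaks by dictionary position, and a character then belongs to the last break at or before its own position. This assumes the kHDZRadBreak value format used in later Unihan releases, e.g. "⼀[U+2F00]:10001.010" (radical, its code point in brackets, a colon, then a fixed-width Hanyu Da Zidian position as in kHanyu). The function names here are hypothetical.

```python
import bisect
import re

# Assumed kHDZRadBreak value format, e.g. "⼀[U+2F00]:10001.010":
# radical character, [U+code point], colon, fixed-width HDZ position.
_BREAK_RE = re.compile(r"^(.)\[U\+([0-9A-F]{4,6})\]:(\d{5}\.\d{3})$")


def parse_rad_break(value):
    """Split a kHDZRadBreak value into (radical, code point, position)."""
    m = _BREAK_RE.match(value)
    if m is None:
        raise ValueError(f"unexpected kHDZRadBreak value: {value!r}")
    radical, hex_cp, position = m.groups()
    return radical, int(hex_cp, 16), position


def build_break_table(break_values):
    """Sort radical breaks by dictionary position."""
    entries = sorted(
        (pos, rad) for rad, _cp, pos in map(parse_rad_break, break_values)
    )
    positions = [pos for pos, _ in entries]
    radicals = [rad for _, rad in entries]
    return positions, radicals


def radical_at(positions, radicals, position):
    """Return the radical whose section contains `position`.

    A character belongs to the last radical break at or before its own
    position; fixed-width position strings sort correctly as text.
    """
    i = bisect.bisect_right(positions, position) - 1
    if i < 0:
        return None  # position precedes the first radical break
    return radicals[i]
```

The same lookup would work for any lexicon whose positions sort as fixed-width strings, which is the point of tying kSource mappings to radical transitions: once a PRC lexicon's positions are in Unihan, only its break list is needed.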




Re: Simplified Chinese radical set in Unihan

2004-12-16 Thread Tom Emerson
John H. Jenkins writes:
 As you say, the main problem is that there are so many different 
 possible sets. Some will be proprietary, which would limit their 
[...]

IMHO the CASS radical set would be the one to use for this. It has the
support of a major national body and is used in several dictionaries,
including Xiandai Hanyu Cidian and John DeFrancis' ABC Dictionary.

I expect too that generating a CASS radical index could be done
automatically using the kRSKangXi field in Unihan and the mapping
table between CASS and KangXi radical numbers in DeFrancis - kRSCASS
would make sense.
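As a rough illustration of the automatic approach: read kRSKangXi from the tab-separated Unihan data ("U+4E00<TAB>kRSKangXi<TAB>1.0", with '#' comment lines), look up each KangXi radical number in a KangXi-to-CASS table, and set aside anything without a direct equivalent. The `kangxi_to_cass` dictionary stands in for the DeFrancis table and must be supplied by the caller; no real CASS numbers are assumed here, and the function names are hypothetical.

```python
from collections import defaultdict


def read_unihan_field(lines, field):
    """Yield (code point, value) for one field from Unihan-format lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cp, name, value = line.split("\t", 2)
        if name == field:
            yield int(cp[2:], 16), value


def cass_index(unihan_lines, kangxi_to_cass):
    """Group code points by CASS radical using kRSKangXi.

    `kangxi_to_cass` maps KangXi radical number -> CASS radical number.
    Characters whose KangXi radical has no entry are returned separately
    so they can be hand-corrected.
    """
    index = defaultdict(list)
    unresolved = []
    for cp, value in read_unihan_field(unihan_lines, "kRSKangXi"):
        # If several values are listed, this sketch uses only the first.
        radical, strokes = value.split()[0].split(".")
        cass = kangxi_to_cass.get(int(radical))
        if cass is None:
            unresolved.append(cp)
        else:
            index[cass].append((int(strokes), cp))
    for entries in index.values():
        entries.sort()  # by residual stroke count, then code point
    return dict(index), unresolved
```

A one-to-one table keeps this trivial; KangXi radicals that split across several CASS radicals (or merge into one) are exactly the cases that land in `unresolved`.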

Are all of the CASS radicals themselves encoded in Unicode?

-tree

-- 
Tom Emerson  Basis Technology Corp.
Software Architect http://www.basistech.com
  Beware the lollipop of mediocrity: lick it once and you suck forever



Re: Simplified Chinese radical set in Unihan

2004-12-16 Thread Erik Peterson

IMHO the CASS radical set would be the one to use for this. It has the
support of a major national body and is used in several dictionaries,
including Xiandai Hanyu Cidian and John DeFrancis' ABC Dictionary.
  That's what I was thinking, although the latest ABC dictionary, the 
Comprehensive, no longer uses the CASS table but KangXi with extensions to 
handle simplified characters.  The order of the CASS table used in ABC 
also differs from the most recent editions of the Xinhua Zidian and 
Xiandai Hanyu Cidian.  However, I think once characters are mapped to 
the 189 CASS radicals, switching around the order would be easy.

I expect too that generating a CASS radical index could be done
automatically using the kRSKangXi field in Unihan and the mapping
table between CASS and KangXi radical numbers in DeFrancis - kRSCASS
would make sense.
  I've been working on a program like you describe.  I think it will 
handle the majority of characters, but in a significant percentage of 
cases the mapping is not direct and will need to be hand-corrected.
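One way the hand-correction step might be organized, assuming the automatic pass yields a table of code point to CASS radical number with `None` where the mapping was not direct, and that a reviewer supplies overrides in the same shape (both formats are assumptions for illustration):

```python
def apply_corrections(automatic, corrections):
    """Merge hand corrections over an automatic radical assignment.

    `automatic` maps code point -> CASS radical number, or None when the
    KangXi-to-CASS mapping was not direct. `corrections` maps code point
    -> CASS radical number as entered by a reviewer (hypothetical format).
    Returns the merged table, code points still lacking a radical, and
    code points where a correction replaced an automatic value.
    """
    merged = dict(automatic)
    overridden = []
    for cp, radical in corrections.items():
        if merged.get(cp) is not None and merged[cp] != radical:
            overridden.append(cp)
        merged[cp] = radical
    pending = sorted(cp for cp, radical in merged.items() if radical is None)
    return merged, pending, sorted(overridden)
```

Keeping the overrides separate from the generated data means the automatic pass can be rerun against a newer Unihan without losing the manual work, and the `overridden` list shows where the mapping table itself may need fixing.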

Are all of the CASS radicals themselves encoded in Unicode?
  I'm not sure and I'm not an expert, but from my experience it seems so.
Erik


Re: Simplified Chinese radical set in Unihan

2004-12-16 Thread John H. Jenkins
As you say, the main problem is that there are so many different 
possible sets. Some will be proprietary, which would limit their 
usefulness, although there would, I believe, otherwise be no objection 
to their inclusion. If you can come up with a reasonably standard set and 
reasonably consistent data across several dictionaries referencing it, 
I'm sure there'd be no objection to including it.

On Dec 16, 2004, at 2:19 PM, Erik Peterson wrote:

Hello,
 I've found many uses for the Unihan data file over the past few years. 
It's a great source of information.

 One potential addition that I've wanted is a field listing the 
simplified Chinese radical for at least the simplified Chinese 
characters, like what exists for the Xinhua Zidian (Xinhua 
Dictionary) and other mainland Chinese dictionaries. I was wondering 
if this has been discussed before?

 Some potential difficulties I could see include the fact that 
mainland dictionaries use a variety of different radical schemes. The 
most standard one that I can find is the Chinese Academy of Social 
Sciences (CASS) set with 189 different radicals. Even for dictionaries 
that use this set the ordering is often different. Could the radical 
set also be proprietary in some way?

 Anyway, I was curious. I've been working on something like this 
myself that I could also contribute when it's farther along.

Regards,
Erik Peterson




Re: Simplified Chinese radical set in Unihan

2004-12-16 Thread Tom Emerson
Erik Peterson writes:
 That's what I was thinking, although the latest ABC dictionary, the 
 Comprehensive, no longer uses the CASS table but KangXi with extensions to 
 handle simplified characters.  The order of the CASS table used in ABC 
 also differs from the most recent editions of the Xinhua Zidian and 
 Xiandai Hanyu Cidian.  However, I think once characters are mapped to 
 the 189 CASS radicals, switching around the order would be easy.

Ah, I don't have my copy of the Comprehensive ABC here at home with me.

CASS re-ordering won't matter as long as one is picked (Xinhua Zidian
would probably make sense) and used consistently and it is
documented. It is also easy enough to provide mappings between the
different orders, as long as all 189 are there.
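A minimal sketch of such a mapping between orders, assuming each ordering is given as a list of radical identifiers (for example the radical characters) in that dictionary's sequence, numbered from 1. It refuses to build a mapping unless both lists hold the same radicals, each exactly once:

```python
def order_mapping(order_a, order_b, expected=189):
    """Map radical numbers in one CASS ordering to another.

    Returns {number in order_a: number in order_b}. The mapping is only
    well defined if both lists contain the same `expected` radicals,
    each exactly once.
    """
    for name, order in (("order_a", order_a), ("order_b", order_b)):
        if len(order) != expected or len(set(order)) != expected:
            raise ValueError(f"{name} must list {expected} distinct radicals")
    if set(order_a) != set(order_b):
        raise ValueError("orderings contain different radicals")
    position_b = {radical: i for i, radical in enumerate(order_b, start=1)}
    return {i: position_b[radical] for i, radical in enumerate(order_a, start=1)}
```

With a toy three-radical set, `order_mapping(["a", "b", "c"], ["c", "a", "b"], expected=3)` gives `{1: 2, 2: 3, 3: 1}`.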

I've been working on a program like you describe.  I think it will 
 handle the majority of characters, but in a significant percentage of 
 cases the mapping is not direct and will need to be hand-corrected.

Cool. Hand-correction is expected. I can't wait to see the data you
generate. It would probably make sense to move this to the unihan@
mailing list though.

I'm not sure and I'm not an expert, but from my experience it seems so.

It will be easy enough to check.
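A quick way to run that check, assuming the 189 CASS radicals have been typed in as a list of strings (not supplied here): flag anything that is not a single assigned code point. Python's `unicodedata.category` returns "Cn" for unassigned code points; note that the answer reflects the Unicode version bundled with Python (`unicodedata.unidata_version`), not a particular Unicode 4.x release.

```python
import unicodedata


def unencoded(radicals):
    """Return the radical strings that are not single assigned code points.

    Multi-character strings (e.g. a description of an unencoded form)
    and unassigned code points (general category "Cn") are both flagged.
    """
    missing = []
    for radical in radicals:
        if len(radical) != 1 or unicodedata.category(radical) == "Cn":
            missing.append(radical)
    return missing
```

A radical that exists only as a component shape would have to be entered as some placeholder string, and would then show up in the result.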

This is great stuff.

-tree
