Re: CJK stroke order data: kRSUnicode v. kRSKangXi

2014-03-12 Thread Richard COOK
On Mar 12, 2014, at 2:59 AM, Adam Nohejl wrote:
>> 
>> Since kRSUnicode is a Normative property, a formal proposal to modify that 
>> data is required, for review in WG2. I have added notes on the items you 
>> mention below, for consideration in that process, and in the meantime, if 
>> you identify any other issues, please bring them to our attention.
> 
> OK, I will prepare a more comprehensive list. Do you mean that you would 
> submit such a formal proposal? Or can I submit it myself somehow?

You are welcome to prepare a proposal, or just send us your list.

We have already started a proposal to augment kRSUnicode, but I'm not sure 
about the timeframe for completion.

The proofing of the various Kangxi properties is separate from this, but is 
aimed at the specific KX edition used by IRG in Extension B work.

-Richard


___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


Re: CJK stroke order data: kRSUnicode v. kRSKangXi

2014-03-12 Thread Adam Nohejl
Mr. Cook,

Thank you for all the information (and for bringing Wenlin to my attention as well).

> Where a specific value attested in a specific Kangxi edition is missing from 
> kRSUnicode, it would indeed be useful to add it,

Great to hear that.

> Since kRSUnicode is a Normative property, a formal proposal to modify that 
> data is required, for review in WG2. I have added notes on the items you 
> mention below, for consideration in that process, and in the meantime, if you 
> identify any other issues, please bring them to our attention.

OK, I will prepare a more comprehensive list. Do you mean that you would submit 
such a formal proposal? Or can I submit it myself somehow?

> PS: About the subject line of your message.

Yes, of course, the subject should have read "CJK radical-stroke count data". 
Not one of my brightest moments, I guess...


-- 
Adam Nohejl


Re: CJK stroke order data: kRSUnicode v. kRSKangXi

2014-03-10 Thread Richard COOK
Mr. Nohejl,

About the property data you mention below. kRSUnicode property data permits 
multiple/variant (space-delimited) radical/stroke values, and I think we will 
see important variants added in the future. Where a specific value attested in 
a specific Kangxi edition is missing from kRSUnicode, it would indeed be useful 
to add it, and perhaps to give it priority (move it to the front of the list). 
Likewise, if a common variant value is missing (even one not associated with 
Kangxi), it might be added for convenience. And if there are any outright 
errors, of course those should be identified and corrected (but clear errors 
are harder to find these days). 
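
As a rough illustration of the space-delimited format, and of moving a value to the front of the list, here is a minimal Python sketch. The names parse_rs and promote are hypothetical. The value pattern (radical number, optional apostrophe for a simplified radical, dot, residual stroke count) is an assumption about the field syntax. The 最 values are only the ones quoted in this thread, not a proposed data change.

```python
import re
from typing import List, NamedTuple


class RadicalStroke(NamedTuple):
    radical: int      # radical number
    simplified: bool  # True if the radical carried a trailing apostrophe
    strokes: int      # residual stroke count (strokes outside the radical)


# Assumed shape of one value: digits, optional apostrophe, dot, stroke count.
_RS = re.compile(r"^(\d{1,3})('?)\.(-?\d{1,2})$")


def parse_rs(value: str) -> List[RadicalStroke]:
    """Split a space-delimited field value into its radical/stroke entries."""
    entries = []
    for item in value.split():
        m = _RS.match(item)
        if m is None:
            raise ValueError(f"unrecognised radical/stroke value: {item!r}")
        entries.append(
            RadicalStroke(int(m.group(1)), m.group(2) == "'", int(m.group(3)))
        )
    return entries


def promote(value: str, preferred: str) -> str:
    """Put `preferred` first in the list, adding it if it is missing."""
    others = [v for v in value.split() if v != preferred]
    return " ".join([preferred] + others)


# Values for U+6700 as quoted in this thread (illustration only).
print(parse_rs("13.10"))
print(promote("13.10", "73.8"))
```

Here promote("13.10", "73.8") yields a two-value list with the Kangxi-attested value first, which is the kind of reordering described above.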

Note that because kRSUnicode covers *all* Unihan CJK, even those characters not 
present in the original Kangxi, some of the radical/stroke values are so-called 
"virtual" assignments (those should be omitted from consideration, in proofing 
original KX data).

Several years ago we (at Wenlin.com) produced consolidated Kangxi data for our 
Zidian (Wenlin 4.X), taking these four properties (among other data) as input:



 


The last of these may not have any obvious connection with Kangxi, until one 
reads the kIRG_GSource property description and sees this "sub-property" 
description:

"GKX Kangxi Dictionary ideographs (康熙字典) 9th edition (1958) including the 
addendum (康熙字典)補遺"

PRC researchers have done much work proofing G-Source Kangxi data, to address 
many aspects of the complex original text. 

The Kangxi work we did at Wenlin has several dimensions, and some of this has 
not yet rippled back into UCD.

We have in fact already identified many important omissions from kRSUnicode, 
which we plan to propose for a future data release. 

Since kRSUnicode is a Normative property, a formal proposal to modify that data 
is required, for review in WG2. I have added notes on the items you mention 
below, for consideration in that process, and in the meantime, if you identify 
any other issues, please bring them to our attention.

-Richard

PS: About the subject line of your message. Please note that despite the "CJK 
stroke order" subject line in your message, we are not talking about CJK stroke 
order here at all, but about Kangxi and UCS radical assignment, and residual 
stroke *count* data. Such data can indeed be used to "order" (collate) CJK 
data, but "stroke order" is a separate issue, involving the particular sequence 
of CJK Strokes (see The Unicode Standard, Appendix F) in the writing of a given 
character (stroke-order data can also be used for collation and indexing). 
Wenlin's CDL database (which inspired the CJK Stroke block, and also produced 
Appendix F) contains a comprehensive analysis of CJK Stroke order *and* 
Radical/Stroke data for all UCS CJK, primarily focused on PRC norms, but also 
including a great many variants (variant forms, variant stroke counts, and 
variant radical assignments).
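
To make the distinction concrete: collating by radical and residual stroke count needs no stroke-order information at all. The following is a minimal Python sketch, assuming a hypothetical helper rs_sort_key and using only the kRSUnicode values quoted in this thread. It sorts on the first value of each field and breaks ties by code point, which is one possible convention rather than a prescribed one.

```python
from typing import Dict, Tuple


def rs_sort_key(char: str, rs_value: str) -> Tuple[int, int, int]:
    """Build a (radical, residual strokes, code point) key from the first value."""
    first = rs_value.split()[0]
    radical, strokes = first.split(".")
    # A trailing apostrophe (simplified radical) is ignored for ordering here.
    return (int(radical.rstrip("'")), int(strokes), ord(char))


# kRSUnicode values as quoted in this thread.
data: Dict[str, str] = {
    "最": "13.10",
    "亀": "5.10",
    "曻": "73.7",
    "龝": "213.5",
}

ordered = sorted(data, key=lambda c: rs_sort_key(c, data[c]))
print(ordered)
```

Stroke-order data would instead be a sequence of stroke types per character, which this key never consults.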


On Feb 28, 2014, at 10:56 AM, Adam Nohejl wrote:

> 
> (1) A very common character for "most, maximum".
> 最[U+6700] kRSKangXi   73.8
> 最[U+6700] kRSUnicode  13.10
> 
> (2) A funny character for autumn containing the turtle component.
> 龝[U+9F9D] kRSKangXi   115.16
> 龝[U+9F9D] kRSKanWa    115.16
> 龝[U+9F9D] kRSUnicode  213.5
> 
> There are also characters that actually are not included in the Kang Xi 
> dictionary**, but the Unihan data contain both a purported Kang Xi radical 
> and in addition to that a _different_ Unicode radical.
> 
> (3) The simplified turtle character (commonly assigned to the traditional 
> radical #213):
> 亀[U+4E80] kRSKangXi   213.0
> 亀[U+4E80] kRSUnicode  5.10
> 
> (4) Character with the radical #72/73 at the top, i.e. IMHO an arbitrary 
> decision, but unexpectedly the fields differ:
> 曻[U+66FB] kRSKangXi   72.7
> 曻[U+66FB] kRSUnicode  73.7


> Hello,
> 
> I am comparing radical data for CJK characters from different sources, 
> including the Unihan database. According to the Unihan documentation* the 
> kRSUnicode radical should correspond to kRSKangXi radical, which in turn 
> should be based on the Kang Xi dictionary.
> 
> Is there any explanation for the following discrepancies? Did I miss any 
> other rules or reasoning behind the content of these two fields?
> 
> Examples of the discrepancies:
> 
> (1) A very common character for "most, maximum".
> U+6700  kRSKangXi   73.8
> U+6700  kRSUnicode  13.10
> 
> (2) A funny character for autumn containing the turtle component.
> U+9F9D  kRSKangXi   115.16
> U+9F9D  kRSKanWa    115.16
> U+9F9D  kRSUnicode  213.5
> 
> There are also characters that actually are not included in the Kang Xi 
> dictionary**, but the Unihan data contain both a purported Kang

Re: CJK stroke order data: kRSUnicode v. kRSKangXi

2014-03-09 Thread Leonardo Boiko
I don't know about the points you raise, but I wish it was easier to help
proofread Unihan data.  Back in 2012 I compared kKangXi to kIRGKangXi and
found 252 conflicts, besides the cases where a character only has one or
the other.  I even put together a simple tool to help fix this, with
links to the relevant pages at the online Kang Xi[1].  I had no replies…

[1] http://namakajiri.net/misc/unihan_kangxi/compare_existing.html for
characters in Kang Xi, and for the others,
http://namakajiri.net/misc/unihan_kangxi/compare_nonexisting.html
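
A comparison of this kind can be sketched in a few lines of Python. This is not the tool linked above, only a minimal illustration. It assumes the tab-separated "U+XXXX, property, value" layout of the Unihan data files with "#" comment lines. The function names are hypothetical, and the sample rows use private-use code points with made-up values purely as placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Table = Dict[str, Dict[str, str]]


def load_properties(lines: Iterable[str]) -> Table:
    """Read 'U+XXXX<TAB>property<TAB>value' lines, skipping comments and blanks."""
    table: Table = defaultdict(dict)
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        codepoint, prop, value = line.split("\t", 2)
        table[codepoint][prop] = value
    return dict(table)


def compare(table: Table, prop_a: str, prop_b: str
            ) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Return (conflicts, one_sided) for two properties, in code point order."""
    conflicts: List[Tuple[str, str, str]] = []
    one_sided: List[str] = []
    for cp in sorted(table, key=lambda c: int(c[2:], 16)):
        a = table[cp].get(prop_a)
        b = table[cp].get(prop_b)
        if a is None and b is None:
            continue
        if a is None or b is None:
            one_sided.append(cp)        # only one of the two properties present
        elif a != b:
            conflicts.append((cp, a, b))
    return conflicts, one_sided


# Placeholder rows: private-use code points, made-up values, not real Unihan data.
SAMPLE = [
    "# placeholder rows for illustration",
    "U+E000\tkKangXi\t0001.010",
    "U+E000\tkIRGKangXi\t0001.010",
    "U+E001\tkKangXi\t0002.030",
    "U+E001\tkIRGKangXi\t0002.031",
    "U+E002\tkKangXi\t0003.050",
]

conflicts, one_sided = compare(load_properties(SAMPLE), "kKangXi", "kIRGKangXi")
print(conflicts, one_sided)
```

Run against real data files, the two returned lists would correspond to the "conflicts" and the "only has one or the other" cases mentioned above.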


2014-03-09 9:39 GMT-03:00 Adam Nohejl:

> Hello again,
>
> I would be really grateful for any reply or at least pointers to relevant
> information about this topic (stroke-order data in Unihan, see my previous
> message below).
>
> Or is there any other appropriate place to discuss this?
>
> Thank you,
>
> --
> Adam
>
> On 2014/02/28, at 19:56, Adam Nohejl wrote:
> >
> > Hello,
> >
> > I am comparing radical data for CJK characters from different sources,
> including the Unihan database. According to the Unihan documentation* the
> kRSUnicode radical should correspond to kRSKangXi radical, which in turn
> should be based on the Kang Xi dictionary.
> >
> > Is there any explanation for the following discrepancies? Did I miss any
> other rules or reasoning behind the content of these two fields?
> >
> > Examples of the discrepancies:
> >
> > (1) A very common character for "most, maximum".
> > U+6700  kRSKangXi   73.8
> > U+6700  kRSUnicode  13.10
> >
> > (2) A funny character for autumn containing the turtle component.
> > U+9F9D  kRSKangXi   115.16
> > U+9F9D  kRSKanWa    115.16
> > U+9F9D  kRSUnicode  213.5
> >
> > There are also characters that actually are not included in the Kang Xi
> dictionary**, but the Unihan data contain both a purported Kang Xi radical
> and in addition to that a _different_ Unicode radical.
> >
> > (3) The simplified turtle character (commonly assigned to the
> traditional radical #213):
> > U+4E80  kRSKangXi   213.0
> > U+4E80  kRSUnicode  5.10
> >
> > (4) Character with the radical #72/73 at the top, i.e. IMHO an arbitrary
> decision, but unexpectedly the fields differ:
> > U+66FB  kRSKangXi   72.7
> > U+66FB  kRSUnicode  73.7
> >
> > - - -
> >
> > [*] : "Property:
> kRSUnicode // Description: (...) The first value is intended to reflect the
> same radical as the kRSKangXi field and the stroke count of the glyph used
> to print the character within the Unicode Standard."
> >
> > [**] The two characters are missing from the '89 edition of Kang Xi
> (which should be the same as used for Unihan) according to search on this
> site: 


Re: CJK stroke order data: kRSUnicode v. kRSKangXi

2014-03-09 Thread Adam Nohejl
Hello again,

I would be really grateful for any reply or at least pointers to relevant 
information about this topic (stroke-order data in Unihan, see my previous 
message below).

Or is there any other appropriate place to discuss this?

Thank you,

-- 
Adam

On 2014/02/28, at 19:56, Adam Nohejl wrote:
> 
> Hello,
> 
> I am comparing radical data for CJK characters from different sources, 
> including the Unihan database. According to the Unihan documentation* the 
> kRSUnicode radical should correspond to kRSKangXi radical, which in turn 
> should be based on the Kang Xi dictionary.
> 
> Is there any explanation for the following discrepancies? Did I miss any 
> other rules or reasoning behind the content of these two fields?
> 
> Examples of the discrepancies:
> 
> (1) A very common character for "most, maximum".
> U+6700  kRSKangXi   73.8
> U+6700  kRSUnicode  13.10
> 
> (2) A funny character for autumn containing the turtle component.
> U+9F9D  kRSKangXi   115.16
> U+9F9D  kRSKanWa    115.16
> U+9F9D  kRSUnicode  213.5
> 
> There are also characters that actually are not included in the Kang Xi 
> dictionary**, but the Unihan data contain both a purported Kang Xi radical 
> and in addition to that a _different_ Unicode radical.
> 
> (3) The simplified turtle character (commonly assigned to the traditional 
> radical #213):
> U+4E80  kRSKangXi   213.0
> U+4E80  kRSUnicode  5.10
> 
> (4) Character with the radical #72/73 at the top, i.e. IMHO an arbitrary 
> decision, but unexpectedly the fields differ:
> U+66FB  kRSKangXi   72.7
> U+66FB  kRSUnicode  73.7
> 
> - - -
> 
> [*] : "Property: kRSUnicode 
> // Description: (...) The first value is intended to reflect the same radical 
> as the kRSKangXi field and the stroke count of the glyph used to print the 
> character within the Unicode Standard."
> 
> [**] The two characters are missing from the '89 edition of Kang Xi (which 
> should be the same as used for Unihan) according to search on this site: 
> 





CJK stroke order data: kRSUnicode v. kRSKangXi

2014-02-28 Thread Adam Nohejl
Hello,

I am comparing radical data for CJK characters from different sources, 
including the Unihan database. According to the Unihan documentation* the 
kRSUnicode radical should correspond to kRSKangXi radical, which in turn should 
be based on the Kang Xi dictionary.

Is there any explanation for the following discrepancies? Did I miss any other 
rules or reasoning behind the content of these two fields?

Examples of the discrepancies:

(1) A very common character for "most, maximum".
U+6700  kRSKangXi   73.8
U+6700  kRSUnicode  13.10

(2) A funny character for autumn containing the turtle component.
U+9F9D  kRSKangXi   115.16
U+9F9D  kRSKanWa    115.16
U+9F9D  kRSUnicode  213.5

There are also characters that actually are not included in the Kang Xi 
dictionary**, but the Unihan data contain both a purported Kang Xi radical and 
in addition to that a _different_ Unicode radical.

(3) The simplified turtle character (commonly assigned to the traditional 
radical #213):
U+4E80  kRSKangXi   213.0
U+4E80  kRSUnicode  5.10

(4) Character with the radical #72/73 at the top, i.e. IMHO an arbitrary 
decision, but unexpectedly the fields differ:
U+66FB  kRSKangXi   72.7
U+66FB  kRSUnicode  73.7
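
The check behind these four examples can be sketched as follows. This is a minimal Python illustration, with hypothetical names radical_of and radical_mismatches, that compares only the radical of the first value in each field. It uses exactly the values listed above and assumes the "radical.strokes" value shape.

```python
from typing import Dict, List, Tuple


def radical_of(rs_value: str) -> int:
    """Return the radical number of the first value in a radical/stroke field."""
    first = rs_value.split()[0]
    return int(first.split(".")[0].rstrip("'"))


# The four examples listed above.
SAMPLES: Dict[str, Dict[str, str]] = {
    "U+6700": {"kRSKangXi": "73.8", "kRSUnicode": "13.10"},
    "U+9F9D": {"kRSKangXi": "115.16", "kRSUnicode": "213.5"},
    "U+4E80": {"kRSKangXi": "213.0", "kRSUnicode": "5.10"},
    "U+66FB": {"kRSKangXi": "72.7", "kRSUnicode": "73.7"},
}


def radical_mismatches(samples: Dict[str, Dict[str, str]]
                       ) -> List[Tuple[str, int, int]]:
    """List (code point, kRSKangXi radical, kRSUnicode radical) where they differ."""
    out = []
    for cp, props in samples.items():
        kx = radical_of(props["kRSKangXi"])
        uni = radical_of(props["kRSUnicode"])
        if kx != uni:
            out.append((cp, kx, uni))
    return out


for cp, kx, uni in radical_mismatches(SAMPLES):
    print(f"{cp}: kRSKangXi radical {kx}, kRSUnicode radical {uni}")
```

All four examples are reported, since in each case the radicals themselves differ and not only the stroke counts.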

- - -

[*] : "Property: kRSUnicode // 
Description: (...) The first value is intended to reflect the same radical as 
the kRSKangXi field and the stroke count of the glyph used to print the 
character within the Unicode Standard."

[**] The two characters are missing from the '89 edition of Kang Xi (which 
should be the same as used for Unihan) according to search on this site: 



-- 
Adam Nohejl

