Re: Status of Unihan Mandarin readings?
On the errors in kMandarin: apart from errors of the kind Andrew West has noted, there is another corruption, namely the loss of ü. This happened between 3.0b1 and 3.0b2, when ü became the two bytes C3 93. As for Han/Yi, the entry U+6C49 YI4 HAN4 is found not only in 3.0b1 and 3.0b2 but also in 2.0; HAN4 was dropped only in 3.2. While I admire the effort to explain the intrusion of YI4, I feel it is somewhat misplaced, and that a more mechanical or clerical explanation is in order. After all, look at how many times "same as U+..." is written as "sama as U+..." in 3.2: six, to be precise. Raymond Mercier
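One way to pursue the mechanical explanation is to scan the file for kMandarin values that are not well-formed. The following is a hypothetical checker, not an existing Unicode tool. It assumes each data line of Unihan.txt has the form `U+XXXX<TAB>field<TAB>value`, with `#` marking comment lines. It also assumes, without taking this from any specification, that a clean reading is uppercase pinyin (Ü allowed) followed by a tone digit 1-5. Read as UTF-8, the byte pair C3 93 decodes to Ó (U+00D3), so a ü damaged in that way shows up as a token the pattern rejects. The helper `count_typo` simply counts a literal string such as "sama as U+". A well-formed but wrong reading, such as an intruded YI4, passes this check unnoticed.

```python
import re

# Assumption: a clean kMandarin token is uppercase pinyin (Ü allowed) plus a tone digit 1-5.
READING = re.compile(r"^[A-ZÜ]+[1-5]$")


def suspicious_kmandarin(lines):
    """Yield (code point, value) for kMandarin entries containing a malformed token.

    Assumes data lines of the form U+XXXX<TAB>field<TAB>value; '#' lines are comments.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) != 3 or parts[1] != "kMandarin":
            continue
        codepoint, _, value = parts
        # A ü damaged into the bytes C3 93 decodes as Ó (U+00D3) and fails the pattern.
        if not all(READING.match(token) for token in value.split()):
            yield codepoint, value


def count_typo(lines, typo="sama as U+"):
    """Count literal occurrences of a known typo across all lines."""
    return sum(line.count(typo) for line in lines)
```

For example, opening the file (the name "Unihan.txt" is an assumption) with `encoding="utf-8"` and looping over `suspicious_kmandarin(f)` prints each flagged code point with its value.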
Re: Status of Unihan Mandarin readings?
At 08:44 AM 12/20/2002 -0700, you wrote: "That's because the file was converted to UTF-8. Previously it had not been in any single encoding, which was creating problems." Well, OK, but should you not have created by now some sort of program that checks the file whenever you make a change, a sort of spellcheck? It should not be too hard to write something that displays the effects of any changes. Raymond Mercier
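The change report Raymond asks for could be as small as a field-level diff between two releases. This is a hypothetical sketch: `load_field` and `diff_field` are illustrative names, and the tab-separated `U+XXXX<TAB>field<TAB>value` layout is assumed. It lists every code point whose value for one field differs between the old and new file, with None marking an entry present in only one of them, so an editor could review the list before publishing.

```python
def load_field(lines, field="kMandarin"):
    """Map code point -> value for one field (assumes U+XXXX<TAB>field<TAB>value lines)."""
    table = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) == 3 and parts[1] == field:
            table[parts[0]] = parts[2]
    return table


def diff_field(old_lines, new_lines, field="kMandarin"):
    """List (code point, old value, new value) for every entry that changed.

    None stands for an entry missing from one of the two files.
    """
    old = load_field(old_lines, field)
    new = load_field(new_lines, field)
    changes = []
    # Sort numerically by code point so U+20000 comes after U+9FA5.
    for cp in sorted(old.keys() | new.keys(), key=lambda s: int(s[2:], 16)):
        if old.get(cp) != new.get(cp):
            changes.append((cp, old.get(cp), new.get(cp)))
    return changes
```

Run on the kMandarin field of two releases, a report like this would surface a change such as the dropping of HAN4 from U+6C49 in 3.2, alongside any accidental edits.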
Re: Status of Unihan Mandarin readings?
On Thu, 19 Dec 2002 04:58:08 -0800 (PST), Marco Cimarosti wrote: "I have tried to follow the discussion about the errors in the kMandarin field of Unihan.txt but, after a while, I lost my way with all those dictionary references... Could someone kindly give a short summary of the situation? Here are my biggest questions:"
Here's my take on the situation:
- Are the errors really there? Yes.
- Any estimate of how many entries are affected? I estimate about 10% of basic CJK, in other words 2,000+.
- Is only kMandarin affected, or other fields as well? I don't think any other fields are affected.
- Any estimate of when it will be possible to publish a fixed version? I'll let Mr. Jenkins answer that one.
- Any suggestion for interim work-arounds (e.g., an older version of the file, an alternative source)? Use the Unihan database for Unicode 3.0, which is the latest uncorrupted version: http://www.unicode.org/Public/3.0-Update/Unihan-3.txt
Hope this clarifies the situation.
Andrew
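Andrew's interim work-around can be sketched in code, assuming the same tab-separated `U+XXXX<TAB>field<TAB>value` layout for both files. In this hypothetical sketch, `interim_readings` keeps the Unicode 3.0 kMandarin value wherever one exists and falls back to the newer file only for code points that 3.0 lacks. `share_changed` measures what fraction of basic CJK code points (U+4E00 to U+9FA5, the 20,902-character block as of Unicode 3.0) have a different or missing reading in the newer file, which is one way to check an estimate like "about 10%"; a difference alone does not say which version is correct. As a point of arithmetic, 10% of 20,902 is about 2,090, consistent with the 2,000+ figure.

```python
def kmandarin(lines):
    """Map code point -> kMandarin value (assumes U+XXXX<TAB>field<TAB>value lines)."""
    rows = (line.rstrip("\r\n").split("\t") for line in lines if not line.startswith("#"))
    return {r[0]: r[2] for r in rows if len(r) == 3 and r[1] == "kMandarin"}


def interim_readings(v30_lines, newer_lines):
    """Prefer Unicode 3.0 readings; use the newer file only where 3.0 has none."""
    merged = kmandarin(newer_lines)
    merged.update(kmandarin(v30_lines))  # 3.0 values overwrite newer ones
    return merged


def share_changed(v30_lines, newer_lines, first=0x4E00, last=0x9FA5):
    """Fraction of basic-CJK entries in 3.0 whose reading differs (or is missing) in the newer file."""
    old, new = kmandarin(v30_lines), kmandarin(newer_lines)
    basic = [cp for cp in old if first <= int(cp[2:], 16) <= last]
    if not basic:
        return 0.0
    return sum(old[cp] != new.get(cp) for cp in basic) / len(basic)
```

Code points added after 3.0, such as those outside the basic block, keep whatever reading the newer file gives them, since the 3.0 file has nothing to offer there.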