Autrijus,

   welcome to the club :)

On 2002.02.19, at 09:43, Autrijus Tang wrote:
> Being a native Big5/GB (and HKSCS, Big5+, etc.) user, I'm extremely happy to
> see Dan's work on Encode.pm. :-)

   Actually, more credit should go to Nick Ing-Simmons; the Encode::XX
modules are so far all based on his work on EUC_JP.

> At jhi's bidding, I did some rudimentary tests using the standard Big5
> encoding range, with iconv 2.0 as the reference point. Within the
> [A1-F9][40-7E,A1-FE] range, the results were as follows:
>
> 1. In the A140 - A3BF range (punctuation and phonetic symbols), iconv parsed
>    without errors; Encode, however, does not agree with it in 3 places:
>
>    * big5(A150) doesn't get mapped properly.
>    * it has an off-by-one error in range big5(A15A..A17D); it mapped
>      big5(A15A) as ucs2(big5(A15B)), big5(A15B) as ucs2(big5(A15C)), etc.
>    * it cannot parse the range big5(A17E..A3BF).
>
> 2. In the A440 - C67E range (widely-used characters), both iconv and Encode
>    worked perfectly.
>
> 3. In the C6A1 - C8D3 range (word parts, Japanese characters, and assorted
>    symbols), neither Encode nor iconv works beyond big5(C7FC), which
>    is expected.
>
> 4. In the C940 - F9D5 range (rarely used characters), both iconv and Encode
>    worked perfectly.
>
> 5. In the F9D6 - F9FE range (addendum, table-drawing characters), neither
>    of them works, which is expected.
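
   For anyone who wants to repeat this kind of range test, here is a
minimal sketch of the idea. It is written in Python rather than against
Encode.pm or iconv, and it uses Python's bundled "big5" and "cp950" codecs
purely as stand-ins for the two implementations being compared; the
function names are made up for illustration. It walks the
[A1-F9][40-7E,A1-FE] range, records where two decoders disagree, and
offers a check for the off-by-one pattern described in item 1.

```python
def big5_codes():
    """Yield every two-byte code in the standard range [A1-F9][40-7E,A1-FE]."""
    trails = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
    for lead in range(0xA1, 0xFA):
        for trail in trails:
            yield bytes([lead, trail])


def decode_or_none(code, codec):
    """Decode one code with the named codec; None means 'cannot parse'."""
    try:
        return code.decode(codec)
    except UnicodeDecodeError:
        return None


def compare(codec_test, codec_ref):
    """Return {code: (test_result, ref_result)} where the decoders differ."""
    diffs = {}
    for code in big5_codes():
        got = decode_or_none(code, codec_test)
        want = decode_or_none(code, codec_ref)
        if got != want:
            diffs[code] = (got, want)
    return diffs


def next_code(code):
    """Return the code that follows `code` in Big5 order (skips 7F-A0)."""
    lead, trail = code
    if trail == 0x7E:
        return bytes([lead, 0xA1])
    if trail == 0xFE:
        return bytes([lead + 1, 0x40])
    return bytes([lead, trail + 1])


def shifted_by_one(code, codec_test, codec_ref):
    """True if codec_test maps `code` to what codec_ref gives for next_code(code)."""
    got = decode_or_none(code, codec_test)
    return got is not None and got == decode_or_none(next_code(code), codec_ref)
```

Calling compare("big5", "cp950") lists the codes on which those two
codecs disagree; feeding each of them to shifted_by_one() separates a
shifted table from a plain unmapped or differently mapped code.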

   I also found Encode::TW "compiling" somewhat noisy.  Would you also
test it on the Encode::Tcl module?  Encode::TW is in a way just a compiled
version of Encode::Tcl.

> 6. I didn't test utf8=>big5 much, but they seem to work alright.
>
> Note that the Big5+ spec at <http://www.cmex.org.tw/download-b5.html>
> specifies a rather comprehensive set of official big5<=>ucs2 mappings; the
> relevant parts of it are available at <http://autrijus.org/big5-ucs.tar.gz>.
> Their format should be self-explanatory; I wonder if it's possible to use
> that table to fill in the missing codepoints, or should we add a 'big5p'
> encoding?
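
   To make the "fill in the missing codepoints" option concrete, here is
a rough sketch, again in Python and not Encode.pm code. I have not
checked the actual layout of the files in the tarball, so the line format
assumed here (a Big5 code and a UCS-2 code in hex per line, with optional
"0x" prefixes and "#" comments) is purely an assumption, and both
function names are hypothetical. The existing decoder always wins; the
supplementary table is consulted only for codes it cannot parse.

```python
def parse_mapping(lines):
    """Parse 'BIG5HEX UCSHEX' lines (an ASSUMED format) into {bytes: str}."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        big5_hex, ucs_hex = line.split()[:2]
        code = int(big5_hex, 16).to_bytes(2, "big")
        table[code] = chr(int(ucs_hex, 16))
    return table


def decode_with_fallback(code, table, codec="big5"):
    """Decode one two-byte code; use `table` only when the codec fails."""
    try:
        return code.decode(codec)
    except UnicodeDecodeError:
        return table.get(code)  # None if the table has no entry either
```

Keeping the extra mappings in a separate table like this leaves the base
encoding untouched, which is also roughly what a distinct 'big5p'
encoding would amount to.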

   The other major encoding that is missing is obviously CNS11643.  I don't
know much about it, but as far as I know CNS11643 is ISO-2022 compliant,
and CNS11643-1 and CNS11643-2 cover Big5.
   But as you see, Encode::XX is so far dependent on the Tcl encodings, and
there is no CNS11643 there yet....

> Anyway, I'll get some more tests (and get GB working) when I wake up.
>
> Hope that helps,

   It does!  Now we are looking for testers of KR and CN as well.  Anyone?


Dan the Man with Too Many Languages to Learn
