Autrijus, welcome to the club :)
On 2002.02.19, at 09:43, Autrijus Tang wrote:
> Being a native Big5/GB (and HKSCS, Big5+, etc.) user, I'm extremely happy
> to see Dan's work on Encode.pm. :-)

Actually, more credit should go to Nick Ing-Simmons; Encode::XX is so far
all based upon his work on EUC_JP.

> At jhi's bidding, I did some rudimentary tests using the standard Big5
> encoding range, with iconv 2.0 as the reference point. Within the
> [A1-F9][40-7E,A1-FE] range, the result was like this:
>
> 1. In the A140 - A3BF range (punctuation and phonetic symbols), iconv
>    parsed without errors; Encode, however, does not agree with it in 3
>    places:
>
>    * big5(A150) doesn't get mapped properly.
>    * it has an off-by-one error in range big5(A15A..A17D); it mapped
>      big5(A15A) as ucs2(big5(A15B)), big5(A15B) as ucs2(big5(A15C)),
>      etc.
>    * it cannot parse the range big5(A17E..A3BF).
>
> 2. In the A440 - C67E range (widely used characters), both iconv and
>    Encode worked perfectly.
>
> 3. In the C6A1 - C8D3 range (word parts, Japanese characters, and
>    assorted symbols), neither Encode nor iconv works beyond big5(C7FC),
>    which is expected.
>
> 4. In the C940 - F9D5 range (rarely used characters), both iconv and
>    Encode worked perfectly.
>
> 5. In the F9D6 - F9FE range (addendum, table-drawing characters),
>    neither of them works, which is expected.

I also found Encode::TW "compiling" somewhat noisy. Would you also test it
with the Encode::Tcl module? Encode::TW is, in a way, just a compiled
version of Encode::Tcl.

> 6. I didn't test utf8=>big5 much, but it seems to work all right.
>
> Note that the Big5+ spec at <http://www.cmex.org.tw/download-b5.html>
> specifies a rather comprehensive set of official big5<=>ucs2 mappings;
> the relevant part of it is available at
> <http://autrijus.org/big5-ucs.tar.gz>. Their format should be
> self-descriptive. I wonder if it's possible to use that table to fill in
> the missing codepoints, or should we add a 'big5p' encoding?
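For anyone who wants to repeat this kind of range check, here is a minimal sketch of the enumeration. It assumes Python's built-in "big5" codec as the decoder under test; that codec is neither Encode nor iconv, and the helper names (big5_codes, undecodable) are hypothetical. It walks the [A1-F9][40-7E,A1-FE] grid and lists the codes a decoder rejects within a subrange.

```python
# Sketch only: Python's "big5" codec stands in for the decoder under test.
# It is NOT Encode or iconv; swap in whatever decoder you want to compare.

LEAD_BYTES = range(0xA1, 0xF9 + 1)                       # first byte: A1-F9
TRAIL_BYTES = list(range(0x40, 0x7E + 1)) + list(range(0xA1, 0xFE + 1))  # 40-7E, A1-FE


def big5_codes(lo=0xA140, hi=0xF9FE):
    """Yield two-byte Big5 codes in the [A1-F9][40-7E,A1-FE] grid, lo..hi inclusive."""
    for lead in LEAD_BYTES:
        for trail in TRAIL_BYTES:
            code = (lead << 8) | trail
            if lo <= code <= hi:
                yield code


def undecodable(codec, lo, hi):
    """Return the codes in lo..hi that `codec` rejects in strict mode."""
    bad = []
    for code in big5_codes(lo, hi):
        try:
            code.to_bytes(2, "big").decode(codec)
        except UnicodeDecodeError:
            bad.append(code)
    return bad


if __name__ == "__main__":
    # Same subranges as items 1, 2 and 4 in the report above.
    for lo, hi in [(0xA140, 0xA3BF), (0xA440, 0xC67E), (0xC940, 0xF9D5)]:
        bad = undecodable("big5", lo, hi)
        print(f"{lo:04X}-{hi:04X}: {len(bad)} undecodable")
```

Comparing two decoders would mean recording the decoded character for each code instead of only success or failure, then diffing the two lists; an off-by-one like the A15A..A17D one would show up as each entry matching the reference's entry for the next code.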
Another major encoding that is obviously missing is CNS11643. I don't know
much about it, but as far as I know CNS11643 is ISO-2022 compliant, and
CNS11643-1 and CNS11643-2 cover Big5. But as you see, Encode::XX is so far
dependent on Tcl encodings, and there is no CNS11643 there yet....

> Anyway, I'll get some more tests (and get GB working) when I wake up.
>
> Hope that helps,

It does! Now we are looking for testers of KR and CN as well. Anyone?

Dan the Man with Too Many Languages to Learn