Porters (especially Nick Ing-XS),
I would like to release Encode 1.78 soon to address the problem in
CP932 (MS version of Shift_JIS) which MORIYAMA Masayuki
<[EMAIL PROTECTED]> has discovered. Not only has he identified the
problem, he has also supplied me with a patch. Though he was reluctant to
come to perl(5-porters|unicode)@perl.org (I invited him but he was
too shy to talk to us in English), the problem and solution he has
raised were too good to ignore, so I would like to update Encode on his
behalf. Here is the summary of his points.
* ucm/cp932.ucm was based on the mapping file at unicode.org [0] but
that mapping is obsolete; it works on Windows 3.1 but not in the era
of Win32.
* as a result, cp932 is rendered almost useless, or at least too
impractical
* patch was made available [1]
My first suggestion was to "Ask MS to update the data at unicode.org
and if you are unsatisfied w/ the one that comes w/ Encode you are free
to CPANize your version". But he has raised even more points and I was
finally convinced.
* Though not at unicode.org, MS has already made the mapping available
on their website [2][3]
* Python and Ruby will be using the MS version, not the one at
unicode.org
* Java has been known to suffer badly for confusing Shift_JIS and CP932
but Encode is already free of this problem by supplying different
mappings for Shift_JIS and CP932.
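To make the Shift_JIS vs. CP932 distinction concrete, here is a minimal sketch using Python's built-in `shift_jis` and `cp932` codecs. It is only an illustration of how the two mappings can diverge for the same bytes; it is not taken from the patch, and it says nothing about what Encode's own tables contain. The helper name `decode_both` and the two sample byte sequences are my own choices.

```python
# Sample byte sequences chosen for illustration (not from the patch).
WAVE_DASH_BYTES = b"\x81\x60"   # JIS X 0208 "wave dash" position
NEC_CIRCLED_ONE = b"\x87\x40"   # NEC special character row, a CP932 extension


def decode_both(raw):
    """Decode raw bytes with both codecs; None marks an undecodable input."""
    results = {}
    for name in ("shift_jis", "cp932"):
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


wave = decode_both(WAVE_DASH_BYTES)
circled = decode_both(NEC_CIRCLED_ONE)

# Same bytes, different code points: U+301C under shift_jis, U+FF5E under cp932.
print({name: f"U+{ord(text):04X}" for name, text in wave.items()})

# The vendor extension decodes only under cp932 (U+2460); shift_jis rejects it.
print(circled)
```

Treating the two as separate encodings, as Encode already does, is what keeps this kind of mismatch from silently corrupting data.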
[0] http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
[1] http://www2d.biglobe.ne.jp/~msyk/perl/cp932.html
[2] http://www.microsoft.com/typography/unicode/cscp.htm
[3] http://www.microsoft.com/typography/unicode/932.txt
One small but significant concern is Tcl/Tk. So far Encode's CP932
has matched that of Tcl, but it will not after my next release of
Encode. So I decided to call for opinions before I commit to the release.
AFAIK, CP\d+ should be avoided for any data exchanged on the Net, so you
should not use it on the web or in mail, and it's perfectly all right if
Tk(Web|Mail) has a problem handling it. At the same time, Win32 Perl
users would be much happier if CP\d+ were made more practical.
The URI [2] also has links to other code pages, so I would also like to
review them and, if necessary, update them. The 8-bit code pages (CP12??)
seem OK but the other CJK ones (CP9??) need review.
Dan the Encode Maintainer