Hello, Dan!

1) This is my second portion of comments on the renewed Supported.pod.
   This part is 100% orthogonal to the first part.

2) This patch
   - changes status of KOI8-U on Jungshik's comment
     (sorry, I have never tested that myself :-(
   - upgrades GB2312 to the "first class citizen" (why not?)
   - adds a section on Microsoft naming acrobatics
   - that patch includes a comment on the Shift_JIS differences
     between JIS X 0208-1997 Appendix 1 and cp932
   - ...
   - this patch also makes clear that Encode supports the standards
     for GB2312 and Big5 not Microsoft extensions
     (have I grasped it right? :-)

--- ext/Encode/lib/Encode/Supported.pod.orig Mon Apr 1 03:42:52 2002
+++ ext/Encode/lib/Encode/Supported.pod Thu Apr 4 15:16:10 2002
@@ -308,8 +308,8 @@
 
 =item *
 
-To (en|de) code Encodings marked as C<*>, You need C<Encode::HanExtra>
-,available from CPAN.
+To (en|de) code Encodings marked as C<(*)>, You need
+C<Encode::HanExtra>, available from CPAN.
 
 =back
 
@@ -317,33 +317,43 @@
 
   US-ASCII UTF-8 ISO-8859-* KOI8-R
   Shift_JIS EUC-JP ISO-2022-JP ISO-2022-JP-1
-  EUC-KR Big5
+  EUC-KR Big5 GB2312
 
-are registered to IANA as preferred MIME names and may probably be used over the Internet.
+are registered to IANA as preferred MIME names and may probably
+be used over the Internet.
 
-C<Shift_JIS> is no longer Microsft proprietary since it has been
-officialized by JIS X 0208-1997.
+C<Shift_JIS> has been officialized by JIS X 0208-1997.
+L<Microsoft-related naming mess> gives details.
+
+C<GB2312> is the IANA name for C<EUC-CN>.
+See L<Microsoft-related naming mess> for details.
+
+C<GB_2312-80> I<raw> encoding is available as C<gb2312-raw>
+with Encode. See L<Encode::CN -- Continental China> for details.
 
   EUC-CN
+  KOI8-U (http://www.faqs.org/rfcs/rfc2319.html)
 
-has not been registered with IANA (as of march 2002) but
-seems to be supported by major web browsers. In Encode, GB2312
-is aliased to EUC-CN, with "uncooked" version of GB2312 canonicalized
-as gb2312-raw. See L<Encode::CN> for details.
+have not been registered with IANA (as of March 2002) but
+seem to be supported by major web browsers.
+IANA name for C<EUC-CN> is C<GB2312>.
 
   KS_C_5601-1987
 
-has been registered to IANA but when they are used, they are
-EUC-coded. Internet community in Korea is not happy with this.
-so C<KS_C_5601-1987> is aliased to C<cp949>, an enhanced version
-of C<euc-kr>, with ksc5601-raw for "uncooked".
+is heavily misused.
+See L<Microsoft-related naming mess> for details.
+
+C<KS_C_5601-1987> I<raw> encoding is available as C<kcs5601-raw>
+with Encode. See L<Encode::KR -- Korea> for details.
 
   UTF-16
-  KOI8-U (http://www.faqs.org/rfcs/rfc2319.html)
 
-are IANA-registered (C<UTF-16> even as a preferred MIME name)
+=for comment
+waiting for comments from Jungshik Shin to soften this - Anton
+
+is a IANA-registered preferred MIME name
 but probably should be avoided as encoding for web pages due to
-the lack of browser supports.
+the lack of browser support.
 
   ISO-IR-165 (http://www.faqs.org/rfcs/rfc1345.html)
   GBK
@@ -360,6 +370,73 @@
   BIG5PLUS (*)
 
 is a bit proprietary name.
+
+=head2 Microsoft-related naming mess
+
+Microsoft products misuse the following names:
+
+=over 2
+
+=item KS_C_5601-1987
+
+Microsoft extension to C<EUC-KR>.
+
+Proper name: C<CP949>.
+
+See
+http://lists.w3.org/Archives/Public/ietf-charsets/2001AprJun/0033.html
+for details.
+
+Encode aliases C<KS_C_5601-1987> to C<cp949> to reflect
+this common misusage.
+I<Raw> C<KS_C_5601-1987> encoding is available as C<kcs5601-raw>.
+
+See L<Encode::KR -- Korea> for details.
+
+=item GB2312
+
+Microsoft extension to C<EUC-CN>.
+
+Proper names: C<CP936>, C<GBK>.
+
+C<GB2312> has been registered in the C<EUC-CN> meaning at
+IANA. This has partially repaired the situation: Microsoft's
+C<GB2312> has become a superset of the official C<GB2312>.
+
+Encode aliases C<GB2312> to C<euc-cn> in full agreement with
+IANA registration. C<cp936> is supported separately.
+I<Raw> C<GB_2312-80> encoding is available as C<kcs5601-raw>.
+
+See L<Encode::CN -- Continental China> for details.
+
+=item Big5
+
+Microsoft extension to C<Big5>.
+
+Proper name: C<CP950>.
+
+Encode separately supports C<Big5> and C<cp950>.
+
+=item Shift_JIS
+
+Microsoft's understanding of C<Shift_JIS>.
+
+JIS has not endorsed the full Microsoft standard however.
+The official C<Shift_JIS> includes only JIS X 0201 and JIS X 0208
+subsets, while Microsoft has always been meaning C<Shift_JIS> to
+encode a wider character repertoire.
+
+As a historical predecessor Microsoft's variant
+probably has more rights for the name, albeit it may be objected
+that Microsoft shouldn't have used JIS as part of the name
+in the first place.
+
+Unabiguous name: C<CP932>.
+
+Encode separately supports C<Shift_JIS> and C<cp932>.
+
+=back
+
 
 =head1 Bookmarks
 

What do you think of it, Dan? :-)

3) Jungshik, I would have certainly advocated linking not only to
   http://lists.w3.org/Archives/Public/ietf-charsets/2001AprJun/0033.html
   but also to your comments on the KS_C_5601-1987 in the list archive,
   but all your mails were on several subjects each.

Jungshik> ... refer to Ken Lunde's CJKV Information Processing
Jungshik> about that 'epic war' between two camps. (see p.197 of
Jungshik> the book and http://jshin.net/faq/qa8.html)
Jungshik> We even set up a web page to prevent M$ from spreading that
Jungshik> ill-defined name.

   maybe we may link to this page? What is the address?

4) Certainly the
   [ID 20020312.006] pod2html does not translate space to '_' in L<>-s
   bug still spoils our links. I have sent a new mail on that to
   perl5-porters..

   Furthermore, I don't understand why C<gb2312-raw> converts to
   <CODE>gb2312-raw> while C<GB2312> becomes a link? Anyway I have gone
   for putting C<> around, but if that feature/bug persists maybe it's
   better to drop the C<> in my patch.

- Anton