Re: [Patch] Encode.pm : euro sign missing in cp936.ucm
On Thu, 27 Mar 2003 10:02:28 +0900 Dan Kogai [EMAIL PROTECTED] wrote:

> SADAHIRO-san and cp9?? experts,
>
> On Thursday, Mar 27, 2003, at 00:44 Asia/Tokyo, SADAHIRO Tomoyuki wrote:
> > +U20AC \x80 |0 # EURO SIGN
>
> Is this right? Yes, U20AC is indeed missing from cp936.ucm but see this;

(snip)

> As far as I can tell from Microsoft's pages
>
> - http://www.microsoft.com/typography/unicode/cscp.htm
> - http://www.microsoft.com/globaldev/reference/wincp.mspx
> - http://www.microsoft.com/globaldev/reference/dbcs/936.htm
>
> it indeed does use \x80 (though only \x00-\xFF are covered; Where the heck is the FULL MAP!?). But it seems this only applies to 936. 932 (Japanese; Shift_JIS based), 949 (Korean; euc-kr based) and 950 (Traditional Chinese; Big5-based) all leave \x80 blank.
>
> I would like more confirmation from experts; cp936.ucm was overhauled with the help of MORIYAMA-san, and at that time the FULL map was available from the URIs above. And I think \x80 was not used for EURO SIGN back then.

I'm not an expert, but at least I can tell you that you can get the official full maps by clicking a gray box (like [81], [82], ..., [FE]) in

http://www.microsoft.com/globaldev/reference/dbcs/936.htm

or

http://www.microsoft.com/globaldev/reference/dbcs/936/936_81.htm
http://www.microsoft.com/globaldev/reference/dbcs/936/936_82.htm
etc.

This table does not include any UDC mappings, and neither does the table provided on unicode.org. I don't know why Microsoft has ceased to provide UDC mappings.

http://http.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT

> Oh, I still have a copy of the full mapping that was once available via the URIs above. Let's see... cp936.txt says...
>
> CODEPAGE 936; PRC GBK (XGB) - ANSI, OEM
> CPINFO 2 0x3f 0x003f; DBCS CP, Default Char = Question Mark
> MBTABLE 130
> 0x00 0x ;Null
> [snip]
> 0x20 0x0020 ;Space
> [snip]
> 0x7f 0x007f ;^?
> 0x80 0x0080 ;80
> 0xff 0xf8f5 ;FF
>
> \x80 is mentioned but not mapped to EURO SIGN. Please somebody tell me where to find the FULL map.
>
> Dan the Encode Maintainer with Too Many (Dead) Links to Follow

IBM's ICU provides another table, which includes UDC mappings and Unicode-to-CodePage fallbacks (i.e. those denoted by |1).

http://oss.software.ibm.com/cvs/icu/charset/data/ucm/windows-936-2000.ucm

EURO SIGN was assigned between Unicode versions 2.0 and 2.1.
cf. Unicode 2.1, UTR #8, http://www.unicode.org/reports/tr8/

Your table is probably an older one, based on Unicode 2.0 or earlier.

SADAHIRO Tomoyuki
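The difference between round-trip entries (|0) and Unicode-to-CodePage fallbacks (|1) mentioned above can be sketched in Python as follows. This is only an illustration, not how Encode or ICU actually load their tables: the function names are made up for this example, the line format is the one quoted in this thread (angle brackets around the code point are also accepted), and the |1 sample entry is hypothetical.

```python
import re

# Matches mapping lines as quoted in this thread, e.g.
#   U20AC \x80 |0 # EURO SIGN
# The <U20AC> form with angle brackets is accepted as well.
UCM_LINE = re.compile(
    r"^<?U([0-9A-Fa-f]{4,6})>?\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|([0-3])"
)

def parse_ucm_line(line):
    """Return (code_point, byte_string, flag), or None for other lines."""
    m = UCM_LINE.match(line.strip())
    if m is None:
        return None
    code_point = int(m.group(1), 16)
    raw = bytes(int(h, 16) for h in re.findall(r"\\x([0-9A-Fa-f]{2})", m.group(2)))
    return code_point, raw, int(m.group(3))

def build_tables(lines):
    """Build (decode, encode) dicts from ucm-style mapping lines."""
    decode, encode = {}, {}
    for line in lines:
        parsed = parse_ucm_line(line)
        if parsed is None:
            continue
        code_point, raw, flag = parsed
        if flag == 0:
            # |0: round-trip mapping, valid in both directions.
            decode[raw] = code_point
            encode[code_point] = raw
        elif flag == 1:
            # |1: fallback, used only when encoding Unicode to the code page.
            encode.setdefault(code_point, raw)
        # Other flags are ignored in this sketch.
    return decode, encode

SAMPLE = [
    r"U00A4 \xA1\xE8 |0 # CURRENCY SIGN",
    r"U20AC \x80 |0 # EURO SIGN",
    r"U00A5 \x5C |1 # hypothetical fallback entry, for illustration only",
]

decode, encode = build_tables(SAMPLE)
print(hex(decode[b"\x80"]))   # code point that byte 0x80 decodes to
print(b"\x5c" in decode)      # the fallback entry is not used for decoding
```

Keeping the two directions in separate dicts makes the asymmetry visible: a |1 entry adds a way to encode a character without changing what any byte decodes to.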
Re: [Patch] Encode.pm : euro sign missing in cp936.ucm
Sorry. I also dislike dead links, and typos.

(snip)

> This table does not include any UDC mappings, and neither does the table provided on unicode.org. I don't know why Microsoft has ceased to provide UDC mappings.
>
> http://http.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT

Regards,
SADAHIRO Tomoyuki
[Patch] Encode.pm : euro sign missing in cp936.ucm
diff -urN ucm~/cp936.ucm ucm/cp936.ucm
--- ucm~/cp936.ucm Mon Mar 10 05:07:44 2003
+++ ucm/cp936.ucm Wed Mar 26 23:54:26 2003
@@ -137,7 +137,6 @@
 U007D \x7D |0 # RIGHT CURLY BRACKET
 U007E \x7E |0 # TILDE
 U007F \x7F |0 # DELETE
-U0080 \x80 |0 #
 U00A4 \xA1\xE8 |0 # CURRENCY SIGN
 U00A7 \xA1\xEC |0 # SECTION SIGN
 U00A8 \xA1\xA7 |0 # DIAERESIS
@@ -311,6 +310,7 @@
 U2033 \xA1\xE5 |0 # DOUBLE PRIME
 U2035 \xA8\x46 |0 # REVERSED PRIME
 U203B \xA1\xF9 |0 # REFERENCE MARK
+U20AC \x80 |0 # EURO SIGN
 U2103 \xA1\xE6 |0 # DEGREE CELSIUS
 U2105 \xA8\x47 |0 # CARE OF
 U2109 \xA8\x48 |0 # DEGREE FAHRENHEIT
End of patch

sigh, I've made such a patch long before.
cf. http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2001-09/msg01568.html

Regards,
SADAHIRO Tomoyuki
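The net effect of the patch is that byte \x80 stops mapping to U+0080 and maps to U+20AC instead. A rough Python sketch of that comparison follows; the helper names are invented for this example, and the OLD and NEW excerpts contain only a few lines taken from the hunks above, not the real files.

```python
import re

# Round-trip (|0) entries in the form used in the patch above;
# the angle-bracket form <U20AC> is accepted too.
ENTRY = re.compile(r"<?U([0-9A-Fa-f]{4,6})>?\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|0")

def byte_to_code_point(ucm_text):
    """Map each byte-sequence string (e.g. r'\x80') to its code point."""
    table = {}
    for line in ucm_text.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            table[m.group(2)] = int(m.group(1), 16)
    return table

def changed_bytes(old_text, new_text):
    """Return {byte_sequence: (old_code_point, new_code_point)} for differences."""
    old = byte_to_code_point(old_text)
    new = byte_to_code_point(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }

# Abbreviated excerpts: a few lines around the two hunks, before and after.
OLD = "\n".join([
    r"U007F \x7F |0 # DELETE",
    r"U0080 \x80 |0 #",
    r"U203B \xA1\xF9 |0 # REFERENCE MARK",
])
NEW = "\n".join([
    r"U007F \x7F |0 # DELETE",
    r"U203B \xA1\xF9 |0 # REFERENCE MARK",
    r"U20AC \x80 |0 # EURO SIGN",
])

for key, (before, after) in changed_bytes(OLD, NEW).items():
    print(f"{key}: U+{before:04X} -> U+{after:04X}")
```

Keying the comparison on the byte sequence rather than on the text line shows the change as one remapping instead of an unrelated removal and addition.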
Re: [Patch] Encode.pm : euro sign missing in cp936.ucm
SADAHIRO-san and cp9?? experts,

On Thursday, Mar 27, 2003, at 00:44 Asia/Tokyo, SADAHIRO Tomoyuki wrote:
> +U20AC \x80 |0 # EURO SIGN

Is this right? Yes, U20AC is indeed missing from cp936.ucm but see this;

grep U20AC ucm/cp*.ucm
/Users/dankogai/work/Encode/ucm/cp1250.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1251.ucm:U20AC \x88 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1252.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1253.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1254.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1255.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1256.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1257.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1258.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp874.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp949.ucm:U20AC \xA2\xE6 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp950.ucm:U20AC \xA3\xE1 |0 # EURO SIGN

\x80 SEEMS right for single-byte CPs but it is mapped differently in CP949 and CP950.

As far as I can tell from Microsoft's pages

- http://www.microsoft.com/typography/unicode/cscp.htm
- http://www.microsoft.com/globaldev/reference/wincp.mspx
- http://www.microsoft.com/globaldev/reference/dbcs/936.htm

it indeed does use \x80 (though only \x00-\xFF are covered; Where the heck is the FULL MAP!?). But it seems this only applies to 936. 932 (Japanese; Shift_JIS based), 949 (Korean; euc-kr based) and 950 (Traditional Chinese; Big5-based) all leave \x80 blank.

I would like more confirmation from experts; cp936.ucm was overhauled with the help of MORIYAMA-san, and at that time the FULL map was available from the URIs above. And I think \x80 was not used for EURO SIGN back then.

Oh, I still have a copy of the full mapping that was once available via the URIs above. Let's see... cp936.txt says...
CODEPAGE 936; PRC GBK (XGB) - ANSI, OEM
CPINFO 2 0x3f 0x003f; DBCS CP, Default Char = Question Mark
MBTABLE 130
0x00 0x ;Null
[snip]
0x20 0x0020 ;Space
[snip]
0x7f 0x007f ;^?
0x80 0x0080 ;80
0xff 0xf8f5 ;FF

\x80 is mentioned but not mapped to EURO SIGN. Please somebody tell me where to find the FULL map.

Dan the Encode Maintainer with Too Many (Dead) Links to Follow
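The check made by eye on the cp936.txt excerpt (is byte 0x80 mapped to EURO SIGN?) could be automated along these lines. This is a sketch under assumptions: `parse_mbtable` is a made-up helper, the rows are assumed to be whitespace-separated "byte, code point, ;comment" triples, and only the complete rows quoted above are used (the truncated first row is left out).

```python
EURO_SIGN = 0x20AC

def parse_mbtable(lines):
    """Return {byte_value: code_point} from 'byte codepoint ;comment' rows.

    Header lines such as MBTABLE and [snip] markers are skipped because
    they do not consist of exactly two hexadecimal fields.
    """
    table = {}
    for line in lines:
        fields = line.split(";", 1)[0].split()
        if len(fields) != 2:
            continue
        try:
            byte_value = int(fields[0], 16)
            code_point = int(fields[1], 16)
        except ValueError:
            continue
        table[byte_value] = code_point
    return table

# Rows as quoted in the message above, with the fused columns separated.
EXCERPT = [
    "MBTABLE 130",
    "[snip]",
    "0x20 0x0020 ;Space",
    "[snip]",
    "0x7f 0x007f ;^?",
    "0x80 0x0080 ;80",
    "0xff 0xf8f5 ;FF",
]

table = parse_mbtable(EXCERPT)
mapped = table.get(0x80)
if mapped == EURO_SIGN:
    print("0x80 maps to EURO SIGN")
else:
    print(f"0x80 maps to U+{mapped:04X}, not EURO SIGN")
```

On this excerpt the second branch is taken, which matches the observation in the message that \x80 is listed but not mapped to EURO SIGN.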