Re: [Patch] Encode.pm : euro sign missing in cp936.ucm

2003-03-27 Thread SADAHIRO Tomoyuki

On Thu, 27 Mar 2003 10:02:28 +0900
Dan Kogai [EMAIL PROTECTED] wrote:

 SADAHIRO-san and cp9?? experts,
 
 On Thursday, Mar 27, 2003, at 00:44 Asia/Tokyo, SADAHIRO Tomoyuki wrote:
  +U20AC \x80 |0 # EURO SIGN
 
 Is this right?  Yes, U20AC is indeed missing from cp936.ucm, but see 
 this:
(snip)

 As far as I can tell from Microsoft's pages
 
 http://www.microsoft.com/typography/unicode/cscp.htm -
 http://www.microsoft.com/globaldev/reference/wincp.mspx -
 http://www.microsoft.com/globaldev/reference/dbcs/936.htm
 
 it does indeed use \x80 (though only \x00-\xFF are covered;  Where the 
 heck is the FULL MAP!?).  But it seems this applies only to 936.  932 
 (Japanese; Shift_JIS based), 949 (Korean; euc-kr based) and 950 
 (Traditional Chinese; Big5-based) all leave \x80 blank.
 
 I would like more confirmation from experts;  cp936.ucm was 
 overhauled with the help of MORIYAMA-san, and at that time the 
 FULL map was available from the URIs above.  And I think \x80 was not 
 used for EURO SIGN back then.

I'm no expert, but I can at least tell you
that you can get the official full maps
by clicking a gray box (like [81], [82], ..., [FE])
in http://www.microsoft.com/globaldev/reference/dbcs/936.htm

or http://www.microsoft.com/globaldev/reference/dbcs/936/936_81.htm
   http://www.microsoft.com/globaldev/reference/dbcs/936/936_82.htm
etc.
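
A minimal sketch, in Python, of enumerating those per-lead-byte pages.
It assumes the 936_XX.htm naming shown above holds for every lead byte
from 81 to FE, which the two sample URLs suggest but do not prove;
lead_byte_pages is a name made up for this illustration.

```python
# Base URL taken from the message above; the per-lead-byte file naming
# (936_81.htm ... 936_FE.htm) is an assumption extrapolated from it.
BASE = "http://www.microsoft.com/globaldev/reference/dbcs/936"


def lead_byte_pages(first=0x81, last=0xFE):
    """Return one page URL per DBCS lead byte in [first, last]."""
    return [f"{BASE}/936_{lead:02X}.htm" for lead in range(first, last + 1)]


pages = lead_byte_pages()
print(len(pages))   # one page per lead byte
print(pages[0])
print(pages[-1])
```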

Like the table provided on unicode.org,
this table does not include any UDC mappings.
I don't know why Microsoft has ceased to provide UDC mappings.

http://http.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT

 Oh, I still have a copy of the full mapping that was once available via 
 the URIs above.  Let's see...
 
 cp936.txt says...
  CODEPAGE 936; PRC GBK (XGB) - ANSI, OEM
 
  CPINFO 2 0x3f 0x003f; DBCS CP, Default Char = Question Mark
 
  MBTABLE 130
 
  0x00 0x0000  ;Null
  [snip]
  0x20 0x0020  ;Space
  [snip]
  0x7f 0x007f  ;^?
  0x80 0x0080  ;80
  0xff 0xf8f5  ;FF
 
 \x80 is mentioned but not mapped to EURO SIGN.
 
 Please somebody tell me where to find the FULL map.
 
 Dan the Encode Maintainer with Too Many (Dead) Links to Follow


IBM's ICU provides another table, which includes UDC mappings
and Unicode-to-CodePage fallbacks (i.e. the entries denoted by |1).

http://oss.software.ibm.com/cvs/icu/charset/data/ucm/windows-936-2000.ucm
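
To illustrate the difference between |0 (round-trip) and |1 (fallback)
entries, here is a rough Python sketch that separates the two.  The
sample lines are not taken from windows-936-2000.ucm; the |1 line in
particular is invented for illustration.  The pattern accepts both the
bare Uxxxx form quoted in this thread and a bracketed form, on the
assumption that real .ucm files may write the code point either way.

```python
import re

# One mapping line: code point, byte sequence, precision flag.
UCM_LINE = re.compile(
    r"^<?U([0-9A-Fa-f]{4,6})>?\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)

SAMPLE = r"""
U20AC \x80 |0 # EURO SIGN
U00A4 \xA1\xE8 |0 # CURRENCY SIGN
U9999 \x3F |1 # hypothetical fallback entry, for illustration only
"""


def parse_ucm_line(line):
    """Return (code point, bytes, flag), or None for non-mapping lines."""
    m = UCM_LINE.match(line.strip())
    if m is None:
        return None
    hex_pairs = re.findall(r"\\x([0-9A-Fa-f]{2})", m.group(2))
    return int(m.group(1), 16), bytes(int(h, 16) for h in hex_pairs), int(m.group(3))


def split_by_flag(text):
    """Collect |0 entries and |1 entries into two separate dicts."""
    roundtrip, fallback = {}, {}
    for line in text.splitlines():
        entry = parse_ucm_line(line)
        if entry is None:
            continue
        code_point, raw, flag = entry
        if flag == 0:
            roundtrip[code_point] = raw
        elif flag == 1:
            fallback[code_point] = raw
    return roundtrip, fallback


roundtrip, fallback = split_by_flag(SAMPLE)
print(sorted(hex(cp) for cp in roundtrip))
print(sorted(hex(cp) for cp in fallback))
```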

EURO SIGN was assigned between Unicode versions 2.0 and 2.1.
cf. Unicode 2.1, UTR #8, http://www.unicode.org/reports/tr8/

Your table is presumably an older one, from before that addition.

SADAHIRO Tomoyuki



Re: [Patch] Encode.pm : euro sign missing in cp936.ucm

2003-03-27 Thread SADAHIRO Tomoyuki


Sorry. I, too, dislike dead links, and I made a typo.

(snip)
 Like the table provided on unicode.org,
 this table does not include any UDC mappings.
 I don't know why Microsoft has ceased to provide UDC mappings.
 
 http://http.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT

Regards,
SADAHIRO Tomoyuki



[Patch] Encode.pm : euro sign missing in cp936.ucm

2003-03-26 Thread SADAHIRO Tomoyuki

diff -urN ucm~/cp936.ucm ucm/cp936.ucm
--- ucm~/cp936.ucm  Mon Mar 10 05:07:44 2003
+++ ucm/cp936.ucm   Wed Mar 26 23:54:26 2003
@@ -137,7 +137,6 @@
 U007D \x7D |0 # RIGHT CURLY BRACKET
 U007E \x7E |0 # TILDE
 U007F \x7F |0 # DELETE
-U0080 \x80 |0 #
 U00A4 \xA1\xE8 |0 # CURRENCY SIGN
 U00A7 \xA1\xEC |0 # SECTION SIGN
 U00A8 \xA1\xA7 |0 # DIAERESIS
@@ -311,6 +310,7 @@
 U2033 \xA1\xE5 |0 # DOUBLE PRIME
 U2035 \xA8\x46 |0 # REVERSED PRIME
 U203B \xA1\xF9 |0 # REFERENCE MARK
+U20AC \x80 |0 # EURO SIGN
 U2103 \xA1\xE6 |0 # DEGREE CELSIUS
 U2105 \xA8\x47 |0 # CARE OF
 U2109 \xA8\x48 |0 # DEGREE FAHRENHEIT
End of patch
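
The patch drops the U0080 line in the same change that adds U20AC.
The thread does not say why; one plausible reading (an assumption) is
that two |0 round-trip entries should not claim the same byte.  A small
Python sketch of that check, with roundtrip_conflicts as a made-up
helper name and only a few lines excerpted from the diff:

```python
from collections import defaultdict


def roundtrip_conflicts(lines):
    """Return {byte string: [code points]} for bytes claimed by more
    than one |0 entry."""
    owners = defaultdict(list)
    for line in lines:
        fields = line.split("#", 1)[0].split()   # drop the comment
        if len(fields) != 3 or fields[2] != "|0":
            continue
        owners[fields[1]].append(fields[0])
    return {raw: cps for raw, cps in owners.items() if len(cps) > 1}


# Adding the euro line without removing U0080.
unpatched_plus_euro = [
    r"U007F \x7F |0 # DELETE",
    r"U0080 \x80 |0 #",
    r"U20AC \x80 |0 # EURO SIGN",
]
# What the patch above actually produces.
patched = [
    r"U007F \x7F |0 # DELETE",
    r"U20AC \x80 |0 # EURO SIGN",
]

print(roundtrip_conflicts(unpatched_plus_euro))
print(roundtrip_conflicts(patched))
```

With both lines present, \x80 is reported as claimed by U0080 and U20AC;
with the patch applied, no conflict remains.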



Sigh, I made the same patch a long time ago.

cf.
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2001-09/msg01568.html

Regards,
SADAHIRO Tomoyuki



Re: [Patch] Encode.pm : euro sign missing in cp936.ucm

2003-03-26 Thread Dan Kogai

SADAHIRO-san and cp9?? experts,

On Thursday, Mar 27, 2003, at 00:44 Asia/Tokyo, SADAHIRO Tomoyuki wrote:
+U20AC \x80 |0 # EURO SIGN
Is this right?  Yes, U20AC is indeed missing from cp936.ucm, but see 
this:

grep U20AC ucm/cp*.ucm
/Users/dankogai/work/Encode/ucm/cp1250.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1251.ucm:U20AC \x88 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1252.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1253.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1254.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1255.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1256.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1257.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp1258.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp874.ucm:U20AC \x80 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp949.ucm:U20AC \xA2\xE6 |0 # EURO SIGN
/Users/dankogai/work/Encode/ucm/cp950.ucm:U20AC \xA3\xE1 |0 # EURO SIGN

\x80 SEEMS right for single-byte CPs, but EURO SIGN is mapped 
differently in CP949 and CP950.
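
The grep output above can be summarized by grouping the code pages by
the bytes they assign to U20AC.  A Python sketch, with the file names
shortened from the listing and euro_bytes_by_codepage as a name chosen
for this illustration:

```python
from collections import defaultdict

# grep output from above, with the directory prefix removed.
GREP_OUTPUT = r"""
cp1250.ucm:U20AC \x80 |0 # EURO SIGN
cp1251.ucm:U20AC \x88 |0 # EURO SIGN
cp1252.ucm:U20AC \x80 |0 # EURO SIGN
cp1253.ucm:U20AC \x80 |0 # EURO SIGN
cp1254.ucm:U20AC \x80 |0 # EURO SIGN
cp1255.ucm:U20AC \x80 |0 # EURO SIGN
cp1256.ucm:U20AC \x80 |0 # EURO SIGN
cp1257.ucm:U20AC \x80 |0 # EURO SIGN
cp1258.ucm:U20AC \x80 |0 # EURO SIGN
cp874.ucm:U20AC \x80 |0 # EURO SIGN
cp949.ucm:U20AC \xA2\xE6 |0 # EURO SIGN
cp950.ucm:U20AC \xA3\xE1 |0 # EURO SIGN
"""


def euro_bytes_by_codepage(text):
    """Group code page names by the byte sequence mapped to U20AC."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, mapping = line.split(":", 1)
        fields = mapping.split()
        if fields[0] != "U20AC":
            continue
        groups[fields[1]].append(name.rsplit(".", 1)[0])
    return dict(groups)


for raw, names in euro_bytes_by_codepage(GREP_OUTPUT).items():
    print(raw, ", ".join(names))
```

Grouped this way, the listing also shows cp1251 using \x88 rather than
\x80.
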
As far as I can tell from Microsoft's pages

http://www.microsoft.com/typography/unicode/cscp.htm -
http://www.microsoft.com/globaldev/reference/wincp.mspx -
http://www.microsoft.com/globaldev/reference/dbcs/936.htm
it does indeed use \x80 (though only \x00-\xFF are covered;  Where the 
heck is the FULL MAP!?).  But it seems this applies only to 936.  932 
(Japanese; Shift_JIS based), 949 (Korean; euc-kr based) and 950 
(Traditional Chinese; Big5-based) all leave \x80 blank.

I would like more confirmation from experts;  cp936.ucm was 
overhauled with the help of MORIYAMA-san, and at that time the 
FULL map was available from the URIs above.  And I think \x80 was not 
used for EURO SIGN back then.

Oh, I still have a copy of the full mapping that was once available via 
the URIs above.  Let's see...

cp936.txt says...
CODEPAGE 936; PRC GBK (XGB) - ANSI, OEM

CPINFO 2 0x3f 0x003f; DBCS CP, Default Char = Question Mark

MBTABLE 130

0x00 0x0000  ;Null
[snip]
0x20 0x0020  ;Space
[snip]
0x7f 0x007f  ;^?
0x80 0x0080  ;80
0xff 0xf8f5  ;FF
\x80 is mentioned but not mapped to EURO SIGN.
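
The same check can be sketched in Python.  This assumes the MBTABLE
entries are "byte, code point, ;comment" with whitespace between the
two hex fields; parse_mbtable is a hypothetical helper, and only the
entries quoted above are included.

```python
def parse_mbtable(lines):
    """Return {byte value: code point} from '0xNN 0xNNNN ;comment' lines."""
    table = {}
    for line in lines:
        fields = line.split(";", 1)[0].split()   # drop the comment
        if len(fields) != 2:
            continue                             # e.g. "[snip]" or blanks
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


OLD_CP936 = [
    "0x20 0x0020  ;Space",
    "[snip]",
    "0x7f 0x007f  ;^?",
    "0x80 0x0080  ;80",
    "0xff 0xf8f5  ;FF",
]

EURO_SIGN = 0x20AC
table = parse_mbtable(OLD_CP936)

print(hex(table[0x80]))              # what byte 0x80 maps to in the old table
print(EURO_SIGN in table.values())   # whether any quoted entry maps to U+20AC
```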

Please somebody tell me where to find the FULL map.

Dan the Encode Maintainer with Too Many (Dead) Links to Follow