On Mon, 19 Dec 2005 22:28:55 -0800 (PST), rajarshi das <[EMAIL PROTECTED]> wrote
> I am testing this with iso-2022-jp encoding :
> ------------------------
> use encoding 'iso-2022-jp';
>
> $a = "^[$B$!^[(B";
> print "a : $a\n";
> ------------------------
>
> On linux, I get :
> a : ^[^[(B
> /* Why is the '(B' shown? Isnt this just an escape
> char to switch over to ASCII ? */

In a double-quoted string, $B and $! are interpolated as variables; that is,

    $a = '^[' . $B . $! . '^[(B';

in other words, a concatenation of the literal ^[, the variable $B, the variable $!, and the literal ^[(B.
Also, ^[ here is CIRCUMFLEX ACCENT + LEFT SQUARE BRACKET, not the control character ESCAPE.

> On ebcdic, I get :
> Malformed UTF-8 character (unexpected end of string)
> at /u/isldev2/tmp_dbg/perl-5.8.7/lib/utf8_heavy.pl
> line 330.
> Malformed UTF-8 character (unexpected continuation
> byte 0x6a, with no preceding start byte) in pattern
> match (m//) at
> /u/isldev2/tmp_dbg/perl-5.8.7/lib/utf8_heavy.pl line
> 337.
> Malformed UTF-8 character (unexpected continuation
> byte 0x6a, with no preceding start byte) in pattern
> match (m//) at
> /u/isldev2/tmp_dbg/perl-5.8.7/lib/utf8_heavy.pl line
> 337.
>
> -- and some junk data.
>
> Seems like in "$B$!^[(B" above, $! and ^[ are
> incorrect two byte sequences on ebcdic. However, $!
> donot translate into printable characters on cp-1047 .
> What do we replace them by ?

According to JIS X 0208:1997 Appendix 2 (which specifies ISO-2022-JP), the escape sequences for ISO-2022-JP are "\x1B\x28\x42", "\x1B\x28\x4A", "\x1B\x24\x40", and "\x1B\x24\x42".

ASCII graphic representations such as "\e$B" are not portable to EBCDIC, even though they are widely used in the ASCII world. In EBCDIC, ESCAPE "\e" is not \x1B but \x27, DOLLAR $ is not \x24 but \x5B, and CAPITAL B is not \x42 but \xC2.

So don't replace the bytes of an escape sequence with the graphic characters they would correspond to in ASCII. If I understand it correctly, an escape sequence is a sequence of 7-bit or 8-bit combinations, not a sequence of graphic characters; an escape sequence is encoded neither in ASCII nor in EBCDIC.
(Though I refer to JIS X 0202, standard Japanese translation, instead of the original ISO/IEC 2022.)

> I tested again with :
> ---------------------------------
> use encoding 'iso-2022-jp';
> $a = "$B&&(B"; # && is \x50\x50 on EBCDIC which is
> valid acc to jis0208.ucm
> print "a : $a\n";
> ----------------------------------
>
> But I still get the messages as above and some junk
> data in $a which I dont think is the correct o/p.

As Encode.pm is a CPAN module, perhaps bugs in it should be reported to the maintainer of the module, rather than the perl5-porters mailing list. The site rt.cpan.org helps to report bugs in every distribution released through CPAN:

http://rt.cpan.org/NoAuth/Bugs.html?Dist=Encode

Regards,
SADAHIRO Tomoyuki
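
A minimal sketch of the two points made in the reply, for illustration only: single quotes stop $B and $! from being interpolated, and the escape sequences can be written as explicit byte values rather than as ASCII graphic characters. The variable names are hypothetical, and the sketch assumes an ASCII platform with the core Encode module; it has not been checked on EBCDIC and says nothing about the utf8_heavy.pl warnings reported there.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Single quotes keep $B and $! as literal characters; in a double-quoted
# string Perl would interpolate them as the variables $B and $!.
my $typed = '^[$B$!^[(B';

# The caret notation "^[" is two ordinary characters, not the ESCAPE control.
my $starts_with_escape = (substr($typed, 0, 1) eq "\x1B") ? 1 : 0;

# Escape sequences written as explicit byte values, as listed in the reply.
my $to_jisx0208 = "\x1B\x24\x42";   # switch to JIS X 0208
my $to_ascii    = "\x1B\x28\x42";   # switch back to ASCII

# \x24\x21 is the byte pair that "$!" stood for in the original test.
my $bytes = $to_jisx0208 . "\x24\x21" . $to_ascii;
my $text  = decode('iso-2022-jp', $bytes);

printf "typed length: %d, starts with ESC: %d\n",
    length($typed), $starts_with_escape;
printf "decoded: %d character, U+%04X\n", length($text), ord($text);
```

Writing the bytes as \xNN keeps the source independent of how the platform encodes "$", "(" and "B", which is the portability concern raised above.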