On Wed, 31 Aug 2005 19:53:37 +0530, Sastry <[EMAIL PROTECTED]> wrote

> Hi Sadahiro
>   The patch has resolved four tests that were failing previously, but one
> more test is still failing (which was failing even before applying the
> patch).
>  Here is the test case
>  
> ($a = v300.196.172.302.197.172) =~ tr/\x{12c}-\x{130}/\xc0-\xc4/;
> is($a, v192.196.172.194.197.172, 'UTF range');
>  # got 'DÐDEÐ'
> # expected '{DÐBEÐ'
>  Can you suggest some pointers towards fixing this?
>  -Sastry

This "EBCDIC-specific" problem comes down to how code values that
include Unicode (\x{12c}-\x{130} is certainly Unicode) should be treated
on an EBCDIC platform. Native code values in EBCDIC (for example
'A' == 193) mostly differ from those in the 0..255 range of Unicode
(for example 'A' == 65), which coincides with ASCII/Latin1.
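
A minimal sketch of that difference, assuming the script runs on an
ASCII platform and that the cp1047 table in the Encode module is an
acceptable stand-in for the native EBCDIC code page. The helper name
ebcdic_ord() is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: cp1047 stands in for the native EBCDIC code page, and this
# script runs on an ASCII platform, where ord() yields Unicode values.
# ebcdic_ord() is a hypothetical helper, not part of perl.
sub ebcdic_ord {
    my ($char) = @_;
    return ord(encode('cp1047', $char));
}

# Print the Unicode value next to the EBCDIC byte for a few characters.
for my $char ('A', 'D', '{') {
    printf "%s  Unicode %3d  EBCDIC %3d\n",
        $char, ord($char), ebcdic_ord($char);
}
```

For 'A' this should report 65 on the Unicode side and 193 on the EBCDIC
side, matching the values quoted above.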

Thus the interior of a character range is generally different
between EBCDIC and Unicode.

For example, consider the character range \xc0-\xc4. Since \xc0 maps
to '{' (an open curly brace) and \xc4 maps to D in EBCDIC, the range
\xc0-\xc4 is equivalent to {-D on an EBCDIC platform.

In EBCDIC, {-D (\xc0-\xc4) can be expanded to \xc0\xc1\xc2\xc3\xc4,
but in Unicode {-D cannot be expanded, because the Unicode scalar
values of the endpoints are in reverse order ('{' = U+007B, D = U+0044).
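
The two readings of the range can be sketched as follows, again
assuming an ASCII platform and cp1047 from the Encode module as a
stand-in for native EBCDIC. The helper name expand_native() is
hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumptions: cp1047 stands in for native EBCDIC and the script runs on
# an ASCII platform. expand_native() is a hypothetical helper.
sub expand_native {
    my ($from, $to) = @_;
    # Walk the byte values in native order, then decode each byte so the
    # result can be shown as ordinary characters.
    return join '', map { decode('cp1047', chr($_)) } $from .. $to;
}

# Native (EBCDIC) order: \xc0-\xc4 covers five consecutive bytes.
print expand_native(0xC0, 0xC4), "\n";

# Unicode order: the same endpoints are descending, so there is nothing
# to walk from the first endpoint up to the second.
my ($first, $last) = (ord('{'), ord('D'));    # U+007B, U+0044
printf "U+%04X-U+%04X is %s\n", $first, $last,
    $first <= $last ? 'ascending' : 'reversed';
```

The first line printed is the native expansion of \xc0-\xc4 shown as
characters; the second reports that the Unicode endpoints are reversed.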

Actually the current perl implementation is inconsistent:
at parse time (see toke.c#scan_const) perl treats the range
in EBCDIC order and so does not reject it as an "Invalid range",
while at compile time (see op.c#pmtrans) and at run time
(see doop.c#do_trans_simple_utf8 and its friends) perl treats
the range in Unicode order and so produces a strange result.
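
For comparison, here is a small sketch of the parse-time check on an
ASCII platform, where {-D is descending and is rejected. On EBCDIC the
same range is ascending in native order, which is why the check does not
fire there. The helper name range_error() is hypothetical, and the
tr/// is built through a string eval because the check is a
compile-time error.

```perl
use strict;
use warnings;

# Assumption: run on an ASCII platform.
# range_error() is a hypothetical helper: it compiles a tr/// with the
# given search range and returns the error text, or '' if it compiled.
sub range_error {
    my ($range) = @_;
    my $ok = eval "my \$s = 'x'; my \$n = (\$s =~ tr/$range//); 1";
    return $ok ? '' : $@;
}

# Descending in ASCII/Unicode order: expected to be refused.
print "{-D: ", range_error('{-D') || "accepted\n";

# Ascending in ASCII/Unicode order: expected to compile.
print "D-{: ", range_error('D-{') || "accepted\n";
```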

In my opinion we need to decide how character ranges that involve
Unicode should be expanded (in native EBCDIC order or in Unicode order).
I'm not sure that always using the native encoding (ASCII/Latin1/EBCDIC),
which amounts to "no Unicode within 0..255", would make people happy.

Regards,
SADAHIRO Tomoyuki

