Hi Sadahiro
On 9/11/05, SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote:
>
> On Wed, 31 Aug 2005 19:53:37 +0530, Sastry <[EMAIL PROTECTED]> wrote:
>
> > Hi Sadahiro
> > The patch has resolved four tests that were failing previously but one
> > more test is still failing (which was failing even before applying the
> > patch).
> > Here is the test case
> >
> > ($a = v300.196.172.302.197.172) =~ tr/\x{12c}-\x{130}/\xc0-\xc4/;
> > is($a, v192.196.172.194.197.172, 'UTF range');
> > # got 'DÐDEÐ'
> > # expected '{DÐBEÐ'
> > Can you suggest some pointers towards fixing this?
> > -Sastry
>
> This "EBCDIC-specific" problem comes down to how to treat code values
> that include Unicode (\x{12c}-\x{130} is surely Unicode) on an EBCDIC
> platform. Native code values in EBCDIC (for example 'A' == 193) mostly
> differ from the range of 0..255 in Unicode (for example 'A' == 65), which
> coincides with ASCII/Latin1.
>
> Thus the middle part of a character range is generally different
> between EBCDIC and Unicode.
>
> For example consider a character range \xc0-\xc4. Since the mappings
> \xc0 to '{' (an open curly) and \xc4 to D in EBCDIC are definite,
> the range \xc0-\xc4 is equivalent to {-D on an EBCDIC platform.
>
> In EBCDIC {-D (\xc0-\xc4) can be expanded to \xc0\xc1\xc2\xc3\xc4,
> but in Unicode {-D cannot be expanded, as the Unicode scalar values
> of the endpoints are reversed ('{' = U+007B, D = U+0044).
>
> Actually the current perl implementation is confused:
> at parse time (see toke.c#scan_const) perl treats the range
> in EBCDIC order and so does not catch it as "Invalid range,"
> though at compile time (see op.c#pmtrans) and at run time
> (see doop.c#do_trans_simple_utf8 and its friends) perl treats
> the range in Unicode order and then generates a strange result.

For this test, since min > max in scan_const as per their Unicode
values, should we raise a warning? In that case the test case is wrong
on an EBCDIC platform. Am I correct?
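
To make the endpoint ordering concrete, here is a minimal sketch. It is not how toke.c or op.c actually check ranges; the helper name range_is_valid is hypothetical, and the EBCDIC values ('{' = 0xC0, 'D' = 0xC4) are hard-coded from the mappings quoted above so that the script also runs on an ASCII host.

```perl
use strict;
use warnings;

# Unicode (ASCII) code points of the two range endpoints.
my %unicode = ('{' => 0x7B, 'D' => 0x44);

# EBCDIC code values as quoted in this thread. Hard-coded (an assumption
# for illustration) rather than taken from ord(), so the result does not
# depend on the platform running the script.
my %ebcdic = ('{' => 0xC0, 'D' => 0xC4);

# Hypothetical helper: a range lo-hi is only expandable when lo <= hi
# in the ordering given by $table.
sub range_is_valid {
    my ($table, $lo, $hi) = @_;
    return $table->{$lo} <= $table->{$hi} ? 1 : 0;
}

printf "EBCDIC order:  {-D is %s\n",
    range_is_valid(\%ebcdic, '{', 'D') ? 'valid' : 'invalid';
printf "Unicode order: {-D is %s\n",
    range_is_valid(\%unicode, '{', 'D') ? 'valid' : 'invalid';
```

With these tables the range is valid in EBCDIC order and invalid in Unicode order, which is the mismatch between the parse-time check and the later expansion described above.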
> In my opinion it is necessary to determine how to expand character
> ranges with Unicode (whether the native EBCDIC order or Unicode order).
> I'm not sure using the native encoding (ASCII/Latin1/EBCDIC) everytime
> (that is same as "no Unicode within 0..255") makes people happy.

Do you think that perl-5.8.6 is not expanding the character ranges with
Unicode? If so, how is this test case working?

($a = "\x{12d}\x{12e}\x{12f}\x{130}") =~ tr/\x{12c}-\x{130}/Y/;

All the bytes are translated to Y.

regards
-Sastry

> Regards,
> SADAHIRO Tomoyuki