Hi Sadahiro

On 9/12/05, SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote:
>
> On Mon, 12 Sep 2005 16:12:45 +0530, Sastry <[EMAIL PROTECTED]> wrote
>
> > Hi Sadahiro
> >
> > On 9/11/05, SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote:
> >
> > Do you think that perl-5.8.6 is not expanding the character ranges with
> > Unicode? If so how is this test case working?
> > ($a = "\x{12d}\x{12e}\x{12f}\x{130}") =~ tr/\x{12c}-\x{130}/Y/;
> > All the bytes are translated to Y
> > regards
> > -Sastry
>
> Beyond 255 (\x{ff}), I think it will be correctly expanded.
> \x{12c}-\x{130} is beyond 255, and thus no problem.
>
> In the range of 0..255 (inclusive), I think "generally no" for EBCDIC.
> (Why I don't say "always no" is that there are some cases where
> a character range in EBCDIC coincides with that in Unicode:
> for example 0-9 can be successfully expanded into 0123456789
> in both encodings)
>
> I attribute the failure in tr/\x{12c}-\x{130}/\xc0-\xc4/; to
> such an ambiguity of \xc0-\xc4. In this expression the left part
> \x{12c}-\x{130} parsed before coerces \xc0-\xc4 into Unicode,
> and results in the failure.

So this is still a problem on EBCDIC! Is there a way to fix this?
>
> In contrast, I attribute the success in tr/\xc0-\xc4/\x{12c}-\x{130}/;
> to that \xc0-\xc4 is parsed before \x{12c}-\x{130}, and then
> \xc0-\xc4 is expanded into \xc0\xc1\xc2\xc3\xc4 as EBCDIC
> before the character list is coerced into Unicode.
>
> Well, how about the test case B? (It has \x{100} at first and
> then both sides are coerced into Unicode.)
>
> #test case A # now resolved
> $c = ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x89-\x91/X/;
> is($c, 8);
> is($a, "XXXXXXXX");
>
> #test case B # On ASCII platform, of course successful
> $c = ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x{100}\x89-\x91/X/;
> is($c, 8);
> is($a, "XXXXXXXX");

This test fails on EBCDIC. In S_scan_const(), there is the statement below.

    /* Insert oct or hex escaped character.
     * There will always enough room in sv since such
     * escapes will be longer than any UTF-8 sequence
     * they can end up as. */

    /* We need to map to chars to ASCII before doing the tests
     * to cover EBCDIC
     */
    if (!UNI_IS_INVARIANT(NATIVE_TO_UNI(uv))) {
        if (!has_utf8 && uv > 255) {

On an ASCII platform, the first if condition is true, because uv is 137
and so falls in the variant range (uv > \x7F), whereas on EBCDIC the
condition is false. Can you explain why it behaves this way?

Also, I found that the characters are expanded at run time in
S_do_trans_simple_utf8().

Do you have any suggestion where the problem is?

>
> I think the current perl on EBCDIC does not translate gap characters
> for the test case B, which works like tr/\x{100}i-j/X/
> and results in $c == 2, and $a eq "X\x8a\x8b\x8c\x8d\x8f\x90X";
> because i's next character is j in Unicode.

It expands the range but doesn't translate.
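To make the two readings of \x89-\x91 in test case B concrete, here is a small sketch that runs on any platform because it works on plain integers only. It is a model of the hypothesis quoted above, not perl's actual tr/// implementation. It assumes, as the thread states, that \x89 and \x91 are 'i' and 'j' on EBCDIC; the names %ebcdic_to_unicode, $native_hits and $unicode_hits are made up for this illustration.

```perl
use strict;
use warnings;

# The eight native byte values from test cases A and B (note: no 0x8e).
my @subject = (0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8f, 0x90, 0x91);

# Reading 1: expand \x89-\x91 as native (EBCDIC) byte values.
my %native_set = map { $_ => 1 } (0x89 .. 0x91);

# Reading 2: map the endpoints to Unicode first, then expand.
# Assumption from the thread: EBCDIC 0x89 is 'i' (U+0069), 0x91 is 'j' (U+006A).
# Only the two endpoints are mapped here; that is all this model needs.
my %ebcdic_to_unicode = (0x89 => 0x69, 0x91 => 0x6A);
my %unicode_set = map { $_ => 1 } (0x69 .. 0x6A);

# Count how many subject bytes each reading would translate.
my $native_hits  = grep { $native_set{$_} } @subject;
my $unicode_hits = grep {
    exists $ebcdic_to_unicode{$_} && $unicode_set{ $ebcdic_to_unicode{$_} }
} @subject;

printf "native expansion matches %d, Unicode expansion matches %d\n",
    $native_hits, $unicode_hits;
```

Under this model the native reading matches all 8 bytes, while the Unicode reading matches only the two endpoints, which is the $c == 2 outcome predicted above for EBCDIC.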
>
> And then try this:
> #test case C # On ASCII platform, of course successful
> $c = ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x89-\x91\x{100}/X/;
> is($c, 8);
> is($a, "XXXXXXXX");

This works fine.

>
> I think the test case C would succeed even on EBCDIC, because
> the expansion from \x89-\x91 to \x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91
> will be done before the parser finds \x{100}.
>
> Regards,
> SADAHIRO Tomoyuki

regards
Sastry