Hi Sadahiro

On 9/12/05, SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote:
> 
> On Mon, 12 Sep 2005 16:12:45 +0530, Sastry <[EMAIL PROTECTED]> wrote
> 
> > Hi Sadahiro
> > 
> > 
> >  On 9/11/05, SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote:
> > 
> > 
> > Do you think that perl-5.8.6 is not expanding the character ranges with 
> > Unicode? If so how is this test case working?
> >  ($a = "\x{12d}\x{12e}\x{12f}\x{130}") =~ tr/\x{12c}-\x{130}/Y/;
> > All the bytes are translated to Y
> >  regards
> > -Sastry
> 
> Beyond 255 (\x{ff}), I think it will be correctly expanded.
> \x{12c}-\x{130} is beyond 255, and thus no problem.
> 
> In the range of 0..255 (inclusive), I think "generally no" for EBCDIC.
> (Why I don't say "always no" is that there are some cases where
>  a character range in EBCDIC coincides with that in Unicode:
>  for example 0-9 can be successfully expanded into 0123456789
>  in both encodings)
> 
> I attribute the failure in tr/\x{12c}-\x{130}/\xc0-\xc4/; to
> this ambiguity of \xc0-\xc4. In this expression the left part
> \x{12c}-\x{130}, which is parsed first, coerces \xc0-\xc4 into Unicode,
> and that results in the failure.
So this is still a problem on EBCDIC! Is there a way to fix this?

> 
> In contrast, I attribute the success in tr/\xc0-\xc4/\x{12c}-\x{130}/;
> to the fact that \xc0-\xc4 is parsed before \x{12c}-\x{130}, so
> \xc0-\xc4 is expanded into \xc0\xc1\xc2\xc3\xc4 as EBCDIC
> before the character list is coerced into Unicode.
> 
> 
> Well, how about the test case B? (It has \x{100} at first and
> then both sides are coerced into Unicode.)
> 
> #test case A # now resolved
> $c = ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x89-\x91/X/;
> is($c, 8);
> is($a, "XXXXXXXX");
> 
> #test case B # On ASCII platform, of course successful
> $c = ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x{100}\x89-\x91/X/;
> is($c, 8);
> is($a, "XXXXXXXX");
This test fails on EBCDIC. S_scan_const() contains the following code:
    /* Insert oct or hex escaped character.
     * There will always enough room in sv since such
     * escapes will be longer than any UTF-8 sequence
     * they can end up as. */

    /* We need to map to chars to ASCII before doing the tests
       to cover EBCDIC
    */
    if (!UNI_IS_INVARIANT(NATIVE_TO_UNI(uv))) {
        if (!has_utf8 && uv > 255) {

On an ASCII platform the first if condition is true, because uv is 137
and falls in the variant range (uv > \x7F), whereas on EBCDIC the
condition is false. Can you explain why it behaves this way?
I also found that the characters are expanded at run time in
S_do_trans_simple_utf8().
Do you have any suggestion as to where the problem is?

> 
> I think the current perl on EBCDIC does not translate gap characters
> for the test case B, which works like tr/\x{100}i-j/X/
> and results in $c == 2, and $a eq "X\x8a\x8b\x8c\x8d\x8f\x90X";
> because i's next character is j in Unicode.
It expands the range but doesn't translate.
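
To make the i-j reading quoted above concrete, here is a small sketch
that can be run on an ASCII box. It only models the hypothesis and is
not what toke.c does. It assumes nothing beyond the EBCDIC positions of
i (0x89) and j (0x91); the helper names and the two-entry table are
made up for illustration.

```perl
use strict;
use warnings;

# Assumption: only the two range endpoints are mapped.
# In EBCDIC, 'i' is 0x89 and 'j' is 0x91; in Unicode they are 0x69 and 0x6A.
my %native_to_uni = (0x89, 0x69, 0x91, 0x6A);

# Expand the range in native (EBCDIC) code point order.
sub expand_native {
    my ($lo, $hi) = @_;
    return ($lo .. $hi);
}

# Convert the endpoints to Unicode first, then expand.
sub expand_after_coercion {
    my ($lo, $hi) = @_;
    return ($native_to_uni{$lo} .. $native_to_uni{$hi});
}

my @native  = expand_native(0x89, 0x91);
my @coerced = expand_after_coercion(0x89, 0x91);

printf "native order:  %d code points\n", scalar @native;
printf "unicode order: %d code points\n", scalar @coerced;
```

If the endpoints are coerced before the range is expanded, only two
code points remain in the list, which would match the $c == 2 result
predicted in the quoted text.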

> 
> And then try this:
> #test case C # On ASCII platform, of course successful
> $c = ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x89-\x91\x{100}/X/;
> is($c, 8);
> is($a, "XXXXXXXX");
This works fine
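
For anyone who wants to rerun the three cases together, they can be
put into one self-contained script as below. This is only a
convenience sketch: it assumes Test::More is available, and it uses
lexical variables ($s, $input) and test names that are not in the
original cases.

```perl
use strict;
use warnings;
use Test::More tests => 6;

# Same input string as in the quoted test cases (note: no \x8e).
my $input = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91";
my ($c, $s);

# Test case A: plain byte range.
$c = ($s = $input) =~ tr/\x89-\x91/X/;
is($c, 8, 'A: count');
is($s, 'XXXXXXXX', 'A: result');

# Test case B: \x{100} comes before the range.
$c = ($s = $input) =~ tr/\x{100}\x89-\x91/X/;
is($c, 8, 'B: count');
is($s, 'XXXXXXXX', 'B: result');

# Test case C: \x{100} comes after the range.
$c = ($s = $input) =~ tr/\x89-\x91\x{100}/X/;
is($c, 8, 'C: count');
is($s, 'XXXXXXXX', 'C: result');
```

Only the position of \x{100} differs between B and C, so running both
on the same build isolates the effect of parse order.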

> 
> I think the test case C would succeed even on EBCDIC, because
> the expansion from \x89-\x91 to \x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91
> will be done before the parser finds \x{100}.
> 

> Regards,
> SADAHIRO Tomoyuki
> 
> 
> 

regards
Sastry