On Thu, Aug 04, 2005 at 11:42:54AM +0530, Sastry wrote:
> Hi
>
> I am trying to run this script on an EBCDIC platform using perl-5.8.6
>
> ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x89-\x91/X/;
> is($a, "XXXXXXXX");
>
>
> The result I get is
>
> 'X«»ðý±°X'
>
> a) Is this happening since \x8a\x8b\x8c\x8d\x8f\x90 are the gapped
> characters in EBCDIC?
I think so, in that \x89 is 'i' and \x91 is 'j'.

> b) Should all the bytes in $a change to X?

I don't know. It seems to be some special-case code in regcomp.c:

    #ifdef EBCDIC
            /* In EBCDIC [\x89-\x91] should include
             * the \x8e but [i-j] should not. */
            if (literal_endpoint == 2 &&
                ((isLOWER(prevvalue) && isLOWER(ceilvalue)) ||
                 (isUPPER(prevvalue) && isUPPER(ceilvalue))))
            {
                if (isLOWER(prevvalue)) {
                    for (i = prevvalue; i <= ceilvalue; i++)
                        if (isLOWER(i))
                            ANYOF_BITMAP_SET(ret, i);
                } else {
                    for (i = prevvalue; i <= ceilvalue; i++)
                        if (isUPPER(i))
                            ANYOF_BITMAP_SET(ret, i);
                }
            }
            else
    #endif

which I assume is making [i-j] in a regexp leave a gap, but
[\x89-\x91] not.

I don't know where ranges in tr/// are parsed, but given that I
grepped for EBCDIC and didn't find any analogous code, it looks like
tr/\x89-\x91// is treated as tr/i-j//, and in turn i-j is treated as
letters and always "special cased".

I don't know whether tr/i-j// and tr/\x89-\x91// should behave
differently (i.e. whether we currently have a bug).

Nicholas Clark
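To make the two possible readings of the range concrete, here is a minimal sketch, not perl's code, assuming the letter layout shared by common EBCDIC code pages such as 1047 and 037 (a-i at 0x81-0x89, j-r at 0x91-0x99, s-z at 0xA2-0xA9, and likewise 0xC1-0xE9 for uppercase). The helpers ebcdic_is_lower(), ebcdic_is_upper() and expand_range() are hypothetical stand-ins for isLOWER/isUPPER and the range handling quoted above; the letters_only flag loosely models the literal_endpoint == 2 condition. With letters_only set, \x89-\x91 collapses to just 'i' and 'j', which matches the 'X«»ðý±°X' result, where only the first and last bytes were translated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical letter tests for the EBCDIC 1047/037 layout (NOT perl's
 * isLOWER/isUPPER): letters come in three runs with gaps between them. */
static int ebcdic_is_lower(unsigned c)
{
    return (c >= 0x81 && c <= 0x89) ||  /* a-i */
           (c >= 0x91 && c <= 0x99) ||  /* j-r */
           (c >= 0xA2 && c <= 0xA9);    /* s-z */
}

static int ebcdic_is_upper(unsigned c)
{
    return (c >= 0xC1 && c <= 0xC9) ||  /* A-I */
           (c >= 0xD1 && c <= 0xD9) ||  /* J-R */
           (c >= 0xE2 && c <= 0xE9);    /* S-Z */
}

/* Hypothetical range expander: when letters_only is set and both endpoints
 * are letters of the same case, only letters of that case are kept (the
 * "gapped" reading); otherwise every byte from lo to hi is kept (the raw
 * byte reading).  out must hold at least 256 bytes; returns the count. */
static size_t expand_range(unsigned lo, unsigned hi, int letters_only,
                           unsigned char *out)
{
    size_t n = 0;
    int same_lower, same_upper;
    unsigned c;

    assert(lo <= hi && hi <= 0xFF);
    same_lower = ebcdic_is_lower(lo) && ebcdic_is_lower(hi);
    same_upper = ebcdic_is_upper(lo) && ebcdic_is_upper(hi);

    for (c = lo; c <= hi; c++) {
        if (letters_only && same_lower && !ebcdic_is_lower(c))
            continue;               /* skip non-letters such as 0x8A-0x90 */
        if (letters_only && same_upper && !ebcdic_is_upper(c))
            continue;
        out[n++] = (unsigned char)c;
    }
    return n;
}
```

For example, expand_range(0x89, 0x91, 1, buf) yields two members (0x89 and 0x91), while expand_range(0x89, 0x91, 0, buf) yields all nine bytes, including 0x8E. The open question above is which of those two results tr/\x89-\x91// ought to produce.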