On Tue, Jul 26, 2005 at 08:48:10AM -0700, rajarshi das wrote:
> > For the code points being tested
> > ("\x{0442}\x{0435}\x{0441}\x{0442}")
> > does the perl source file contain the correct byte
> > sequence in UTF-EBCDIC?
> Yes it does, since I ran the test,
> if (($hash{"\x{0442}\x{0435}\x{0441}\x{0442}"}) eq
> ($hash{eval '"\x{0442}\x{0435}\x{0441}\x{0442}"'}))
> print "ok\n";
> and the test ran fine, if that is what you mean by the
> source file containing the correct byte sequence. Or
> am I mistaken ?

You are mistaken, I'm afraid. Bareword means no quotes.

In ASCII & UTF-8 land, the one-liner

    $ perl -le 'use utf8; $a{ඬ}++; print map {ord} keys %a'

gives

    3500

The 3 bytes in the source code between '{' and '}' are 224, 182 and 172,
which are the UTF-8 encoding of code point 3500.

My question is: what are the bytes in UTF-EBCDIC that encode code point 3500?
If you put those 3 bytes directly between the '{' and '}' characters in the
EBCDIC version of that one-liner, does it also print 3500?

> > If so, *that* would explain the failures, and be the
> > thing that needs
> > correcting. The test file would need if/else with a
> > different test on EBCDIC.
> what would you suggest be put in the if/ else ?

I think that the regression tests tended to do something like

    if (ord 'A' == 65) {
        # Do the ASCII/UTF-8 version
    } else {
        # Assume EBCDIC
    }

Nicholas Clark
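For reference, a small script can check the UTF-8 arithmetic for code point 3500 and, on an EBCDIC machine, report the bytes being asked about. This is only a sketch: it relies on the assumption (untested here) that `utf8::encode` on an EBCDIC perl yields UTF-EBCDIC octets, and it reuses the `ord 'A' == 65` platform check suggested above. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# U+0DAC, the character used as the bareword hash key in the one-liner.
my $cp = 3500;

# Ask perl for its internal encoding of the character: UTF-8 on an
# ASCII platform and (assumption, untested here) UTF-EBCDIC on EBCDIC.
my $encoded = chr $cp;
utf8::encode($encoded);
my @octets = unpack 'C*', $encoded;

# The same UTF-8 bytes computed by hand: code points in U+0800..U+FFFF
# are encoded as 1110xxxx 10xxxxxx 10xxxxxx.
my @manual = (
    0xE0 | ($cp >> 12),
    0x80 | (($cp >> 6) & 0x3F),
    0x80 | ($cp & 0x3F),
);

if (ord 'A' == 65) {
    # ASCII/UTF-8 platform: perl's encoding should match the hand-computed bytes.
    print "UTF-8 bytes: @octets\n";
    print(("@octets" eq "@manual") ? "ok\n" : "not ok\n");
} else {
    # Assume EBCDIC: these should be the bytes to try between '{' and '}'.
    print "UTF-EBCDIC bytes: @octets\n";
}
```

On an ASCII machine this prints the UTF-8 bytes 224 182 172 followed by "ok"; the hand-computed branch only describes UTF-8, so on EBCDIC only the first list is meaningful.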