On Thu, May 23, 2002 at 11:32:12AM +0100, Greg McCarroll wrote:
> 
> Does anyone have any strong feelings (or even weak ones) about
> 5.005_03 vs 5.6.1. I'd be especially interested in any
> stories/feelings connected with Solaris or Oracle.

For starters, here is my direct dislike of 5.6.0. It also illustrates how
printing behaves in the 5.6.x series:

$ cat S560crap.t
#!/usr/local/bin/perl5.6.0 -w
use strict;

my $byte = "é";

my $utf8 = $byte . chr 256;
chop $utf8;


if ($utf8 eq $byte) {
  print "Yes\n";
} else {
  print "No\n";
  printf "%d byte of %d, %d byte of %d\n", length $utf8, ord $utf8,
    length $byte, ord $byte;
}

print "byte: $byte\n";
print "utf8: $utf8\n";
__END__

$ perl5.00503 S560crap.t 
Yes
byte: é
utf8: é
$ perl5.6.0 S560crap.t 
No
1 byte of 233, 1 byte of 233
byte: é
utf8: é
$ perl5.6.1 S560crap.t 
Yes
byte: é
utf8: é
$ perl5.7.3 S560crap.t 
Yes
byte: é
utf8: é


As for my general dislike of 5.6.1: I don't trust it. Here is how to spot
5.6.1 from quite a long way away:


$ cat spot_56x.pl
#!/usr/local/bin/perl5.6.1 -w
use strict;

my $byte = "é";

my $utf8 = $byte;
$utf8 .= chr 256; chop $utf8;

my %hash = ($utf8, "value");

my ($key) = keys %hash;

if ($key eq $utf8) {
  print "hash keys ok\n";
} else {
  print "hash keys not ok - put in $utf8 (ie ", $byte, "), got $key\n";
}

my $copy = $utf8;
$copy =~ s/././g;

if (length $copy == length $utf8) {
  print "regexp ok\n";
} else {
  print "regexp not ok - put in $utf8 (ie ", $byte, "), got $copy\n";
}

my $pid = open CHILD, "|-";
die "|- failed: $!" unless defined $pid;

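if ($pid) {
  # Parent: send the string down the pipe to the child's STDIN
  print CHILD $utf8;
  close CHILD or die;
} else {
  # Child: read back what the parent wrote and compare it
  my $io = <STDIN>;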
  if ($io eq $utf8) {
    print "io ok\n";
  } else {
    print "io not ok - put in $utf8 (ie ", $byte, "), got $io\n";
  }
}
__END__

$ perl5.00503 spot_56x.pl
hash keys ok
regexp ok
io ok
$ perl5.6.1 spot_56x.pl
hash keys not ok - put in é (ie é), got é
regexp not ok - put in é (ie é), got ..
io not ok - put in é (ie é), got é
$ perl5.7.3 spot_56x.pl
hash keys ok
regexp ok
io ok


Of course, 5.6.1's inability to print Latin 1 stored in Unicode makes the
diagnostic output a bit messy. And I had to pass $byte to print as a
separate list element (the ', $byte,' above) to stop it being interpolated
into a utf8 string and then garbled.

Basically, if any of your 8 bit data happens to get converted into utf8
by 5.6.1, it is likely to get mangled.
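For anyone who wants to see the "converted into utf8" step directly, here is a minimal sketch. It assumes perl 5.8.1 or later, because utf8::is_utf8 and utf8::downgrade do not exist in 5.6.1, so this is a diagnostic for newer perls rather than a fix for 5.6.1 itself. It also uses chr 0xE9 instead of a literal "é" so the result does not depend on the source file's encoding. It shows that appending chr 256 and chopping it off leaves a string that is still flagged as utf8 internally while comparing equal to the original byte string, and how a copy can be forced back to bytes.

```perl
#!/usr/bin/perl
# Sketch only: assumes perl 5.8.1+, where utf8::is_utf8 and
# utf8::downgrade are built in (neither exists in 5.6.1).
use strict;
use warnings;

my $byte = chr 0xE9;          # "e acute" as Latin-1 code point 233
my $utf8 = $byte . chr 256;   # a code point > 255 upgrades the string to utf8
chop $utf8;                   # removes chr 256, but the utf8 flag stays on

printf "byte: length %d, ord %d, utf8 flag %s\n",
  length $byte, ord $byte, utf8::is_utf8($byte) ? "on" : "off";
printf "utf8: length %d, ord %d, utf8 flag %s\n",
  length $utf8, ord $utf8, utf8::is_utf8($utf8) ? "on" : "off";
print $utf8 eq $byte ? "eq ok\n" : "eq not ok\n";

# Force a copy back to the byte representation before handing it to code
# that mishandles utf8. The second argument (FAIL_OK) makes downgrade
# return false instead of dying if a character is above 255.
my $copy = $utf8;
if (utf8::downgrade($copy, 1)) {
  printf "downgraded copy: utf8 flag %s\n",
    utf8::is_utf8($copy) ? "on" : "off";
}
```

On a perl where Unicode strings are handled correctly, the two strings differ only in their internal flag, so every comparison above should succeed; the 5.6.x failures in the scripts above come from code paths that look at the internal bytes instead.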

Nicholas Clark
