On Sun, 24 Nov 2002, Jarkko Hietaniemi wrote:

> > 1) x.0 release. I haven't seen a x.0 release of _any_ software I was
> >    willing to put the family jewels on without quite a bit of testing
> >    first.
> 
> So are you conducting testing?

Slowly and informally. Work schedules leave little time to explore 5.8,
except, as now, when I am actually on 'vacation' (which is why I am
replying a month after the original message).

> > 2) The very first machine I installed it on immediately had script
> >    breakage _specifically_ because the rather broken (IMHO) behavior
> >    re making the use of either 'use bytes' or 'binmode' mandatory
> 
> Could you please specify the circumstances of the breakage further?
> What got broken, what had to be changed?

Stripped of most irrelevant code and cleaned up slightly, this is
essentially what happened (the now-necessary 'binmode' is commented out
just to mark the change). Yes, I know about (and frequently use)
Image::Size, et al. This is a fragment of a script that is distributed
'standalone', so it could not depend on anything that was not distributed
with Perl 5.005.

#!/usr/bin/perl -w

use strict;

my $file = '/home/snowhare/images/test.jpg';
my ($width,$height) = jpegsize($file);

print "width = $width, height = $height\n";
exit 0;

sub readfile {
    my ($filename)=@_;
    if (! open (NEWFILE,$filename)) {
        print STDERR "$filename could not be opened for reading: $!\n";
        return;
    }
#    binmode NEWFILE;
    local $/;                   # slurp mode; old $/ is restored on return
    my $data = <NEWFILE>;
    close (NEWFILE);

    return ($data);
}

sub jpegsize {
    my ($filename) = @_;

    my $jpeg = readfile($filename);

    my($count) = 2;
    my($length)= length($jpeg);
    my($ch)    = "";

    while (($ch ne "\xda") && ($count<$length)) {
        # Find next marker (jpeg markers begin with 0xFF)
        while (($ch ne "\xff") && ($count < $length)) {
            $ch=substr($jpeg,$count,1); 
            $count++;
        }
        # jpeg markers can be padded with unlimited 0xFF's
        while (($ch eq "\xff") && ($count<$length)) {
            $ch=substr($jpeg,$count,1); 
            $count++;
        }
        # Now, $ch contains the value of the marker.
        if ((ord($ch) >= 0xC0) && (ord($ch) <= 0xC3)) {
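            # SOF0-SOF3: skip the 2-byte length and 1-byte precision,
            # then read the 2-byte height and 2-byte width (big-endian)
            $count          += 3;
            my ($h_hi,$h_lo,$w_hi,$w_lo) = unpack("C4",substr($jpeg,$count,4));
            my $width        = $w_hi<<8 | $w_lo;
            my $height       = $h_hi<<8 | $h_lo;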
            return($width,$height);
        } else {
            # We **MUST** skip variable-length segments, since 0xFF bytes
            # inside them are NOT valid jpeg markers
            my ($c1,$c2)= unpack("C"x2,substr($jpeg,$count,2));
            $count += $c1<<8|$c2;
        }
    }   
}
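For comparison, here is a minimal sketch of the same file read for Perl 5.8 and later only (so it is not suitable for a script that must also run under 5.005, where the plain 'binmode NEWFILE;' is the portable fix). It asks for raw bytes in the open call itself. The helper name read_bytes is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper. Needs Perl 5.6+ for lexical handles and 3-arg open,
# and 5.8+ for the ':raw' PerlIO layer.
sub read_bytes {
    my ($filename) = @_;
    # ':raw' requests an unmodified byte stream instead of whatever
    # default layers (e.g. a UTF-8 layer) would otherwise apply.
    open(my $fh, '<:raw', $filename)
        or die "$filename could not be opened for reading: $!\n";
    local $/;                   # slurp mode; restored when the sub returns
    my $data = <$fh>;
    close($fh);
    return defined $data ? $data : '';
}
```

It would be called as my $jpeg = read_bytes($filename); in place of readfile().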

> >    the last few years to 'magically' try to muck with charset encodings; 
> >    5.8.0 has specifically realized those fears as quite justified.
> 
> I'm sorry but you are not being very helpful at all.  You "distrust"
> "magic" but you do not really say what behaviour of Perl 5.8.0 you
> find disturbing.

Treating a 'string' as anything but a sequence of 'bytes/octets' _without 
my explicit request or a runtime warning that I haven't specified fh 
semantics_.

> The only obvious 'magic' I can think of is the behaviour where Perl
> checks your locale settings, and if they indicate use of UTF-8, Perl
> switches the default encoding of the STD* streams, and any further
> file opens to UTF-8.  This bit of magic was specifically requested by
> Larry Wall, and also by the Linux "Unicodification" project.

This is Bad Juju (tm). It _guarantees_ script breakage (potentially
silently!) for Unix people doing _anything_ but ASCII text manipulation.  

If you want to break something as fundamental to *nix boxes as binary-mode
filehandles, _at least_ force the script writer to acknowledge this _deep_
change to FH semantics. Then they are forced to become aware of the issue
_before_ a script gets its operating assumptions yanked out from under it.

I would lobby for a mandatory runtime warning, with an explanation of the
issue, to be issued the first time a filehandle is used if neither
'binmode FH;' nor 'binmode FH, LAYER;' has been seen for it.
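Perl does not record whether binmode() has ever been called on a handle, so the proposal above cannot be reproduced exactly in user space. A rough approximation, assuming Perl 5.8+, inspects a handle's current layers with PerlIO::get_layers and warns if a character layer is present. The helper name warn_if_char_layer is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper approximating the proposed warning. It checks the
# handle's current PerlIO layers rather than tracking binmode() calls.
sub warn_if_char_layer {
    my ($fh, $name) = @_;
    my @layers = PerlIO::get_layers($fh);
    # A 'utf8' pseudo-layer or an 'encoding(...)' layer means reads
    # return characters rather than raw octets.
    my @char = grep { $_ eq 'utf8' || /^encoding\(/ } @layers;
    if (@char) {
        warn "$name has character layer(s) [@char]; "
           . "call binmode() if you expect raw bytes\n";
        return 1;
    }
    return 0;
}
```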

> The locale-induced UTF-8 magic can lead into situation where you have
> to explicitly mark your filehandles "binary" (with binmode, please
> don't use bytes), because otherwise any data going out would be
> expected to be Unicode, that is, *text*.  If you are pushing out
> binary bits and bytes, you should tell Perl about it.   You are
> also simultaneously complaining about "wanting to specify things
> yourself" and "having to use binmode"?

Yes. Because _needing_ to 'tell Perl' that I am pushing binary rather than
text _is a change_ for *nix platforms. I should have to 'tell Perl' when I
am pushing _anything other than_ binary. Or, _at a minimum_, if I haven't
explicitly indicated that I *WANT* the system environment to change my
filehandle's encodings, a mandatory warning should be issued saying that I
didn't declare the filehandle's encoding layer and that it is now using
encoding 'X'.
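The byte/character distinction at the heart of this exchange is easy to see directly. A minimal sketch, assuming Perl 5.8+, reads the same file through two explicitly chosen layers. The helper name length_with_layer is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp $path through an explicit PerlIO layer
# (e.g. ':raw' or ':encoding(UTF-8)') and return the length of the result.
sub length_with_layer {
    my ($path, $layer) = @_;
    open(my $fh, "<$layer", $path) or die "$path: $!\n";
    local $/;                   # slurp mode
    my $data = <$fh>;
    close($fh);
    return length($data);
}
```

For a file holding the five octets "caf\xc3\xa9", this returns 5 with ':raw' and 4 with ':encoding(UTF-8)', since the last two octets decode to a single character.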

> Back to the 'UNIX' way of I/O: I'm sorry but I think the UNIX way and
> the Unicode can't transparently cohabit.  I'm very much a UNIX geek
> and systems programmer, and I like the simple symmetrical world of
> UNIX I/O, but I cannot see how the byte streams of UNIX and the
> multiple variable and fixed length encodings of Unicode can work
> simultaneously without some sort of explicit switching.

_Explicit_ switching is what I am asking for. _Implicit_ switching is what
I am complaining about. If you want to switch based on the system
environment, fine: _but at least give me a good, immediate warning_ before
changing my fh semantics if I haven't said something like

   binmode FH, ':crlf|:raw|:env';

(that is, one of those layers, where ':env' would be a new 'follow the
environment' option) before I do my $data = <FH>;

"Malformed UTF-8 character (unexpected end of string) at
./error-example.pl line 40." isn't useful: it is obscure, and it is
reported far away from the actual breakage.
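One way to get an error at the point where data enters, rather than a malformed-character warning somewhere later, is to read raw bytes and decode them explicitly. This is a swapped-in technique using the Encode module (core since 5.8.0), not the warning proposed here. The helper name decode_utf8_or_die is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw octets as strict UTF-8 and die right
# here, with the caller's description, if they are not valid UTF-8.
# LEAVE_SRC keeps decode() from modifying the caller's byte string.
sub decode_utf8_or_die {
    my ($bytes, $what) = @_;
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    die "$what is not valid UTF-8 (is it binary data?): $@"
        unless defined $text;
    return $text;
}
```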

If I hadn't been lurking on the P5P and Perl-Unicode lists for the last
few years, I could easily have spent hours tearing my hair out trying to
(a) figure out what the hell it was talking about and (b) find a
workaround.

-- 
Benjamin Franz

"If the code and the comments disagree, then both are probably wrong."
                                        -- Norm Schryer, Bell Labs 
