Re: 2 Suprises w/5.8.0

2002-08-01 Thread Andreas J. Koenig

> On Thu, 1 Aug 2002 06:33:07 +0300, Jarkko Hietaniemi <[EMAIL PROTECTED]> said:

  > Pre-5.8 way of Unicode (or, even worse, pre-5.6 way of Unicode) simply
  > is not compatible, and trying to bridge the gap is probably worse than
  > it's worth.

I agree with Jarkko if you write new code. But for old code the answer
must be different.

I guess Daniel has code that works under pre-5.8 and he now wants to
have it run under 5.8 without breaking compatibility with previous
perls. The reason is easy to understand: you cannot port code to 5.8
in one go; it can take several months until you have found all the
spots in your code that need changing. So he needs to keep
compatibility with older perls until he can switch to 5.8 safely.

I've ported the code of PAUSE to 5.8.0 within a few hours, but just
yesterday I discovered a missing encode_utf8(). Took me many hours to
find it. I was glad that I could run the whole PAUSE under 5.6.1.

Daniel, if this is the background of your request, I'd say:

- keep using Unicode::String
- keep using the utf8 pragma if 5.6.1 needed it
- don't throw away old code until you feel really safe
- enclose all changes you try out for 5.8.0 in

  if ($] > 5.007) {
    # code that isn't understood by 5.6.1
  }

- don't hesitate to ask for practical advice on this list.

These are typical changes that you might need:

A filehandle that should read or write UTF-8:

  if ($] > 5.007) {
    binmode $fh, ":utf8";
  }
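
As a rough illustration of what the layer does, the following sketch writes one non-ASCII character through a ":utf8" filehandle and reads the file back as raw bytes. The temporary file and the variable names are assumptions made for the example only, and it expects Perl 5.8 or later.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, removed again when the program exits.
my ($fh, $path) = tempfile(UNLINK => 1);

# Same guard as above: only 5.8 and later understand the layer.
binmode $fh, ":utf8" if $] > 5.007;

# U+263A is one character; the layer writes it as three octets.
print {$fh} "\x{263A}\n";
close $fh or die "close failed: $!";

# Read the file back without any layer to look at the raw octets.
open my $raw, "<", $path or die "open failed: $!";
binmode $raw;
my $bytes = do { local $/; <$raw> };
close $raw;

printf "%d bytes: %s\n", length $bytes,
  join " ", map { sprintf "%02X", ord } split //, $bytes;
```

Without the binmode call, perl 5.8 would warn about a wide character in print.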

A scalar that is going to be passed to some extension, be it
Compress::Zlib, Apache::Request or any extension that has no mention
of Unicode in the manpage:

  if ($] > 5.007) {
    require Encode;
    utf8::upgrade($self->{CONTENT}); # make sure it is UTF-8 encoded
    $self->{CONTENT} = Encode::encode_utf8($self->{CONTENT}); # make octets
  }

A scalar we got back from an extension that we believe returns
UTF-8:

  if ($] > 5.007) {
    require Encode;
    $val = Encode::decode_utf8($val);
  }

Same thing, if you are really sure it is UTF-8:

  if ($] > 5.007) {
    require Encode;
    Encode::_utf8_on($s);
  }
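
The difference between the last two recipes can be sketched like this, assuming Perl 5.8 or later with the core Encode module. The sample octets and variable names are made up for the illustration.

```perl
use strict;
use warnings;
use Encode ();

# Octets for U+00E9 (e with acute accent) encoded as UTF-8.
my $octets = "\xC3\xA9";

# decode_utf8 returns a new character string built from the octets.
my $decoded = Encode::decode_utf8($octets);

# _utf8_on only flips the internal flag on the scalar it is given,
# in place and without checking that the bytes are valid UTF-8.
my $flagged = $octets;
Encode::_utf8_on($flagged);

printf "octets:  length %d\n", length $octets;
printf "decoded: length %d, U+%04X\n", length $decoded, ord $decoded;
printf "flagged: length %d, U+%04X\n", length $flagged, ord $flagged;
```

Both routes end up with the same one-character string here. Since _utf8_on does no validation, it is only appropriate when the data is known to be well-formed UTF-8.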

A wrapper-function for fetchrow_array and fetchrow_hashref when the
database contains only UTF-8:

  sub fetchrow {
    my($self,$sth,$what) = @_; # $what is one of fetchrow_{array,hashref}
    if ($] < 5.007) {
      return $sth->$what;
    } else {
      require Encode;
      if (wantarray) {
        my @arr = $sth->$what;
        for (@arr) {
          defined && /[^\000-\177]/ && Encode::_utf8_on($_);
        }
        return @arr;
      } else {
        my $ret = $sth->$what;
        if (ref $ret) {
          for my $k (keys %$ret) {
            defined && /[^\000-\177]/ && Encode::_utf8_on($_) for $ret->{$k};
          }
          return $ret;
        } else {
          defined && /[^\000-\177]/ && Encode::_utf8_on($_) for $ret;
          return $ret;
        }
      }
    }
  }


If you have large scalars that you know can only contain ASCII and
might be marked as UTF-8:

  utf8::downgrade($sort) if $] > 5.007;
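
A small sketch of the effect, assuming Perl 5.8.1 or later (for utf8::is_utf8). The string content is an assumption for the example, and utf8::upgrade is used here only to simulate a scalar that arrived with the flag set.

```perl
use strict;
use warnings;

my $sort = "plain ascii";

# Simulate a scalar that came back marked as UTF-8.
utf8::upgrade($sort);
my $before = utf8::is_utf8($sort) ? "flagged" : "not flagged";

# Turn the flag off again. The characters stay the same; by default
# downgrade dies if the string holds a character above 0xFF.
utf8::downgrade($sort);
my $after = utf8::is_utf8($sort) ? "flagged" : "not flagged";

print "before: $before\n";
print "after:  $after\n";
```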

That's all I needed. You are not alone:-)


-- 
andreas



Re: 2 Suprises w/5.8.0

2002-08-01 Thread Nick Ing-Simmons

Andreas J. Koenig <[EMAIL PROTECTED]> writes:
>A scalar that is going to be passed to some extension, be it
>Compress::Zlib, Apache::Request or any extension that has no mention
>of Unicode in the manpage:
>
>  if ($] > 5.007) {
>require Encode;
>utf8::upgrade($self->{CONTENT}); # make sure it is UTF-8 encoded
 ^^^

Why is that step necessary? encode_utf8() should do that itself on the way ...

>$self->{CONTENT} = Encode::encode_utf8($self->{CONTENT}); # make octets
>  }
>
-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/




Re: 2 Suprises w/5.8.0

2002-08-01 Thread Andreas J. Koenig

> On Thu, 01 Aug 2002 09:22:52 +0100, Nick Ing-Simmons <[EMAIL PROTECTED]> said:

  > Andreas J. Koenig <[EMAIL PROTECTED]> writes:
 >> A scalar that is going to be passed to some extension, be it
 >> Compress::Zlib, Apache::Request or any extension that has no mention
 >> of Unicode in the manpage:
 >> 
 >> if ($] > 5.007) {
 >> require Encode;
 >> utf8::upgrade($self->{CONTENT}); # make sure it is UTF-8 encoded
  >  ^^^

  > Why is that step necessary? encode_utf8() should do that itself on the way ...

You're right, this code slipped in some day and isn't needed at all.
Thanks for the correction!
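
The point can be checked with a short sketch, assuming Perl 5.8 or later with the core Encode module; the sample string and variable names are illustrative only. It encodes the same characters once from a scalar without the UTF-8 flag and once from an upgraded copy.

```perl
use strict;
use warnings;
use Encode ();

my $plain    = "caf\xE9";   # four characters, UTF-8 flag off
my $upgraded = $plain;
utf8::upgrade($upgraded);   # same four characters, UTF-8 flag on

# encode_utf8 works on the characters, whatever the internal form.
my $from_plain    = Encode::encode_utf8($plain);
my $from_upgraded = Encode::encode_utf8($upgraded);

print $from_plain eq $from_upgraded ? "same octets\n" : "different octets\n";
print join(" ", map { sprintf "%02X", ord } split //, $from_plain), "\n";
```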

-- 
andreas



Re: 2 Suprises w/5.8.0

2002-08-01 Thread Jarkko Hietaniemi

Excellent, thanks Andreas.  I see a "cookbook" like this as a patch for
perluniintro/perlunicode for 5.8.1.

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: 2 Suprises w/5.8.0 (Unicode)

2002-08-01 Thread Tim Bunce

On Thu, Aug 01, 2002 at 10:09:58AM +0200, Andreas J. Koenig wrote:
> 
> A wrapper-function for fetchrow_array and fetchrow_hashref when the
> database contains only UTF-8:
> 
>   sub fetchrow {
>     my($self,$sth,$what) = @_; # $what is one of fetchrow_{array,hashref}
>     if ($] < 5.007) {
>       return $sth->$what;
>     } else {
>       require Encode;
>       if (wantarray) {
>         my @arr = $sth->$what;
>         for (@arr) {
>           defined && /[^\000-\177]/ && Encode::_utf8_on($_);
>         }
>         return @arr;
>       } else {
>         my $ret = $sth->$what;
>         if (ref $ret) {
>           for my $k (keys %$ret) {
>             defined && /[^\000-\177]/ && Encode::_utf8_on($_) for $ret->{$k};
>           }
>           return $ret;
>         } else {
>           defined && /[^\000-\177]/ && Encode::_utf8_on($_) for $ret;
>           return $ret;
>         }
>       }
>     }
>   }

Umm. Of course that'll break if the driver starts doing it itself.

Tim.



Re: 2 Suprises w/5.8.0 (Unicode)

2002-08-01 Thread Andreas J. Koenig

> On Thu, 1 Aug 2002 15:50:08 +0100, Tim Bunce <[EMAIL PROTECTED]> said:

  > Umm. Of course that'll break if the driver starts doing it itself.

That's why perlunicode.pod recommends wrapper functions for the time
from now till extensions start providing their own convenient methods.
The wrapper function can then be extended to also fit for some
advanced version of the extension it wraps.

I have no better solution, do you?

-- 
andreas



Re: 2 Suprises w/5.8.0 (Unicode)

2002-08-01 Thread Tim Bunce

On Thu, Aug 01, 2002 at 06:47:54PM +0200, Andreas J. Koenig wrote:
> > On Thu, 1 Aug 2002 15:50:08 +0100, Tim Bunce <[EMAIL PROTECTED]> said:
> 
>   > Umm. Of course that'll break if the driver starts doing it itself.
> 
> That's why perlunicode.pod recommends wrapper functions for the time
> from now till extensions start providing their own convenient methods.
> The wrapper function can then be extended to also fit for some
> advanced version of the extension it wraps.
> 
> I have no better solution, do you?

Umm, actually, since Encode::_utf8_on() is _just_ turning on the utf8 flag
then it won't 'break' if the driver has already done that itself.
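
A sketch of that observation, assuming Perl 5.8 or later with the core Encode module. The helper mark_utf8 and the sample row are hypothetical; the helper applies the same test as the wrapper quoted above, and the second call stands in for a driver that has already set the flag.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: flag every defined value that contains
# something outside the ASCII range, modifying the caller's data.
sub mark_utf8 {
    for (@_) {
        defined($_) && /[^\000-\177]/ && Encode::_utf8_on($_);
    }
    return;
}

# Made-up row: UTF-8 octets, plain ASCII, and a NULL column.
my @row = ("\xC3\xA9t\xC3\xA9", "ascii", undef);

mark_utf8(@row);
my @first = map { defined($_) ? length($_) : -1 } @row;

# A second pass is harmless: the flag is already on.
mark_utf8(@row);
my @second = map { defined($_) ? length($_) : -1 } @row;

print "first pass:  @first\n";
print "second pass: @second\n";
```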

Tim.