On Tue, 13 Aug 2002 14:09:37 +0200
"Merijn van den Kroonenberg" <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
> I have a Perl application (Perl 5.8.0) which puts UTF-8 data in a MySQL
> database. This seems to work pretty well, and retrieving the data
> with Perl also works, using something like this:
> 
> my $sth = $db_handle->prepare("SELECT some query");
> $sth->execute;
> my @row=$sth->fetchrow_array;
> print $row[0]."\n"; #### print before
> if ($]>5.007){
>   require Encode;
>   Encode::_utf8_on($row[0]);}
> print $row[0]."\n"; #### print after
> $sth->finish;
> 
> Encode's _utf8_on gives me back good data. As far as I understand, the
> _utf8_on function doesn't do any real conversion, but only switches on the
> UTF-8 flag of a Perl string?
> 
> If you compare the two prints in the example above, it seems that after the
> UTF-8 flag is set the string is UTF-8 decoded. This results in the correct
> string, so it seems the original string is UTF-8 encoded (double encoded,
> since it already was UTF-8).
> 
> When I select the same string manually (mysql prompt) or with PHP, I
> get back the double encoded string. So it seems to me that the double
> encoded format is how Perl stores it internally (and also in the database)?
> But this doesn't sound right to me. It would mean that every time a UTF-8
> flagged string is used it would need to be UTF-8 decoded. That doesn't sound
> very efficient, so I doubt it's done that way. But meanwhile I have no idea
> how it is done, or how it is stored in the database.
> 
> As you might have guessed, I want to access the data I put in the database
> with PHP, but I get back double UTF-8 encoded data there. The problem could be
> in a lot of different places, for example my fetching in PHP, my storing in
> Perl, or somewhere else where I have a faulty conversion. To check whether
> the data in the database is correct, I tried to figure out how Perl works
> with the data.
> 
> Maybe someone could put me on the right track, because this got me mighty
> confused ;-)

To see what a Perl scalar actually holds,
use Devel::Peek.

#!perl
use Devel::Peek;
use Encode;

our $camel_utf8 = "\351\247\261\351\247\235";

print STDERR "* _utf8_on\n\n";
Encode::_utf8_on($camel_utf8);
Dump($camel_utf8);

print STDERR "\n";

print STDERR "* _utf8_off\n\n";
Encode::_utf8_off($camel_utf8);
Dump($camel_utf8);

__END__

The output looks like this.
The only difference between _utf8_on and _utf8_off is the UTF8 entry in FLAGS;
the bytes in PV (and CUR, the byte length) are identical in both dumps.

* _utf8_on

SV = PV(0x1661c60) at 0x166cccc
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x16db4e0 "\351\247\261\351\247\235"\0 [UTF8 "\x{99f1}\x{99dd}"]
  CUR = 6
  LEN = 7

* _utf8_off

SV = PV(0x1661c60) at 0x166cccc
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x16db4e0 "\351\247\261\351\247\235"\0
  CUR = 6
  LEN = 7
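
The same point can be checked without Devel::Peek. The following is only a
minimal sketch, not a diagnosis of your database: the variable names
($octets, $chars, $double) are made up for illustration, and it assumes the
documented Encode functions decode(), encode() and is_utf8(). It shows the
documented way to turn UTF-8 octets into characters, and one way (not
necessarily the one at work in your application) that double encoding can
arise: encoding a string that already holds UTF-8 octets but has the flag off.

```perl
#!perl
use strict;
use warnings;
use Encode ();

# The same six octets as $camel_utf8 above (UTF-8 for U+99F1 U+99DD).
my $octets = "\351\247\261\351\247\235";

# decode() returns a character string; with the default CHECK
# argument the source scalar is not modified.
my $chars = Encode::decode('UTF-8', $octets);

printf "octets: length=%d flag=%s\n",
    length($octets), Encode::is_utf8($octets) ? 'on' : 'off';
printf "chars:  length=%d flag=%s\n",
    length($chars), Encode::is_utf8($chars) ? 'on' : 'off';

# Encoding the characters again gives back the original octets.
my $round_trip = Encode::encode('UTF-8', $chars);
print "round trip matches: ", ($round_trip eq $octets ? 'yes' : 'no'), "\n";

# Encoding the *octets* (flag off) treats each byte as one Latin-1
# character, so every byte >= 0x80 becomes two bytes: double encoding.
my $double = Encode::encode('UTF-8', $octets);
printf "double: length=%d\n", length($double);
```

Here length() reports 6 for $octets, 2 for $chars and 12 for $double, so
comparing byte lengths is a quick way to tell which of the three forms a
given value is in.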



SADAHIRO Tomoyuki
