On Tue, 13 Aug 2002 14:09:37 +0200 "Merijn van den Kroonenberg" <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I have a perl application (perl 5.8.0) which puts utf8 data in a mysql
> database. This seems to work pretty well, and the retrieving of the data
> with perl also works. Using something like this:
>
> my $sth = $db_handle->prepare("SELECT some query");
> $sth->execute;
> my @row=$sth->fetchrow_array;
> print $row[0]."\n"; #### print before
> if ($]>5.007){
> require Encode;
> Encode::_utf8_on($row[0]);}
> print $row[0]."\n"; #### print after
> $sth->finish;
>
> The Encode utf8_on gives me back good data. As far as i understood the
> _utf8_on method doesnt do any real conversions, but only switches the utf
> flag of a perl string?
>
> If you compare the two prints in above example, then it seems that after the
> utf flag is set the string is utf decoded. This results in the correct
> string, so it seems the original string is utf encoded (double encoded,
> since it already was UTF).
>
> When i select the same string manually (mysql prompt) or with PHP, then i
> get back the double encoded string. So it seems to me that the double
> encoded format is how perl stores it internally (and also in the database)?
> But this doesnt sound right to me...this would mean that everytime a utf
> flagged string is used it would need to be utf decoded. That sounds not very
> effecient to me, so i doubt its done that way. But meanwhile i have no idea
> how its done...and how its stored in the database.
>
> As you might have guessed i want to access the data i put in the database
> with PHP, but i get back double utf encoded data there. The problem could be
> in alot of different places, for example my fetching in PHP, storing in perl
> and maybe somewhere else where i have some faulty conversion. To check if
> the data in the database is correct i tried to figure out how perl works
> with the data.
>
> Maybe someone could put me on the right track, because this got me mighty
> confused ;-)

To see what a Perl scalar actually holds, use Devel::Peek (Devel/Peek.pm).

#!perl
use Devel::Peek;
use Encode;

our $camel_utf8 = "\351\247\261\351\247\235";

print STDERR "* _utf8_on\n\n";
Encode::_utf8_on($camel_utf8);
Dump($camel_utf8);
print STDERR "\n";

print STDERR "* _utf8_off\n\n";
Encode::_utf8_off($camel_utf8);
Dump($camel_utf8);
__END__

The output is like this.
The difference between _on and _off is found in FLAGS.

* _utf8_on

SV = PV(0x1661c60) at 0x166cccc
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x16db4e0 "\351\247\261\351\247\235"\0 [UTF8 "\x{99f1}\x{99dd}"]
  CUR = 6
  LEN = 7

* _utf8_off

SV = PV(0x1661c60) at 0x166cccc
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x16db4e0 "\351\247\261\351\247\235"\0
  CUR = 6
  LEN = 7

SADAHIRO Tomoyuki
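
On the double-encoding question, a minimal sketch may help. It is only an illustration under assumptions: it uses no database at all, the variable names are made up, and it does not claim to show where the extra encoding step happens in the application above. It shows what happens when a string that already holds UTF-8 bytes is encoded a second time, using the same six bytes as in the Dump output.

```perl
#!perl
use strict;
use warnings;
use Encode;

# The two characters U+99F1 U+99DD as a character string.
my $chars = "\x{99f1}\x{99dd}";

# Encoding once gives the six UTF-8 bytes "\351\247\261\351\247\235".
my $bytes = Encode::encode('UTF-8', $chars);
printf "encoded once:  %d bytes\n", length $bytes;     # 6 bytes

# Encoding those bytes again treats every byte as a Latin-1 character.
# All six bytes are >= 0x80, so each one becomes two bytes.
my $double = Encode::encode('UTF-8', $bytes);
printf "encoded twice: %d bytes\n", length $double;    # 12 bytes

# decode() checks the bytes and returns a character string.
my $decoded = Encode::decode('UTF-8', $bytes);
printf "decoded:       %d characters\n", length $decoded;   # 2 characters
```

If the value read back from the database has twice as many bytes as expected for the non-ASCII characters, it has probably been encoded twice somewhere on the way in. Encode::decode is the documented way to turn UTF-8 bytes into a character string; _utf8_on only sets the flag and performs no validity check.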