Thank you for the answer,

I did some experimenting with the Devel::Peek module and I found the cause
of my problem.
I was using the DBI $DBHANDLE->quote($astring); method to quote (and
backslash-escape) strings that I put in the database. Unfortunately this
method does not seem to be Unicode-safe, and my data got corrupted. It looks
like the data gets UTF-8 encoded twice. I wrote a temporary function to
escape my data, but I would rather use the DBI method if possible. I have
the feeling that this problem can be solved in some way. Maybe someone can
explain what is most likely causing it, and whether I can do something to
make quote() Unicode-safe (without having to modify the DBI module). If it's
not possible, let me know too; then I will just keep the temporary function
I use now ;-)
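
To illustrate what "encoded twice" means here, a minimal sketch using only
the Encode module. It does not call DBI at all and makes no claim about where
in quote() or the driver the second encoding happens; it only shows the
general effect of encoding already-encoded bytes, and that decoding once
recovers the intended bytes. The variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Two CJK characters as a Perl character string (same data as in the
# Devel::Peek example quoted below).
my $chars = "\x{99f1}\x{99dd}";

# Encoding once gives the 6 UTF-8 bytes that should end up in the database.
my $bytes = encode('UTF-8', $chars);

# Encoding those bytes again treats every byte as a separate character in
# the range 0x80-0xFF, so each one becomes two bytes: 12 bytes in total.
my $double = encode('UTF-8', $bytes);

printf "encoded once:  %d bytes\n", length $bytes;
printf "encoded twice: %d bytes\n", length $double;

# Decoding the double-encoded form once yields the correct UTF-8 bytes again.
my $repaired = decode('UTF-8', $double);
print "repaired data matches the single encoding\n" if $repaired eq $bytes;
```

If the stored value is twice as long as expected for non-ASCII text like
this, double encoding is a likely explanation.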

Oh yeah, one other thing: since Encode::_utf8_on is an internal function,
wouldn't it be better to use Encode::decode("utf8",$somevar) instead? As far
as I can see, it should do exactly the same, but if I am mistaken, let me
know :)
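
A small sketch comparing the two calls, assuming the input is well-formed
UTF-8 (the byte string is the one from the Devel::Peek example quoted below).
For such input the resulting strings compare equal, but the calls are not
identical: decode() returns a new string and checks the input, while
_utf8_on() changes the scalar in place and, per the Encode documentation,
does not check that the bytes are well-formed.

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 bytes, e.g. as fetched from the database: 6 bytes, no UTF8 flag.
my $bytes = "\351\247\261\351\247\235";

# decode() leaves $bytes alone and returns a new character string.
my $decoded = Encode::decode('UTF-8', $bytes);

# _utf8_on() modifies its argument, so work on a copy here.
my $flagged = $bytes;
Encode::_utf8_on($flagged);

print "bytes:   ", length($bytes),   "\n";   # counted as 6 bytes
print "decoded: ", length($decoded), "\n";   # counted as 2 characters
print "flagged: ", length($flagged), "\n";   # counted as 2 characters
print "decoded and flagged strings are equal\n" if $decoded eq $flagged;
```

So for data known to be valid UTF-8 the results match; the difference shows
up with malformed input and in whether the original scalar is modified.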

Thank you,
Merijn van den Kroonenberg


----- Original Message -----
From: "SADAHIRO Tomoyuki" <[EMAIL PROTECTED]>
To: "Merijn van den Kroonenberg" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, August 15, 2002 3:12 PM
Subject: Re: perl, unicode and databases (mysql)


>
> On Tue, 13 Aug 2002 14:09:37 +0200
> "Merijn van den Kroonenberg" <[EMAIL PROTECTED]> wrote:
>
> > Hi all,
> >
> > I have a perl application (perl 5.8.0) which puts utf8 data in a mysql
> > database. This seems to work pretty well, and the retrieving of the data
> > with perl also works. Using something like this:
> >
> > my $sth = $db_handle->prepare("SELECT some query");
> > $sth->execute;
> > my @row=$sth->fetchrow_array;
> > print $row[0]."\n"; #### print before
> > if ($]>5.007){
> >   require Encode;
> >   Encode::_utf8_on($row[0]);}
> > print $row[0]."\n"; #### print after
> > $sth->finish;
> >
> > The Encode _utf8_on call gives me back good data. As far as I understood,
> > the _utf8_on method doesn't do any real conversion, but only switches
> > the UTF-8 flag of a Perl string?
> >
> > If you compare the two prints in the above example, it seems that after
> > the UTF-8 flag is set the string is UTF-8 decoded. This results in the
> > correct string, so it seems the original string is UTF-8 encoded (double
> > encoded, since it already was UTF-8).
> >
> > When I select the same string manually (mysql prompt) or with PHP, I
> > get back the double encoded string. So it seems to me that the double
> > encoded format is how Perl stores it internally (and also in the
> > database)? But this doesn't sound right to me... this would mean that
> > every time a UTF-8 flagged string is used it would need to be UTF-8
> > decoded. That doesn't sound very efficient to me, so I doubt it's done
> > that way. But meanwhile I have no idea how it's done... and how it's
> > stored in the database.
> >
> > As you might have guessed, I want to access the data I put in the
> > database with PHP, but I get back double UTF-8 encoded data there. The
> > problem could be in a lot of different places, for example my fetching
> > in PHP, my storing in Perl, or somewhere else where I have some faulty
> > conversion. To check whether the data in the database is correct, I
> > tried to figure out how Perl works with the data.
> >
> > Maybe someone could put me on the right track, because this got me
> > mighty confused ;-)
>
> To see what a Perl scalar holds,
> use Devel::Peek (Devel/Peek.pm).
>
> #!perl
> use Devel::Peek;
> use Encode;
>
> our $camel_utf8 = "\351\247\261\351\247\235";
>
> print STDERR "* _utf8_on\n\n";
> Encode::_utf8_on($camel_utf8);
> Dump($camel_utf8);
>
> print STDERR "\n";
>
> print STDERR "* _utf8_off\n\n";
> Encode::_utf8_off($camel_utf8);
> Dump($camel_utf8);
>
> __END__
>
> The output is like this.
> The difference between _on and _off is found in FLAGS.
>
> * _utf8_on
>
> SV = PV(0x1661c60) at 0x166cccc
>   REFCNT = 1
>   FLAGS = (POK,pPOK,UTF8)
>   PV = 0x16db4e0 "\351\247\261\351\247\235"\0 [UTF8 "\x{99f1}\x{99dd}"]
>   CUR = 6
>   LEN = 7
>
> * _utf8_off
>
> SV = PV(0x1661c60) at 0x166cccc
>   REFCNT = 1
>   FLAGS = (POK,pPOK)
>   PV = 0x16db4e0 "\351\247\261\351\247\235"\0
>   CUR = 6
>   LEN = 7
>
>
>
> SADAHIRO Tomoyuki
>

