On 31-Aug-2006 Daniel Zeman wrote: > Hi, > I wonder if anyone can help me. > > I am using > - Debian Linux > - Perl 5.8.8 > - DBI (I do not know how to figure out its version)
perl -MDBI -e 'print "$DBI::VERSION\n";' > - MySQL 5.0.22-Debian_3-log What DBD::mysql version are you using? perl -M'DBD::mysql' -e 'print "$DBD::mysql::VERSION\n";' > I want to store and handle data in UTF-8 but so far I have not been able > to force Perl/DBI to do so. > > I have created a table using > > my $sql = "CREATE TABLE $tbl (".join(", ", @columns).") CHARACTER > SET utf8 COLLATE utf8_czech_ci;"; > $dbh->do($sql); > > I feeded the table with data using > > my $list_of_columns = join(", ", @names); > my $list_of_values = join(", ", map{"_utf8'$record->{$_}'"}(@names)); > my $sql = "INSERT INTO $tbl ($list_of_columns) VALUES > ($list_of_values);"; > $dbh->do($sql); You would be better using bound parameters as then only the data can be utf-8. > I have looked into the database using phpMyAdmin 2.8.2-Debian-0.1 and it > really looked like the data were stored in correct UTF-8. > > However, when I retrieve the data from Perl/DBI, something in the chain > (MySQL? the driver? DBI?) decides that another encoding (probably, > Latin1) would be better for me. It "converts" the strings from UTF-8 to > that encoding, which means, at the time the data arrives in my Perl > code, all the non-Latin1 characters have already been irrecoverably > converted to question marks. I would be happy to decode the data myself > but there is nothing I can do with the question marks. > > I am using the following code to retrieve the data: > > my $sql = "SELECT kod, hry.nazev FROM hry INNER JOIN prodej ON > hry.kod = prodej.kod_hry GROUP BY kod, hry.nazev"; > my $sqlobj = $dbh->prepare($sql); > $sqlobj->execute(); > while(my ($kod, $nazev) = $sqlobj->fetchrow_array()) > { > ... > } > > So far, the only workaround I have, is not to tell the DBI the data is > UTF-8 when I am inserting it (i.e., drop the "_utf8" part before the > single quote), and use Encode; decode("utf8", ...) on anything I fetch > from the database. This way, the database never knows the data was a > UTF-8 text, treats the bytes as Latin1 characters and returns them > undisturbed. However, I cannot access the data using phpMyAdmin (unless > I en/decode UTF in my brain), the string lengths do not reflect the > reality etc. > > Is there a better way to do it? I think there must be some small stupid > locale-like setting telling the machine that I am a UTF guy. But the > settings I was able to come up with did not help and I actually have no > idea which part of the MySQL-driver-DBI-Perl chain is responsible. > > Any hints are welcome. > Thanks > > Dan I posted a small patch (in dbi-dev a couple of months ago) to a recent(ish) DBD::mysql which allows you to retrieve utf-8 data marked as utf-8 in perl. If is not a complete solution but works for most things. See http://www.nntp.perl.org/group/perl.dbi.dev/4548 It may have been included in the 3.0006_1 development release of DBD::mysql. You also need to look up "set character set utf8" and "set names utf8" in mysql documentation. Martin -- Martin J. Evans Easysoft Ltd, UK http://www.easysoft.com