I hope I'm not muddying the waters but dbd::mysql UTF support for returned data (not metadata) seems to be nearly there. I need UTF8 support on at least inserted and returned results-sets although I'm less bothered by UTF8 table/column names. It would seem that if you define your table as using UTF8 then insertion is not a problem but retrieval is. The following code nearly works - it is just the setting of the utf8 flag on the returned data that is wrong:
#!/usr/bin/perl -w use strict; use DBI qw(:utils); use charnames ':full'; use Encode; print "Is utf8::is_utf8 defined: ", defined &utf8::is_utf8, "\n"; print "Is utf8::valid defined: ", defined &utf8::valid, "\n"; my $str = "\x{263a}xxx" . chr(0x05d0) . "\N{ARABIC LETTER ALEF}"; # smiley print join(" ", unpack("H*", $str)), "\n"; print "length(str) = ", length($str), "\n"; print "bytes::length(str) = ", bytes::length($str), "\n"; print "utf8::is_utf8 = ", utf8::is_utf8($str) ? 1 : 0, "\n"; print "data_string_desc: ", data_string_desc($str),"\n"; open OUT, ">uni.out"; binmode(OUT, ":utf8"); print OUT "$str\n"; # data written to uni.out is UTF8 my $dbh = DBI->connect("dbi:mysql:test", "xxx", "xxx"); # there are posts on dbi-user as to whether both or either of # the following should be set $dbh->do("set character set utf8"); $dbh->do("set names utf8"); $dbh->do("drop table if exists utf"); $dbh->do("create table utf (a char(100)) default charset utf8"); my $sth = $dbh->prepare("insert into utf values (?)"); $sth->execute($str); $sth = $dbh->prepare("select * from utf"); $sth->execute; my @row = $sth->fetchrow_array; print "data_string_desc (after fetch): ", data_string_desc($row[0]),"\n"; # the following shows we'e got the right data back # but perl does not know it is utf8 print join(" ", unpack("H*", $row[0])), "\n"; # turning on utf8 causes the rignt uf8 sequence to be output # and hence sv_utf8_upgrade(sv) will probably work Encode::_utf8_on($row[0]); print "data_string_desc (after fetch): ", data_string_desc($row[0]),"\n"; open OUT, ">utf.out"; binmode (OUT, ":utf8"); print OUT $row[0]; close OUT; # data written to utf.out is not UTF8 unless is marked utf8 produces: Is utf8::is_utf8 defined: 1 Is utf8::valid defined: 1 e298ba787878d790d8a7 length(str) = 6 bytes::length(str) = 10 utf8::is_utf8 = 1 data_string_desc: UTF8 on, non-ASCII, 6 characters 10 bytes data_string_desc (after fetch): UTF8 off, non-ASCII, 10 characters 10 bytes e298ba787878d790d8a7 with utf.out containing: <C3><A2><C2><98><C2><BA>xxx<C3><97><C2><90><C3><98><C2><A7> without that Encode::_utf8_on($row[0]); Michael Kröll (apr-10-2006) posted a change for DBD::db2 which seemed to sort this out for DB2 so a similar change could be added to mysql. Hope this helps. Martin -- Martin J. Evans Easysoft Ltd, UK http://www.easysoft.com On 24-Apr-2006 Tim Bunce wrote: > [I'm at the mysql conference and Patrick asked me about adding utf8 > support to DBD::mysql. I said I'd look at the libmysql docs and give my > thoughts. I'm posting to dbi-dev since it may be of interest to others > interested in enhancing DBD::mysql and to other driver developers. > These are just random thoughts from a quick look at the docs.] > > The keys mysql docs seem to be > http://dev.mysql.com/doc/refman/4.1/en/charset-connection.html > > The mysql api and client->server protocol doesn't support passing > characterset info to the server on a per-statement / per-bind value basis. > (http://dev.mysql.com/doc/refman/4.1/en/c-api-prepared-statement-datatypes.htm > l) > So the sane way to send utf8 to the server is by setting the 'connection > character set' to utf8 and then only sending utf8 (or its ASCII subset) > to the server on that connection. > > *** Fetching data: > > MySQL 4.1.0 added "unsigned int charsetnr" to the MYSQL_FIELD structure. > It's the "character set number for the field". > > So set the UTF8 flag based on that value. Something like: > (field->charsetnr = ???) ? SvUTF8_on(sv) : SvUTF8_off(sv); > I couldn't see any docs for the values of the charsetnr field. > > Also, would be good to enable perl code to access the charsetnr values: > $sth->{mysql_charsetnr}->[$i] > > *** Fetching Metadata: > > The above is a minimum. It doesn't address metadata like field names > ($sth->{NAME}) that might also be in utf8. For that the driver needs to > know if the 'connection character set' is currently utf8. > > (The docs mention mysql->charset but it's not clear if that's part of > the public API.) > > However it's detected, the code needs to end up doing: > (...connection charset is utf8...) ? SvUTF8_on(sv) : SvUTF8_off(sv); > on the metadata. > > > *** SET NAMES '...' > > Intercept SET NAMES and call the mysql_set_character_set() API instead. > See http://dev.mysql.com/doc/refman/4.1/en/mysql-set-character-set.html > > > *** Detecting Inconsistencies > > If the connection character set is _not_ utf8 but the application calls > the driver with data (or SQL statement) that has the UTF8 flag set, then > it could issue a warning. In practice that may be to be too noisy for > people that done their own workarounds for utf8 support. If so then > they could be changes to level 1 trace messages. > > If the connection character set _is_ utf8, and the application calls > the driver with data (or SQL statement) that does _not_ have the UTF8 > flag set but _does_ have bytes with the high bit set, then the driver > should issue a warning. The checking for high bit set is an extra cost > so this should only be enabled if tracing and/or an attribute is set > (perhaps called $dbh->{mysql_charset_checks} = 1) > > Tim.