Hello,

On Sun, Mar 15, 2009 at 1:35 PM, Greg Sabino Mullane <[email protected]> wrote:
>> When I fetch player names from the database above,
>> they don't seem to be recognized as UTF8:
>> ...
>> Can't DBD::Pg recognize that it's UTF8 data?
>
> You have not told us what version of DBI and DBD::Pg you
> are using. Please also provide a simple test case - it's
> hard to guess at what a program might be doing. Far better
> to provide some code.

OK, sorry. This is OpenBSD 4.3 with the default perl and packages:

perl, v5.8.8 built for i386-openbsd
p5-DBD-Pg-1.49
p5-DBI-1.59
postgresql-client-8.2.6
postgresql-server-8.2.6

And here is my test case, the last printed line shows my problem:

http://pastebin.com/f6fc68309

$ cat dbi-utf.pl
#!/usr/bin/perl -w

use strict;
use utf8;
use DBI qw(:utils);
use Encode qw(encode_utf8 decode_utf8);

use constant HEARTS_HTML => pack ' U', 0x2665;
use constant X => 'phpbb';

my ($dbh, $ins1, $ins2, $sel1, $sel2, $href, $str1, $str2);

$dbh = DBI->connect('dbi:Pg:dbname=' . X, X, X, { RaiseError => 1});

$dbh->do('create table test1 (col1 integer, col2 varchar(50))');
$dbh->do('create table test2 (col3 integer, col4 text)');
$ins1 = $dbh->prepare('insert into test1 values (?, ?)');
$ins2 = $dbh->prepare('insert into test2 values (?, ?)');
$sel1 = $dbh->prepare('select * from test1 order by col1');
$sel2 = $dbh->prepare('select * from test2');

$ins1->execute(10, 'ABCDE');
$ins1->execute(20, 'АБВГД'); # the 1st 5 russian letters
$sel1->execute();

while ($href = $sel1->fetchrow_hashref()) {
        print "$href->{col1} $href->{col2}: " .
            data_string_desc($href->{col2}) . "\n";

        $str1 = "russian $href->{col2} russian";
        $str2 = HEARTS_HTML . "russian $href->{col2} russian" . HEARTS_HTML;
}

$ins2->execute(30, $str1);
$ins2->execute(40, $str2);
$sel2->execute();

while ($href = $sel2->fetchrow_hashref()) {
        print "$href->{col3} $href->{col4}: " .
            data_string_desc($href->{col4}) . "\n";
}

$dbh->do('drop table test1');
$dbh->do('drop table test2');

$ ./dbi-utf.pl
10 ABCDE: UTF8 off, ASCII, 5 characters 5 bytes
20 АБВГД: UTF8 off, non-ASCII, 10 characters 10 bytes
30 russian АБВГД russian: UTF8 off, non-ASCII, 26 characters 26 bytes
40 ♥russian Ð�Ð�Ð�Ð�Ð� russian♥: UTF8 off, non-ASCII, 42 characters 42 bytes

(the 3rd line is ok, but the last line is mangled)
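
The mangling in the last line can apparently be reproduced without a database. Below is a minimal sketch, assuming that the driver returns the column as raw UTF-8 octets with the UTF8 flag off (which is what data_string_desc reports above). The variable names $octets, $heart, $mixed and $decoded are only illustrative and are not part of the test case:

```perl
#!/usr/bin/perl -w
use strict;
use Encode qw(encode_utf8 decode_utf8);

# The first 5 Russian letters as a character string (U+0410 .. U+0414).
my $chars = "\x{410}\x{411}\x{412}\x{413}\x{414}";

# Assumption: this is what comes back from the database,
# i.e. 10 raw UTF-8 octets with the UTF8 flag off.
my $octets = encode_utf8($chars);

# A character string (UTF8 flag on), like HEARTS_HTML in the test case.
my $heart = "\x{2665}";

# Joining a character string with octets upgrades every octet as if it
# were a Latin-1 character, so the Russian text gets encoded a second
# time when the result is written out.
my $mixed = $heart . $octets . $heart;

# Decoding the octets first keeps the 5 letters as 5 characters.
my $decoded = $heart . decode_utf8($octets) . $heart;

printf "mixed:   %d characters, %d bytes when encoded\n",
    length($mixed), length(encode_utf8($mixed));        # 12 and 26
printf "decoded: %d characters, %d bytes when encoded\n",
    length($decoded), length(encode_utf8($decoded));    # 7 and 16
```

If that assumption holds, the 10 extra bytes in the mixed case match the difference between lines 30 and 40 above, once the two 3-byte hearts are counted.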

I could also test on RHEL/CentOS 5.2 (at work) on Monday.

Regards
Alex
