Hi there,
Some background on why I'm reaching out: we're porting a huge Library
Management System from the Sybase and Oracle database systems to
PostgreSQL. The software, some 10 million lines of code, is written in
just about every programming language one can think of: C, C++, ESQL/C,
Perl, Java, ...
One particular problem we face at the moment is how DBD::Pg handles
UTF-8 strings in the char columns of the database. The PG server is 11.4
on Linux and DBD::Pg is 3.10.0-3.
I connect to the PG server with something like this (for tests):
    $dbh = DBI->connect($PGDB, $PGDB_USER, $PGDB_PASS,
        { pg_utf8_flag   => 1,
          pg_enable_utf8 => 1,
          AutoCommit     => 0,
          RaiseError     => 0,
          PrintError     => 0,
        }
    );
and do a SELECT for a column which contains UTF-8 data (I double checked
this with SQL and ::bytea):
    $sth = $dbh->prepare(
        "select d02name from d02ben where d02bnr = '00001048313'")
        or die "parse error\n" . $DBI::errstr . "\n";
    $sth->execute
        or die "exec error\n" . $DBI::errstr . "\n";
but when I now fetch the first row with:
@row = $sth->fetchrow_array;
$HexStr = unpack("H*", $row[0]);
print "HexStr: " . $HexStr . "\n";
print "$row[0]\n";
The resulting column contains ISO 8859-1 data:
HexStr: 50e46461676f67697363686520486f6368736368756c65205765696e67617274656e2020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020
P<E4>dagogische Hochschule Weingarten
According to the DBD::Pg man page, the attribute pg_enable_utf8 => 1
should ensure that strings are returned from DBI with the UTF-8 flag
switched on. The server sends the string in UTF-8, as I can see with
strace (note the characters P\303\244dagogische...):
...
recvfrom(3, "T\0\0\0 \0\1d02name\0\0\1\313\237\0\3\0\0\4\22\377\377\0\0\0|\0\0D\0\0\0\203\0\1\0\0\0yP\303\244dagogische Hochschule Weingarten C\0\0\0\rSELECT 1\0Z\0\0\0\5T", 16384, 0, NULL, NULL) = 185
write(1, "HexStr: 50e46461676f67697363686520486f6368736368756c65205765696e67617274656e2020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020\n", 249) = 249
write(1, "P\344dagogische Hochschule Weingarten
...
But why does it get translated to ISO 8859-1? What are we doing wrong?
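
To narrow this down without the database, here is a minimal sketch using only core Perl. It assumes (this is not verified against DBD::Pg itself) that with pg_enable_utf8 => 1 the fetched value is a decoded character string with the UTF-8 flag on. The variable names and the hex_of helper are made up for illustration. The sketch compares what unpack("H*") reports for such a character string with what it reports for the UTF-8 encoded bytes, and then prints the string through an explicit UTF-8 output layer.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes for the start of the name, as seen in the strace
# output (P \303\244 d a g o g i s c h e).
my $bytes = "P\303\244dagogische";

# Assumption: DBD::Pg with pg_enable_utf8 => 1 returns a decoded
# character string like this one (UTF-8 flag on).
my $chars = decode('UTF-8', $bytes);

# Hypothetical helper: hex dump of a string via unpack, as in the test.
sub hex_of { return unpack('H*', $_[0]) }

printf "UTF-8 flag on:        %s\n", utf8::is_utf8($chars) ? 'yes' : 'no';
printf "hex of char string:   %s\n", hex_of($chars);
printf "hex of encoded bytes: %s\n", hex_of(encode('UTF-8', $chars));

# With an explicit output layer, print encodes characters as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print "$chars\n";
```

If the first line reports the flag as on and the hex of the character string starts with 50e4, the value would be a proper character string, and the e4 seen in the output would reflect how unpack and print treat characters rather than a conversion inside the driver. Running the same utf8::is_utf8 check on $row[0] would show whether that assumption holds for the real data.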
Thanks,
matthias
--
Matthias Apitz, ✉ [email protected], http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub