Hi,
sorry, the mail got long, but it took me some time to reproduce the
problematic behaviour and I think we need this detail level to
understand what's going wrong here...
We're using Perl 5.8.0, DBI 1.30 and DBD::Oracle 1.12, both client and
server on Linux. We just extend our application framework to support
UTF8 seamlessly. Nearly everything works well. Only retrieving CLOB's
from an Oracle 8.1.7 database, which was created with the character set
"german_germany.utf8", fails.
With this small script I can reproduce the error:
--snip--
#!/usr/bin/perl
require 5.008;
use strict;
use warnings;
use DBI;
use DBD::Oracle qw(:ora_types);
use Encode;
$ENV{NLS_LANG} = "german_germany.utf8";
main: {
my $dbh = DBI->connect (
"dbi:Oracle:DB",
"USER", "PASS",
{
AutoCommit => 1,
RaiseError => 1,
PrintError => 0,
LongTruncOk => 1,
}
);
eval { $dbh->do ("drop table clob_test") };
$dbh->do (
"create table clob_test (
id integer,
str clob,
txt varchar2(255)
)"
);
my $id = 0;
my $data = "some data ���";
# INSERT -----------------------------------------------------
my $sth = $dbh->prepare ("insert into clob_test values (?, ?, ?)");
utf8::upgrade($data);
{ use bytes; print "put: '$data'\n" }
$sth->bind_param (1, ++$id );
$sth->bind_param (2, $data, { ora_type => ORA_CLOB } );
$sth->bind_param (3, $data );
$sth->execute;
$sth->finish;
# READ -------------------------------------------------------
$sth = $dbh->prepare (
"select txt, str
from clob_test
where id = ?"
);
$sth->execute ( $id );
my ($txt) = $sth->fetchrow;
my $chunk = $sth->blob_read (1, 0, 32768);
Encode::_utf8_on ($chunk);
{ use bytes; print "got: '$chunk' / '$txt'\n" }
print "clob result: ".($chunk eq $data ? "OK" : "NOT OK"),"\n";
print "char result: ".($txt eq $data ? "OK" : "NOT OK"),"\n";
END { $dbh->disconnect if $dbh }
}
--snip--
The $sth->blob_read line throws the following exception:
DBD::Oracle::st blob_read failed: (DBD NEED_DATA: OCILobRead)
But when I omit the $ENV{NLS_LANG} line, $sth->blob_read() works and the
script doesn't abort. Astonishingly even the result check at the end
reports, that the clob variable is valid utf8-encoded! That means,
Oracle omitted iso-latin-1/utf8 conversion, which is usually done, if
the client isn't itself utf8-aware.
But exactly that happens to the varchar2 variable, which is reported as
wrong by the script (sure, because it was utf8-converted twice). This is
the script's output (without setting NLS_LANG, in the other case I get
the exception quoted above):
put: 'some data äüö' / 'some data äüö'
got: 'some data äüö' / 'some data �¤�¼�¶'
clob result: OK
char result: NOT OK
I learn the following from this:
a) blob_read() always throws an exception with
NLS_LANG=german_germany.utf8
b) as expected: when not setting NLS_LANG, Oracle converts
values for character columns as expected from/to utf8
c) but: when not setting NLS_LANG, Oracle does *not*
utf8 convert any CLOB values
I think we should classify a) and c) as bugs, and b) as absolutely
correct ;)
Does anybody has any hint what's going on here? Can DBD::Oracle can do
something for me? Is it a (probably known) problem with the Oracle
client libaray?
Also I would greatly apprectiate if future versions of DBI would take
care of variable's utf8 states themself (converting to utf8 if necessary
and setting the utf8-flag for retrieved values - both only with Perl
5.8.x and if the database encoding is utf8, or if the developer set an
appropriate attribute or something like that).
Regards,
Joern
--
Joern Reder -- Software Development, dimedis GmbH, 50672 Koeln
http://www.dimedis.de/
supporting: http://www.zyn.de/ http://www.exit1.org/
CPAN: http://www.perl.com/CPAN/modules/by-module/CIPP/JRED/