Hi, I had the same error retrieving CLOB data when I set client's NLS_LANG
to AMERICAN_AMERICA.UTF8.

Here is the part of the trace with the error:

OCILobGetLength(026F74D4,026F776C,026F68B4,0140FAC8)=SUCCESS
OCILobRead(026F74D4,026F776C,026F68B4,0140FACC,1,01CE0B04,485,00000000,00000
000,0,1)=NEED_DATA
       OCILobRead field 2 NEED_DATA: LOBlen 485, LongReadLen 4000, BufLen
485, Got 161
OCIErrorGet(026F776C,1,"<NULL>",0140F964,"��@a",1024,2)=NO_DATA
        0 (rc=0): undef
    !! ERROR: 99 ' (DBD NEED_DATA: OCILobRead)'
    <- fetchrow_arrayref= undef row1 at test_clob.pl line 11
    -> disconnect for DBD::Oracle::db (DBI::db=HASH(0x1ce3a88)~0x1ce39c8)

Looks like the OCI function got the length of the LOB field correctly, but
for some reason couldn't read all the bytes (it got stuck at 161 out of 485).
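One thing the trace suggests (purely a guess on my part): the buffer length (485) matches the LOB's *character* length, but the UTF8-encoded data needs more bytes than that, so the read stalls partway through. As a speculative experiment, not a confirmed fix, one could raise LongReadLen well above the character length so the byte buffer has headroom:

```perl
# Speculative workaround sketch: size LongReadLen for the worst-case
# UTF-8 expansion (up to 4 bytes per character) instead of the LOB's
# character length. Connection details are placeholders.
use DBI;

my $dbh = DBI->connect(
    "dbi:Oracle:DB", "USER", "PASS",
    {
        RaiseError  => 1,
        LongReadLen => 4 * 4000,   # bytes, not characters
        LongTruncOk => 0,          # fail loudly if still too small
    }
);
```

Whether DBD::Oracle 1.12 actually passes this through to the OCILobRead buffer in a way that helps here is exactly the question; I'd be curious if anyone can confirm.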

I posted a message about it some time ago with the details.
I'd be happy if somebody could help find out what's going on with the
CLOB+UTF8 combination.

Irina.

-----Original Message-----
From: Joern Reder [mailto:joern@dimedis.de]
Sent: Friday, October 18, 2002 11:39 AM
To: [EMAIL PROTECTED]
Subject: Problems retrieving CLOB's from Oracle in UTF8



Hi,

sorry, the mail got long, but it took me some time to reproduce the 
problematic behaviour and I think we need this detail level to 
understand what's going wrong here...

We're using Perl 5.8.0, DBI 1.30 and DBD::Oracle 1.12, both client and
server on Linux. We just extended our application framework to support
UTF8 seamlessly. Nearly everything works well; only retrieving CLOBs
from an Oracle 8.1.7 database, which was created with the character set
"german_germany.utf8", fails.

With this small script I can reproduce the error:

--snip--
  #!/usr/bin/perl

  require 5.008;

  use strict;
  use warnings;

  use DBI;
  use DBD::Oracle qw(:ora_types);
  use Encode;

  $ENV{NLS_LANG} = "german_germany.utf8";

  main: {
      my $dbh = DBI->connect (
              "dbi:Oracle:DB",
              "USER", "PASS",
              {
                      AutoCommit  => 1,
                      RaiseError  => 1,
                      PrintError  => 0,
                      LongTruncOk => 1,
              }
      );

      eval { $dbh->do ("drop table clob_test") };

      $dbh->do (
              "create table clob_test (
                      id  integer,
                      str clob,
                      txt varchar2(255)
              )"
      );

      my $id = 0;

      my $data = "some data äüö";

      # INSERT -----------------------------------------------------

      my $sth = $dbh->prepare ("insert into clob_test values (?, ?, ?)");

      utf8::upgrade($data);

      { use bytes; print "put: '$data'\n" }

      $sth->bind_param (1, ++$id );
      $sth->bind_param (2, $data, { ora_type => ORA_CLOB } );
      $sth->bind_param (3, $data );
      $sth->execute;
      $sth->finish;

      # READ -------------------------------------------------------

      $sth = $dbh->prepare (
              "select txt, str
               from   clob_test
               where  id = ?"
      );

      $sth->execute ( $id );

      my ($txt) = $sth->fetchrow;
      my $chunk = $sth->blob_read (1, 0, 32768);

      Encode::_utf8_on ($chunk);

      { use bytes; print "got: '$chunk' / '$txt'\n" }

      print "clob result: ".($chunk eq $data ? "OK" : "NOT OK"),"\n";
      print "char result: ".($txt   eq $data ? "OK" : "NOT OK"),"\n";

      END { $dbh->disconnect if $dbh }
  }

--snip--

The $sth->blob_read line throws the following exception:

  DBD::Oracle::st blob_read failed:  (DBD NEED_DATA: OCILobRead)

But when I omit the $ENV{NLS_LANG} line, $sth->blob_read() works and the
script doesn't abort. Astonishingly, the result check at the end even
reports that the CLOB value is valid utf8-encoded! That means Oracle
skipped the iso-latin-1/utf8 conversion which is usually done when the
client itself isn't utf8-aware.

But exactly that conversion happens to the varchar2 value, which the
script reports as wrong (naturally, because it was utf8-converted twice).
This is the script's output (without setting NLS_LANG; in the other case
I get the exception quoted above):

  put: 'some data äüö' / 'some data äüö'
  got: 'some data äüö' / 'some data Ã¤Ã¼Ã¶'
  clob result: OK
  char result: NOT OK

I learn the following from this:

  a) blob_read() always throws an exception with
     NLS_LANG=german_germany.utf8

  b) as expected: when not setting NLS_LANG, Oracle converts
     values for character columns as expected from/to utf8

  c) but: when not setting NLS_LANG, Oracle does *not*
     utf8 convert any CLOB values
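Given c), if the CLOB really does come back as raw UTF8 bytes when NLS_LANG is unset, a safer client-side workaround than flipping the flag with Encode::_utf8_on (which trusts the bytes blindly) might be a proper decode. A minimal sketch, assuming the fetched bytes are valid UTF-8; $sth is the prepared/executed handle from the script above:

```perl
use Encode ();

# Read the CLOB bytes as in the script, then *decode* them instead of
# forcing the utf8 flag. FB_CROAK makes invalid byte sequences die
# instead of silently producing garbage.
my $chunk  = $sth->blob_read(1, 0, 32768);
my $string = Encode::decode('UTF-8', $chunk, Encode::FB_CROAK);
```

This doesn't fix the underlying conversion asymmetry, but it would at least catch the case where the bytes are not what we think they are.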

I think we should classify a) and c) as bugs, and b) as absolutely 
correct ;)

Does anybody have a hint what's going on here? Can DBD::Oracle do
something for me? Is it a (perhaps known) problem with the Oracle
client library?

Also, I would greatly appreciate it if future versions of DBI would take
care of variables' UTF-8 state themselves: converting to UTF-8 where
necessary and setting the utf8 flag on retrieved values, both only under
Perl 5.8.x and when the database encoding is UTF-8, or when the developer
sets an appropriate attribute or something like that.
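Until something like that exists in DBI itself, the behaviour I'm wishing for can be approximated in application code. A hypothetical helper (the function name and the hard-coded 'UTF-8' assumption are mine, not DBI's API):

```perl
use Encode ();

# Hypothetical wrapper: fetch one row and decode every defined column
# from UTF-8, so returned strings carry a correctly set utf8 flag.
# Assumes the database/session encoding really is UTF-8.
sub fetchrow_utf8 {
    my ($sth) = @_;
    my @row = $sth->fetchrow_array
        or return;
    return map { defined $_ ? Encode::decode('UTF-8', $_) : $_ } @row;
}

# usage: my ($txt, $str) = fetchrow_utf8($sth);
```

Obviously this only papers over the problem per call site; having DBI do it centrally, gated on an attribute, would be much cleaner.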

Regards,

Joern

-- 
Joern Reder -- Software Development,  dimedis GmbH, 50672 Koeln
               http://www.dimedis.de/
supporting:    http://www.zyn.de/ http://www.exit1.org/
CPAN:          http://www.perl.com/CPAN/modules/by-module/CIPP/JRED/
