Re: more DBD::Oracle utf8 weirdness, and kludge that should not have worked, but did

Tim Bunce Fri, 05 Nov 2004 15:45:53 -0800

On Fri, Nov 05, 2004 at 10:24:39AM -0800, Susan Cassidy wrote:
> Hi,
> Thanks Tim.
> 
> I'm not sure what you mean by " the _client_ CHAR and NCHAR character sets".
> How do I check?  (Obviously, I did not install the Oracle stuff myself, and
> we do not have our own DBA).


What are your NLS_LANG and NLS_NCHAR environment variables set to?
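
A quick way to check is to print them from inside the application itself,
since a CGI script sees the web server's environment rather than your login
shell's. A minimal sketch (the helper name nls_report is only illustrative):

```perl
use strict;
use warnings;

# Report the client-side Oracle NLS settings visible to this process.
# Takes an optional hash ref so it can be exercised without touching %ENV.
sub nls_report {
    my ($env) = @_;
    $env ||= \%ENV;
    my $out = '';
    for my $name (qw(NLS_LANG NLS_NCHAR)) {
        my $value = defined $env->{$name} ? $env->{$name} : '(not set)';
        $out .= "$name=$value\n";
    }
    return $out;
}

print nls_report();
```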

> By "validates as utf8", I meant that it is valid utf8 encoding (a).
> 
> By "did not match" I meant that " if $saved_data eq $retrieved_data "
> returns false.

Okay. I've no time to re-examine your original email (got to
sleep-n-pack for the MySQL conference in Frankfurt). From what I
said in reply and what you've said here, you, or someone else, ought
to be able to work out what's going on.

A hint: it's important that your client NLS_LANG and NLS_NCHAR
environment variables are set correctly, and that any UTF8 values
you're using have the UTF8 flag set.

Please reread the Unicode section of the DBD::Oracle docs.
Let me know if there's anything that's not clear enough.

Tim.

> Thanks,
> Susan
> 
> > -----Original Message-----
> > From: Tim Bunce [mailto:[EMAIL PROTECTED]
> > Sent: Friday, November 05, 2004 2:10 AM
> > To: Susan Cassidy
> > Cc: [EMAIL PROTECTED]
> > Subject: Re: more DBD::Oracle utf8 weirdness, and kludge that should not
> > have worked, but did
> > 
> > On Thu, Nov 04, 2004 at 01:42:13PM -0800, Susan Cassidy wrote:
> > > I finally got my large, complex cgi/Oracle application working with
> > > DBD::Oracle 1.16, using database character set AL32UTF8, NLS_LANG=.UTF8,
> > > etc.
> > 
> > And what are the _client_ CHAR and NCHAR character sets?
> > And is the field you're inserting into a CHAR or NCHAR?
> > 
> > > The test program takes some English sentences, runs them through a
> > > translator (which produces utf8 output, works fine - data validates as
> > > utf8 on multiple systems, etc.).
> > 
> > It's important to keep in mind that "validates as utf8" is ambiguous.
> > 
> > It could mean *either or both* of:
> > 
> >     a) the sequence of bytes is a valid utf8 encoding.
> >     b) the perl scalar value has the perl SvUTF8 flag turned on.
> > 
> > Much confusion is caused by not keeping those two separate points
> > in mind. It's important to be clear what you're thinking about,
> > and precise when communicating it to others.
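
The two meanings can be tested separately. A rough sketch, assuming the value
under test arrives as a byte string; the helper names are made up for
illustration, while utf8::decode and utf8::is_utf8 are core perl functions:

```perl
use strict;
use warnings;

# (a) Do the bytes form a well-formed UTF-8 sequence?
sub bytes_are_valid_utf8 {
    my ($bytes) = @_;              # copy, because utf8::decode works in place
    return utf8::decode($bytes) ? 1 : 0;
}

# (b) Is perl's internal SvUTF8 flag set on the scalar?
sub has_utf8_flag {
    my ($value) = @_;
    return utf8::is_utf8($value) ? 1 : 0;
}

my $bytes = "caf\xC3\xA9";         # UTF-8 encoding of "cafe" with e-acute, flag off
my $chars = $bytes;
utf8::decode($chars);              # now a 4-character string, flag on

printf "bytes: valid=%d flag=%d\n",
    bytes_are_valid_utf8($bytes), has_utf8_flag($bytes);
printf "chars: flag=%d length=%d\n",
    has_utf8_flag($chars), length $chars;
```

Here $bytes passes test (a) but fails test (b), which is the situation a value
can be in when it "really is utf8" yet a flag check reports otherwise.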
> > 
> > > I then insert it into the database, and retrieve it.  The retrieved
> > > data did not match the translated data.
> > 
> > I'm afraid that "The retrieved data did not match the translated
> > data" is another ambiguous statement.
> > 
> > If a sequence of bytes that does not have the SvUTF8 flag turned
> > on is compared with the same sequence of bytes that does, they won't
> > match (unless the string is all ASCII).
> > 
> > Perl will encode the sequence of bytes that does not have the SvUTF8
> > flag turned on into UTF8 by treating each byte as a Latin1 character
> > (by default). If the sequence of bytes was UTF8 encoded already
> > (but not marked with the SvUTF8 flag) then treating each byte as a
> > Latin1 character will produce garbage unless the string is all ASCII.
> > 
> > So the two strings with the same sequence of bytes may not match!
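
A small demonstration of that mismatch. It uses Encode::_utf8_on and
Encode::_utf8_off purely to flip the flag for illustration; they are internal
helpers, not something to rely on in application code:

```perl
use strict;
use warnings;
use Encode ();

my $bytes   = "caf\xC3\xA9";     # UTF-8 encoded bytes, SvUTF8 flag off
my $flagged = $bytes;
Encode::_utf8_on($flagged);      # same bytes, now marked as UTF-8

my $ascii         = "cafe";
my $ascii_flagged = $ascii;
Encode::_utf8_on($ascii_flagged);

# With the flag off, each byte is treated as one Latin1 character (5 chars);
# with the flag on, the last two bytes are one character (4 chars).
print "non-ASCII equal: ", ($bytes eq $flagged       ? "yes" : "no"), "\n";
print "ASCII equal:     ", ($ascii eq $ascii_flagged ? "yes" : "no"), "\n";
print "lengths: ", length($bytes), " vs ", length($flagged), "\n";

# Turning the flag back off compares byte strings as byte strings again.
my $unflagged = $flagged;
Encode::_utf8_off($unflagged);
print "after _utf8_off: ", ($bytes eq $unflagged ? "yes" : "no"), "\n";
```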
> > 
> > > I added some tests in the code to check on the translated value like:
> > >     if (Encode::is_utf8($textval)) {
> > >       print "<p>&nbsp;is utf8!\n";
> > >     } else {
> > >       print "<p>&nbsp;is NOT utf8\n";
> > >     }
> > > This prints "is NOT utf8"  (when I know that it really is utf8).
> > 
> > Do you know which out of A and B above Encode::is_utf8 actually tests for?
> > Do you know which out of A and B you mean by "it really is utf8"?
> > 
> > > If I do the same thing to the retrieved data, it prints that the data IS
> > > utf8.
> > 
> > The returned data will be both valid utf8 and have the SvUTF8 flag on
> > if your relevant (CHAR/NCHAR) client character set is UTF8 or AL32UTF8.
> > 
> > But that doesn't mean it contains the same string you passed in! :)
> > So I trust you're also checking if $inserted_value eq $fetched_value.
> > 
> > > However, if I turn off the utf8 flag explicitly after retrieving the
> > > data, before comparing the translated data with the retrieved data, it
> > > works:
> > 
> > Probably because you're now comparing byte strings as byte strings.
> > 
> > > Of course, where I print out the status of utf8 below this, it now says
> > > it is NOT utf8.
> > 
> > Of course.
> > 
> > > I have re-read the Encode perldoc stuff several times.  It seems to be
> > > working (on my system) backwards, sort of?
> > >
> > > In the DBD::Oracle 1.16 docs, Tim says:
> > >       If the string passed to bind_param() is considered by perl to be a
> > >        valid utf8 string ( utf8::is_utf8($string) returns true ), then
> > >        DBD::Oracle will implicitly set csform SQLCS_NCHAR and csid
> > >        AL32UTF8 for you on insert.
> > > So, I think this may have something to do with it.  However, I am
> > > "unset"ting it after retrieval, not before inserting it. ????
> > 
> > But was it actually set on the value you inserted?
> > 
> > [FYI, the output from trace() quotes strings with the SvUTF8 flag
> > on with double quotes, and uses single quotes if SvUTF8 is off.
> > That's a quick way to see what's going on.]
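
The same quoting convention can be seen without a database connection by
calling DBI::neat directly. This is only a sketch, and it assumes the
installed DBI formats values with neat() the same way the trace output does:

```perl
use strict;
use warnings;
use DBI ();

my $bytes = "caf\xC3\xA9";       # UTF-8 encoded bytes, SvUTF8 flag off
my $chars = $bytes;
utf8::decode($chars);            # character string, SvUTF8 flag on

binmode STDOUT, ':encoding(UTF-8)';

# Single quotes are expected for the unflagged value,
# double quotes for the flagged one.
print "bytes: ", DBI::neat($bytes), "\n";
print "chars: ", DBI::neat($chars), "\n";
```

Running the value about to be bound through this before the insert shows
whether the flag was set on what was actually sent.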
> > 
> > > By the way, the same program moved over to a different machine where we
> > > use PostgreSQL (DBD::Pg) (without the _utf8_off, of course)  works fine
> > > (as I would expect).
> > 
> > I suspect DBD::Pg is doing something wrong that just happens to
> > work for your view of how it ought to work. Of course, I may be wrong.
> > 
> > Tim.
>
