RE: more DBD::Oracle utf8 weirdness, and kludge that should not have worked, but did

Susan Cassidy Fri, 05 Nov 2004 10:27:09 -0800

Hi,
Thanks Tim.

I'm not sure what you mean by " the _client_ CHAR and NCHAR character sets".
How do I check?  (Obviously, I did not install the Oracle stuff myself, and
we do not have our own DBA).


By "validates as utf8", I meant that it is a valid utf8 encoding (your point a).

By "did not match" I meant that " if $saved_data eq $retrieved_data "
returns false.

Thanks,
Susan

> -----Original Message-----
> From: Tim Bunce [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 05, 2004 2:10 AM
> To: Susan Cassidy
> Cc: [EMAIL PROTECTED]
> Subject: Re: more DBD::Oracle utf8 weirdness, and kludge that should not have worked, but did
> 
> On Thu, Nov 04, 2004 at 01:42:13PM -0800, Susan Cassidy wrote:
> > I finally got my large, complex cgi/Oracle application working with
> > DBD::Oracle 1.16, using database character set AL32UTF8, NLS_LANG=.UTF8,
> > etc.
> 
> And what are the _client_ CHAR and NCHAR character sets?
> And is the field you're inserting into a CHAR or NCHAR?
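Part of the question about the client character sets can be answered from the environment. On the client side, the CHAR character set is the part of `NLS_LANG` after the dot (`.UTF8` in the setup described here), and the NCHAR character set can be set separately with `NLS_NCHAR`. The following is a minimal sketch under that assumption. The helper name `client_charset` is hypothetical, and the effective session settings should be confirmed against the Oracle and DBD::Oracle documentation for the installed versions.

```perl
use strict;
use warnings;

# Hypothetical helper: NLS_LANG has the form LANGUAGE_TERRITORY.CHARSET,
# and the language/territory parts may be omitted (e.g. ".UTF8").
# Returns the character-set part, or undef if there is none.
sub client_charset {
    my ($nls_lang) = @_;
    return undef unless defined $nls_lang;
    return $nls_lang =~ /\.([^.]+)\z/ ? $1 : undef;
}

my $char_cs  = client_charset($ENV{NLS_LANG});
my $nchar_cs = $ENV{NLS_NCHAR};    # national (NCHAR) client charset, if set

printf "client CHAR charset:  %s\n", $char_cs  // '(not set in NLS_LANG)';
printf "client NCHAR charset: %s\n", $nchar_cs // '(NLS_NCHAR not set)';
```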
> 
> > The test program takes some English sentences, runs them through a
> > translator (which produces utf8 output, works fine - data validates as
> > utf8 on multiple systems, etc.).
> 
> It's important to keep in mind that "validates as utf8" is ambiguous.
> 
> It could mean *either or both* of:
> 
>     a) the sequence of bytes is a valid utf8 encoding.
>     b) the perl scalar value has the perl SvUTF8 flag turned on.
> 
> Much confusion is caused by not keeping those two separate points
> in mind. It's important to be clear what you're thinking about,
> and precise when communicating it to others.
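The two meanings can be tested separately. A minimal sketch: (a) is checked here by attempting a strict decode with `Encode::decode` and `FB_CROAK`, and (b) with `utf8::is_utf8`. The helper name `is_valid_utf8_bytes` and the sample strings are illustrative assumptions, not part of DBI or DBD::Oracle.

```perl
use strict;
use warnings;
use Encode ();

# (a) Is this byte string a valid UTF-8 encoding?
# Hypothetical helper: tries a strict decode and reports whether it died.
# LEAVE_SRC keeps decode() from modifying its input when a CHECK value is given.
sub is_valid_utf8_bytes {
    my ($bytes) = @_;
    return eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    } ? 1 : 0;
}

my $good  = "caf\xC3\xA9";                    # UTF-8 bytes for "café", flag off
my $bad   = "caf\xE9";                        # Latin-1 bytes, not valid UTF-8
my $chars = Encode::decode('UTF-8', $good);   # same text as characters, flag on

# (b) Does the scalar have the SvUTF8 flag on?
printf "good:  valid=%d flag=%d\n", is_valid_utf8_bytes($good), utf8::is_utf8($good) ? 1 : 0;
printf "bad:   valid=%d flag=%d\n", is_valid_utf8_bytes($bad),  utf8::is_utf8($bad)  ? 1 : 0;
printf "chars: flag=%d\n", utf8::is_utf8($chars) ? 1 : 0;
```

For `$good` this reports valid=1 flag=0: the bytes satisfy (a) but not (b).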
> 
> > I then insert it into the database, and retrieve it.  The retrieved
> > data did not match the translated data.
> 
> I'm afraid that "The retrieved data did not match the translated
> data" is another ambiguous statement.
> 
> If a sequence of bytes that does not have the SvUTF8 flag turned
> on is compared with the same sequence of bytes that does, they won't
> match (unless the string is all ASCII).
> 
> Perl will encode the sequence of bytes that does not have the SvUTF8
> flag turned on into UTF8 by treating each byte as a Latin1 character
> (by default). If the sequence of bytes was UTF8 encoded already
> (but not marked with the SvUTF8 flag) then treating each byte as a
> Latin1 character will produce garbage unless the string is all ASCII.
> 
> So the two strings with the same sequence of bytes may not match!
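The mismatch is easy to reproduce. A minimal sketch, assuming the translated text arrives as UTF-8 bytes with the SvUTF8 flag off: decoding it produces a string whose internal bytes are the same but which carries the flag, so `eq` ends up comparing 5 Latin-1 characters against 4 Unicode characters.

```perl
use strict;
use warnings;
use Encode ();

my $bytes = "caf\xC3\xA9";                   # UTF-8 bytes, SvUTF8 flag off
my $chars = Encode::decode('UTF-8', $bytes); # same text, SvUTF8 flag on

# Each byte of $bytes is treated as a Latin-1 character when comparing,
# so "caf\xC3\xA9" (5 chars) is compared with "caf\x{E9}" (4 chars).
print length($bytes), " vs ", length($chars), "\n";
print $bytes eq $chars ? "match\n" : "no match\n";

# Comparing like with like works in either direction.
print Encode::encode('UTF-8', $chars) eq $bytes ? "match\n" : "no match\n";
print Encode::decode('UTF-8', $bytes) eq $chars ? "match\n" : "no match\n";
```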
> 
> > I added some tests in the code to check on the translated value like:
> >     if (Encode::is_utf8($textval)) {
> >       print "<p>&nbsp;is utf8!\n";
> >     } else {
> >       print "<p>&nbsp;is NOT utf8\n";
> >     }
> > This prints "is NOT utf8"  (when I know that it really is utf8).
> 
> Do you know which out of A and B above Encode::is_utf8 actually tests for?
> Do you know which out of A and B you mean by "it really is utf8"?
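For reference, `Encode::is_utf8` tests (b), the SvUTF8 flag, not (a). Text read from a file, socket, or CGI parameter normally arrives as bytes with the flag off, even when those bytes are valid UTF-8. A minimal sketch of decoding once where the data enters the program, so later code works with a character string; the contents of `$textval` are an assumed stand-in for the translator output.

```perl
use strict;
use warnings;
use Encode ();

# Assumed: the translator hands back UTF-8 *bytes* (flag off).
my $textval = "na\xC3\xAFve";    # UTF-8 bytes for "naïve"

print Encode::is_utf8($textval) ? "is utf8!\n" : "is NOT utf8\n";    # flag off

# Decode once, at the input boundary. FB_CROAK makes malformed input die
# instead of being silently replaced; LEAVE_SRC leaves $textval untouched.
my $text = Encode::decode('UTF-8', $textval, Encode::FB_CROAK() | Encode::LEAVE_SRC());

print Encode::is_utf8($text) ? "is utf8!\n" : "is NOT utf8\n";       # flag on
print length($textval), " bytes, ", length($text), " characters\n";
```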
> 
> > If I do the same thing to the retrieved data, it prints that the data IS
> > utf8.
> 
> The returned data will be both valid utf8 and have the SvUTF8 flag on
> if your relevant (CHAR/NCHAR) client character set is UTF8 or AL32UTF8.
> 
> But that doesn't mean it contains the same string you passed in! :)
> So I trust you're also checking if $inserted_value eq $fetched_value.
> 
> > However, if I turn off the utf8 flag explicitly after retrieving the
> > data, before comparing the translated data with the retrieved data, it
> > works:
> 
> Probably because you're now comparing byte strings as byte strings.
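For a value that comes back with the flag on, turning the flag off gives the same bytes as encoding it back to UTF-8. A minimal sketch, assuming `$fetched` stands in for the retrieved value and `$translated` for the original byte string; `Encode::encode` is the documented route, while `Encode::_utf8_off` is marked internal in the Encode documentation.

```perl
use strict;
use warnings;
use Encode ();

my $translated = "na\xC3\xAFve";                       # UTF-8 bytes, flag off
my $fetched    = Encode::decode('UTF-8', $translated); # stand-in for the fetched value, flag on

# The kludge: strip the flag so the internal UTF-8 bytes are compared as bytes.
my $kludged = $fetched;
Encode::_utf8_off($kludged);

# The documented equivalent: encode the characters back to UTF-8 bytes.
my $encoded = Encode::encode('UTF-8', $fetched);

print $kludged eq $translated ? "kludge matches\n" : "kludge differs\n";
print $encoded eq $translated ? "encode matches\n" : "encode differs\n";
```

Either way the comparison is bytes against bytes. Decoding the translated value and comparing characters against characters avoids touching the internal flag at all.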
> 
> > Of course, where I print out the status of utf8 below this, it now says
> > it is NOT utf8.
> 
> Of course.
> 
> > I have re-read the Encode perldoc stuff several times.  It seems to be
> > working (on my system) backwards, sort of?
> >
> > In the DBD::Oracle 1.16 docs, Tim says:
> >       If the string passed to bind_param() is considered by perl to be a
> >        valid utf8 string ( utf8::is_utf8($string) returns true ), then
> >        DBD::Oracle will implicitly set csform SQLCS_NCHAR and csid
> >        AL32UTF8 for you on insert.
> > So, I think this may have something to do with it.  However, I am
> > "unset"ting it after retrieval, not before inserting it. ????
> 
> But was it actually set on the value you inserted?
> 
> [FYI, the output from trace() quotes strings with the SvUTF8 flag
> on with double quotes, and uses single quotes if SvUTF8 is off.
> That's a quick way to see what's going on.]
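The same quoting convention can be mimicked without a database handle while debugging. A minimal sketch; the helper name `show` is hypothetical and it only imitates the style described for `trace()` output, it is not DBI code.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical debugging helper: quote with "..." if the SvUTF8 flag is on,
# '...' if it is off, and show non-printable/non-ASCII characters as \x{..}.
sub show {
    my ($s) = @_;
    return 'undef' unless defined $s;
    my $q = utf8::is_utf8($s) ? '"' : "'";
    (my $body = $s) =~ s/([^\x20-\x7E])/sprintf('\\x{%X}', ord $1)/ge;
    return $q . $body . $q;
}

print show("caf\xC3\xA9"), "\n";                          # bytes, flag off
print show(Encode::decode('UTF-8', "caf\xC3\xA9")), "\n"; # characters, flag on
```

Current DBI releases also document utility functions such as `DBI::neat` and `DBI::data_string_desc` for this kind of inspection; check the documentation of the installed version.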
> 
> > By the way, the same program moved over to a different machine where we
> > use PostgreSQL (DBD::Pg) (without the _utf8_off, of course) works fine
> > (as I would expect).
> 
> I suspect DBD::Pg is doing something wrong that just happens to
> work for your view of how it ought to work. Of course, I may be wrong.
> 
> Tim.
