Hi,

Thanks Tim. I'm not sure what you mean by "the _client_ CHAR and NCHAR character sets". How do I check? (Obviously, I did not install the Oracle stuff myself, and we do not have our own DBA.)

By "validates as utf8", I meant that it is valid utf8 encoding (a). By "did not match" I meant that "if $saved_data eq $retrieved_data" returns false.

Thanks,
Susan

> -----Original Message-----
> From: Tim Bunce [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 05, 2004 2:10 AM
> To: Susan Cassidy
> Cc: [EMAIL PROTECTED]
> Subject: Re: more DBD::Oracle utf8 weirdness, and kludge that should not have worked, but did
>
> On Thu, Nov 04, 2004 at 01:42:13PM -0800, Susan Cassidy wrote:
> > I finally got my large, complex cgi/Oracle application working with
> > DBD::Oracle 1.16, using database character set AL32UTF8, NLS_LANG=.UTF8,
> > etc.
>
> And what are the _client_ CHAR and NCHAR character sets?
> And is the field you're inserting into a CHAR or NCHAR?
>
> > The test program takes some English sentences, runs them through a
> > translator (which produces utf8 output, works fine - data validates as
> > utf8 on multiple systems, etc.).
>
> It's important to keep in mind that "validates as utf8" is ambiguous.
>
> It could mean *either or both* of:
>
>   a) the sequence of bytes is a valid utf8 encoding.
>   b) the perl scalar value has the perl SvUTF8 flag turned on.
>
> Much confusion is caused by not keeping those two separate points
> in mind. It's important to be clear what you're thinking about,
> and precise when communicating it to others.
>
> > I then insert it into the database, and retrieve it. The retrieved
> > data did not match the translated data.
>
> I'm afraid that "The retrieved data did not match the translated
> data" is another ambiguous statement.
>
> If a sequence of bytes that does not have the SvUTF8 flag turned
> on is compared with the same sequence of bytes that does, they won't
> match (unless the string is all ASCII).
>
> Perl will encode the sequence of bytes that does not have the SvUTF8
> flag turned on into UTF8 by treating each byte as a Latin1 character
> (by default).
> If the sequence of bytes was UTF8 encoded already
> (but not marked with the SvUTF8 flag) then treating each byte as a
> Latin1 character will produce garbage unless the string is all ASCII.
>
> So the two strings with the same sequence of bytes may not match!
>
> > I added some tests in the code to check on the translated value like:
> >     if (Encode::is_utf8($textval)) {
> >         print "<p> is utf8!\n";
> >     } else {
> >         print "<p> is NOT utf8\n";
> >     }
> > This prints "is NOT utf8" (when I know that it really is utf8).
>
> Do you know which out of A and B above Encode::is_utf8 actually tests for?
> Do you know which out of A and B you mean by "it really is utf8"?
>
> > If I do the same thing to the retrieved data, it prints that the data IS
> > utf8.
>
> The returned data will be both valid utf8 and have the SvUTF8 flag on
> if your relevant (CHAR/NCHAR) client character set is UTF8 or AL32UTF8.
>
> But that doesn't mean it contains the same string you passed in! :)
> So I trust you're also checking if $inserted_value eq $fetched_value.
>
> > However, if I turn off the utf8 flag explicitly after retrieving the
> > data, before comparing the translated data with the retrieved data,
> > it works:
>
> Probably because you're now comparing byte strings as byte strings.
>
> > Of course, where I print out the status of utf8 below this, it now says
> > it is NOT utf8.
>
> Of course.
>
> > I have re-read the Encode perldoc stuff several times. It seems to be
> > working (on my system) backwards, sort of?
> >
> > In the DBD::Oracle 1.16 docs, Tim says:
> >     If the string passed to bind_param() is considered by perl to be a
> >     valid utf8 string ( utf8::is_utf8($string) returns true ), then
> >     DBD::Oracle will implicitly set csform SQLCS_NCHAR and csid
> >     AL32UTF8 for you on insert.
> > So, I think this may have something to do with it. However, I am
> > "unset"ting it after retrieval, not before inserting it. ????
>
> But was it actually set on the value you inserted?
>
> [FYI, the output from trace() quotes strings with the SvUTF8 flag
> on with double quotes, and uses single quotes if SvUTF8 is off.
> That's a quick way to see what's going on.]
>
> > By the way, the same program moved over to a different machine where we
> > use PostgreSQL (DBD::Pg) (without the _utf8_off, of course) works fine
> > (as I would expect).
>
> I suspect DBD::Pg is doing something wrong that just happens to
> work for your view of how it ought to work. Of course, I may be wrong.
>
> Tim.
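
To make the distinction between (a) and (b) in the quoted message concrete, here is a minimal Perl sketch. It is not taken from the application discussed in this thread: the sample string and all variable names are made up for illustration, and no database is involved. It only shows that the same byte sequence can be a valid utf8 encoding while the SvUTF8 flag is off, and that "eq" then fails against the decoded (flagged) version of the same data.

```perl
use strict;
use warnings;
use Encode ();

# Illustrative value: "e acute" (U+00E9) as raw utf8 bytes.
# Perl sees two bytes here and the SvUTF8 flag is off.
my $bytes = "\xC3\xA9";

# Meaning (a): is the byte sequence a valid utf8 encoding?
# decode() with FB_CROAK dies on malformed input. It may modify its
# argument, so work on a copy.
my $valid_encoding = eval {
    my $copy = $bytes;
    Encode::decode('UTF-8', $copy, Encode::FB_CROAK);
    1;
} ? 1 : 0;

# Meaning (b): does the scalar have the SvUTF8 flag turned on?
my $flag_on_bytes = utf8::is_utf8($bytes) ? 1 : 0;

# Decoding gives a one-character string with the flag on.
my $chars         = Encode::decode('UTF-8', $bytes);
my $flag_on_chars = utf8::is_utf8($chars) ? 1 : 0;

# eq compares characters. The unflagged bytes are treated as two
# Latin1 characters, so they do not match the single decoded character.
my $equal_mixed = ($bytes eq $chars) ? 1 : 0;

# Comparing byte strings as byte strings: encode the character string
# back to utf8 bytes first.
my $equal_bytes = ($bytes eq Encode::encode('UTF-8', $chars)) ? 1 : 0;

print "valid utf8 encoding (a): $valid_encoding\n";   # 1
print "SvUTF8 flag on bytes (b): $flag_on_bytes\n";   # 0
print "SvUTF8 flag on decoded:   $flag_on_chars\n";   # 1
print "bytes eq decoded:         $equal_mixed\n";     # 0
print "bytes eq re-encoded:      $equal_bytes\n";     # 1
```

Under these assumptions, Encode::is_utf8 and utf8::is_utf8 answer question (b) only. A value can pass check (a) and still report "is NOT utf8", which matches the behaviour described for the translated data above.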