On Fri, Nov 05, 2004 at 10:24:39AM -0800, Susan Cassidy wrote:
> Hi,
> Thanks Tim.
>
> I'm not sure what you mean by "the _client_ CHAR and NCHAR character sets".
> How do I check? (Obviously, I did not install the Oracle stuff myself, and
> we do not have our own DBA).
What are your NLS_LANG and NLS_NCHAR environment variables set to?

> By "validates as utf8", I meant that it is valid utf8 encoding (a).
>
> By "did not match" I meant that "if $saved_data eq $retrieved_data"
> returns false.

Okay. I've no time to re-examine your original email (got to sleep-n-pack
for the MySQL conference in Frankfurt). From what I said in my reply and
what you've said here, you, or someone else, ought to be able to work out
what's going on.

A hint: it's important that your client NLS_LANG and NLS_NCHAR environment
variables are set correctly, and that any UTF8 values you're using have the
UTF8 flag set.

Please reread the Unicode section of the DBD::Oracle docs. Let me know if
there's anything that's not clear enough.

Tim.

> Thanks,
> Susan
>
> > -----Original Message-----
> > From: Tim Bunce [mailto:[EMAIL PROTECTED]
> > Sent: Friday, November 05, 2004 2:10 AM
> > To: Susan Cassidy
> > Cc: [EMAIL PROTECTED]
> > Subject: Re: more DBD::Oracle utf8 weirdness, and kludge that should not
> > have worked, but did
> >
> > On Thu, Nov 04, 2004 at 01:42:13PM -0800, Susan Cassidy wrote:
> > > I finally got my large, complex cgi/Oracle application working with
> > > DBD::Oracle 1.16, using database character set AL32UTF8, NLS_LANG=.UTF8,
> > > etc.
> >
> > And what are the _client_ CHAR and NCHAR character sets?
> > And is the field you're inserting into a CHAR or NCHAR?
> >
> > > The test program takes some English sentences, runs them through a
> > > translator (which produces utf8 output, works fine - data validates as
> > > utf8 on multiple systems, etc.).
> >
> > It's important to keep in mind that "validates as utf8" is ambiguous.
> >
> > It could mean *either or both* of:
> >
> > a) the sequence of bytes is a valid utf8 encoding.
> > b) the perl scalar value has the perl SvUTF8 flag turned on.
> >
> > Much confusion is caused by not keeping those two separate points
> > in mind.
> > It's important to be clear what you're thinking about,
> > and precise when communicating it to others.
> >
> > > I then insert it into the database, and retrieve it. The retrieved
> > > data did not match the translated data.
> >
> > I'm afraid that "The retrieved data did not match the translated
> > data" is another ambiguous statement.
> >
> > If a sequence of bytes that does not have the SvUTF8 flag turned
> > on is compared with the same sequence of bytes that does, they won't
> > match (unless the string is all ASCII).
> >
> > Perl will encode the sequence of bytes that does not have the SvUTF8
> > flag turned on into UTF8 by treating each byte as a Latin1 character
> > (by default). If the sequence of bytes was UTF8 encoded already
> > (but not marked with the SvUTF8 flag) then treating each byte as a
> > Latin1 character will produce garbage unless the string is all ASCII.
> >
> > So the two strings with the same sequence of bytes may not match!
> >
> > > I added some tests in the code to check on the translated value like:
> > >   if (Encode::is_utf8($textval)) {
> > >       print "<p> is utf8!\n";
> > >   } else {
> > >       print "<p> is NOT utf8\n";
> > >   }
> > > This prints "is NOT utf8" (when I know that it really is utf8).
> >
> > Do you know which out of A and B above Encode::is_utf8 actually tests for?
> > Do you know which out of A and B you mean by "it really is utf8"?
> >
> > > If I do the same thing to the retrieved data, it prints that the data IS
> > > utf8.
> >
> > The returned data will be both valid utf8 and have the SvUTF8 flag on
> > if your relevant (CHAR/NCHAR) client character set is UTF8 or AL32UTF8.
> >
> > But that doesn't mean it contains the same string you passed in! :)
> > So I trust you're also checking if $inserted_value eq $fetched_value.
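To make the distinction between a) and b) concrete, here is a minimal, self-contained Perl sketch with no database involved; the sample word "café" and the variable names are illustrative assumptions, not taken from the thread. Encode::is_utf8 reports the SvUTF8 flag, i.e. b), not whether the bytes form a valid UTF-8 encoding. A flag-off UTF-8 byte string therefore does not compare `eq` to the same text held as a flag-on character string.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# "café" as raw UTF-8 bytes: a valid utf8 encoding (a), but the
# SvUTF8 flag is off (b is false). Perl sees 5 characters here.
my $bytes = "caf\xc3\xa9";

# The same text decoded into a Perl character string: flag on, 4 characters.
my $chars = decode('UTF-8', $bytes);

printf "bytes: is_utf8=%d length=%d\n", (is_utf8($bytes) ? 1 : 0), length($bytes);
printf "chars: is_utf8=%d length=%d\n", (is_utf8($chars) ? 1 : 0), length($chars);

# Perl treats each byte of $bytes as a Latin-1 character, so the
# comparison is "caf\xc3\xa9" (5 chars) against "caf\x{e9}" (4 chars).
print "raw compare: ", ($bytes eq $chars ? "match" : "no match"), "\n";

# Decoding the byte string first compares characters with characters.
print "decoded compare: ",
    (decode('UTF-8', $bytes) eq $chars ? "match" : "no match"), "\n";
```

Run as-is, this should report the byte string as not flagged with length 5, the decoded string as flagged with length 4, "no match" for the raw comparison and "match" once both sides are character strings.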
> >
> > > However, if I turn off the utf8 flag explicitly after retrieving the
> > > data, before comparing the translated data with the retrieved data, it
> > > works:
> >
> > Probably because you're now comparing byte strings as byte strings.
> >
> > > Of course, where I print out the status of utf8 below this, it now says
> > > it is NOT utf8.
> >
> > Of course.
> >
> > > I have re-read the Encode perldoc stuff several times. It seems to be
> > > working (on my system) backwards, sort of?
> > >
> > > In the DBD::Oracle 1.16 docs, Tim says:
> > >   If the string passed to bind_param() is considered by perl to be a
> > >   valid utf8 string ( utf8::is_utf8($string) returns true ), then
> > >   DBD::Oracle will implicitly set csform SQLCS_NCHAR and csid AL32UTF8
> > >   for you on insert.
> > > So, I think this may have something to do with it. However, I am
> > > "unset"ting it after retrieval, not before inserting it. ????
> >
> > But was it actually set on the value you inserted?
> >
> > [FYI, the output from trace() quotes strings with the SvUTF8 flag
> > on with double quotes, and uses single quotes if SvUTF8 is off.
> > That's a quick way to see what's going on.]
> >
> > > By the way, the same program moved over to a different machine where we
> > > use PostgreSQL (DBD::Pg) (without the _utf8_off, of course) works fine
> > > (as I would expect).
> >
> > I suspect DBD::Pg is doing something wrong that just happens to
> > work for your view of how it ought to work. Of course, I may be wrong.
> >
> > Tim.
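The _utf8_off "kludge" can be reproduced without Oracle. In this sketch $fetched only simulates a value returned with the SvUTF8 flag on, as the thread says happens when the client character set is UTF8 or AL32UTF8; the variable names and the sample word are hypothetical. It shows why turning the flag off makes the comparison succeed (byte string against byte string), and a cleaner alternative: decode the translator output once, up front. A value decoded that way would also have utf8::is_utf8 true when passed to bind_param(), which per the DBD::Oracle 1.16 docs quoted above triggers the implicit SQLCS_NCHAR/AL32UTF8 handling; that part is not exercised here.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Hypothetical translator output: "naïve" as UTF-8 bytes, SvUTF8 flag off.
my $translated = "na\xc3\xafve";

# Simulated fetched value: same text as a character string, SvUTF8 flag on.
my $fetched = decode('UTF-8', $translated);

print "1 raw:     ", ($translated eq $fetched ? "match" : "no match"), "\n";

# The kludge: turning the flag off makes Perl read the internal UTF-8
# buffer as plain bytes, so a byte string is compared with a byte string.
my $kludged = $fetched;
Encode::_utf8_off($kludged);
print "2 kludged: ", ($translated eq $kludged ? "match" : "no match"), "\n";

# Cleaner: decode the translator output once, so both sides of the
# comparison are character strings with the flag on.
my $text = decode('UTF-8', $translated);
print "3 decoded: ", ($text eq $fetched ? "match" : "no match"), "\n";
```

Comparisons 2 and 3 both succeed, but only the decoded approach keeps the flag on the value you would bind. If byte strings are genuinely wanted, the core `utf8::encode` function is a documented alternative to the internal `Encode::_utf8_off`; the DBI trace() quoting described above is a quick way to confirm which kind of string actually reached the driver.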