Re: DBI and UTF-8

David Graff Sat, 06 Dec 2003 08:34:30 -0800

[EMAIL PROTECTED] said:
>   As far as I know, the data base engine stores text using UTF-8.
>   ...


It would be worthwhile to use some other mode of access to confirm this.
It's possible that non-utf8 text data are being stored into tables in
some way that you don't expect or don't directly control.  If some
person or process is inserting non-utf8 data into the database, it's
very unlikely that the database engine itself is doing anything to alter
the data (e.g. to convert it to utf8) -- database engines generally
don't do that on their own.

To say that it "stores text using UTF-8" would, in practice, mean that
it handles character data types in a manner that is "8-bit clean" -- it
won't corrupt or alter bytes that happen to have the high bit set, and
when queried, will always return exactly what was inserted.

>   Now it seems as if the texts I get from DBI were encoded
>   with ISO-8859-1. Could it be possible that DBI is converting
>   the UTF-8 obtained from the data base to ISO-8859-1?
>   Possibly it considers ISO-8859-1 to be the "default client
>   charset"?  ...

I'm not personally familiar with the DBI source code, but I believe any
sort of conversion or alteration of data content by DBI itself should
not happen (unless there is a bug in the driver for a given RDB engine).
Data going to or from a database is supposed to pass through DBI without
modification of any sort.
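
One thing worth ruling out is that the bytes are fine and only the
interpretation is wrong.  The sketch below is an illustration only; it
uses Python rather than Perl/DBI, and the sample word is my own
assumption, not data from your tables.  It shows that UTF-8 bytes read
as if they were ISO-8859-1 look "converted" even though not a single
byte has changed, whereas a real conversion would yield different (and
fewer) bytes.

```python
# Illustration only: Python stands in for whatever client reads the data.
original = "caf\u00e9"                          # "cafe" with e-acute

utf8_bytes = original.encode("utf-8")           # 5 bytes: 63 61 66 c3 a9
latin1_bytes = original.encode("iso-8859-1")    # 4 bytes: 63 61 66 e9

# The same UTF-8 bytes, wrongly interpreted as ISO-8859-1:
# every byte becomes one character, so the accent shows up as two.
misread = utf8_bytes.decode("iso-8859-1")

# A wrong interpretation is reversible, because the bytes never changed.
recovered = misread.encode("iso-8859-1").decode("utf-8")

print(utf8_bytes.hex(" "))      # bytes as they would sit in the table
print(latin1_bytes.hex(" "))    # bytes after a genuine conversion
print(ascii(misread))           # what a client with the wrong charset shows
print(recovered == original)
```

If the bytes you get back match the first hex line, nothing converted
them and the problem is on the display side; if they match the second,
something between the table and your script really did re-encode them.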

>   How can I get the utf-8 text stored in the data base?

If you have a utf8-encoded string and put this into a table via an 
insert or update operation, that specific byte sequence should be 
retrievable from the table later on, via a normal query.
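
A round-trip check along these lines makes that expectation concrete.
This is a minimal sketch under stated assumptions: it uses Python's
bundled sqlite3 module and an in-memory database as a stand-in for your
actual engine and the dbi:ADO driver, and the table name, column name
and `round_trip` helper are hypothetical.

```python
import sqlite3


def round_trip(payload: bytes) -> bytes:
    """Insert raw bytes into a scratch table and read them back."""
    conn = sqlite3.connect(":memory:")  # throwaway database
    try:
        conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body BLOB)")
        conn.execute("INSERT INTO notes (body) VALUES (?)", (payload,))
        (stored,) = conn.execute(
            "SELECT body FROM notes WHERE id = 1"
        ).fetchone()
        return bytes(stored)
    finally:
        conn.close()


sent = "Gr\u00fc\u00dfe, caf\u00e9".encode("utf-8")
received = round_trip(sent)

# Compare byte for byte, not as displayed text.
print(sent.hex(" "))
print(received.hex(" "))
print(received == sent)
```

The point of comparing hex rather than printed strings is that a
terminal or editor can make identical bytes look different, or
different bytes look identical.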

If you are encountering a situation where you are specifically inserting
a utf8 character string, and are then getting back something different
when you query for that string, you should contact the author of the
dbi:ADO driver module.  Again, it will be helpful to use other methods
of access to the database so that you can get a better idea of where the
data corruption is happening.
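
When comparing what each access method returns, it helps to look at
the raw bytes instead of the rendered text.  The helper below is a
hypothetical sketch (again in Python, purely for illustration; the
name `describe` is mine): it dumps the bytes in hex and reports whether
they form valid UTF-8.

```python
def describe(raw: bytes) -> str:
    """Return a hex dump of raw plus a rough encoding verdict."""
    hex_dump = raw.hex(" ")
    if raw.isascii():
        # Pure ASCII is identical in UTF-8 and ISO-8859-1.
        verdict = "pure ASCII (encoding-neutral)"
    else:
        try:
            raw.decode("utf-8")
            verdict = "valid UTF-8"
        except UnicodeDecodeError:
            # High-bit bytes that do not form UTF-8 sequences,
            # e.g. text in ISO-8859-1 or another 8-bit charset.
            verdict = "not valid UTF-8"
    return f"{hex_dump}  [{verdict}]"


# Run the same value through each access path and compare the dumps.
print(describe(b"caf\xc3\xa9"))   # as inserted (UTF-8)
print(describe(b"caf\xe9"))       # as it would look after conversion
print(describe(b"cafe"))
```

Whichever access path first shows a dump that differs from what was
inserted is where the alteration is happening.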

        Dave Graff
