[EMAIL PROTECTED] said:
> As far as I know, the data base engine stores text using UTF-8.
> ...

It would be worthwhile to use some other mode of access to confirm this. It's possible that non-utf8 text data are being stored into tables in some way that you don't expect or don't directly control. If some person or process is inserting non-utf8 data into the database, it's very unlikely that the database engine itself is doing anything to alter the data (e.g. to convert it to utf8); database engines generally don't do that on their own. To say that it "stores text using UTF-8" would most likely just mean that it handles character data types in a manner that is "8-bit clean": it won't alter characters that happen to have the high bit set, and when queried, it will return exactly the bytes that were inserted.

> Now it seems as if the texts I get from DBI were encoded
> with ISO-8859-1. Could it be possible that DBI is converting
> the UTF-8 obtained from the data base to ISO-8859-1?
> Possibly it considers ISO-8859-1 to be the "default client
> charset"? ...

I'm not personally familiar with the DBI source code, but I believe DBI itself should not convert or alter data content in any way (unless there is a bug in the driver for a given RDB engine). Data going to or from a database is supposed to pass through DBI without modification of any sort.

> How can I get the utf-8 text stored in the data base?

If you have a utf8-encoded string and put it into a table via an insert or update operation, that specific byte sequence should be retrievable from the table later on via a normal query.

If you are specifically inserting a utf8 character string and then getting back something different when you query for that string, you should contact the author of the dbi:ADO driver module (DBD::ADO).

Again, it will be helpful to use other methods of access to the database, so that you can get a better idea of where the data corruption is happening.

Dave Graff
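
P.S. Whichever access method you try, it helps to look at the raw bytes of a returned value rather than at how a terminal or browser renders them. Here is a minimal sketch of such a check. The helper names hex_bytes and classify_bytes are made up for illustration, and the sketch assumes the value you pass in is a plain byte string exactly as the driver returned it (not a string Perl has already decoded into characters).

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: hex dump of a byte string, e.g. "c3 a9".
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', unpack('(H2)*', $bytes);
}

# Hypothetical helper: report whether a byte string is pure ASCII,
# well-formed UTF-8, or something else (for example ISO-8859-1).
sub classify_bytes {
    my ($bytes) = @_;

    # No byte with the high bit set: the encodings are indistinguishable.
    return 'ascii' unless $bytes =~ /[\x80-\xff]/;

    # Strict decoding croaks on malformed UTF-8; trap that with eval.
    # $bytes is our own copy, so in-place modification by decode is harmless.
    my $ok = eval { Encode::decode('UTF-8', $bytes, Encode::FB_CROAK); 1 };
    return $ok ? 'utf8' : 'not-utf8';
}

# "e with acute accent" as UTF-8 (two bytes) and as ISO-8859-1 (one byte).
for my $sample ("caf\xc3\xa9", "caf\xe9", "cafe") {
    printf "%-14s %s\n", hex_bytes($sample), classify_bytes($sample);
}
```

If you run the same check on a value fetched through DBI and on the same row fetched through some other client, a difference in the hex dumps would show which layer is changing the data. Note that a result of 'utf8' only means the bytes are well-formed UTF-8; it cannot by itself rule out text that was converted twice.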