Hi,

The TYPO3 CMS I'm working on uses UTF-8 database fields for some time now by default. There are sometimes old installation, which have been updated without properly converting the database. The result: UTF-8 encoded data in (most often) latin1 tables/fields.

I have a conversion script which analyses the table definitions and uses the "trick" of two alter table operations (first to the binary equivalent of the column type and then to the normal type with the utf8 charset) to convert the data to the correct character set.

It would be nice to be able to detect this situation using queries only (faster than transferring the data into the PHP script and analysing it there).

I have been fiddling a bit with a few columns:
test: latin1 (latin1-swedish-ci) contains UTF-8 encoded data
test1: latin1 (latin1-swedish-ci) contains latin1 encoded data

test: Landrëéüöïß
CONVERT(BINARY `test` USING utf8): Landrëéüöïß
CONVERT(`test` USING utf8) : Landrëéüöïß

test1: Landrëéüöïß
CONVERT(BINARY `test1` USING utf8) : Landr
CONVERT(`test1` USING utf8) : Landrëéüöïß

I'm now looking for an expression which can differentiate between the two situations if possible without having to look for all possible combinations of the encoded data.

--
Kind regards / met vriendelijke groet,

Jigal van Hemert.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Reply via email to