Recognizing utf8 encoded data in latin1 fields/tables

Jigal van Hemert Sun, 08 May 2011 11:30:21 -0700

Hi,

The TYPO3 CMS I'm working on has used UTF-8 database fields by default for some time now. There are sometimes old installations which have been updated without properly converting the database. The result: UTF-8 encoded data in (most often) latin1 tables/fields.

I have a conversion script which analyses the table definitions and uses the "trick" of two ALTER TABLE operations (first to the binary equivalent of the column type and then to the normal type with the utf8 charset) to convert the data to the correct character set.
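For reference, the two-step conversion looks roughly like this. The table name `mytable`, the column name `title` and the VARCHAR(255) type are made up for the example; the real script takes them from the table definition.

```sql
-- Step 1: switch to the binary equivalent of the column type.
-- The stored bytes are left untouched; only the charset label is dropped.
ALTER TABLE mytable MODIFY title VARBINARY(255);

-- Step 2: switch back to the normal type, now labelled utf8.
-- Converting from a binary type does not transcode the data, so the
-- UTF-8 bytes already in the column are kept as they are.
ALTER TABLE mytable MODIFY title VARCHAR(255) CHARACTER SET utf8;
```

A direct latin1 to utf8 ALTER would transcode every byte as a latin1 character and produce double-encoded data, which is why the detour via the binary type is needed.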

It would be nice to be able to detect this situation using queries only (faster than transferring the data into the PHP script and analysing it there).


I have been fiddling a bit with a few columns:
test: latin1 (latin1_swedish_ci), contains UTF-8 encoded data
test1: latin1 (latin1_swedish_ci), contains latin1 encoded data

test: LandrÃ«Ã©Ã¼Ã¶Ã¯ÃŸ
CONVERT(BINARY `test` USING utf8): Landrëéüöïß
CONVERT(`test` USING utf8) : LandrÃ«Ã©Ã¼Ã¶Ã¯ÃŸ

test1: Landrëéüöïß
CONVERT(BINARY `test1` USING utf8) : Landr
CONVERT(`test1` USING utf8) : Landrëéüöïß
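A query of roughly this shape shows the three values for each column side by side (`mytable` is a placeholder for the real table name):

```sql
SELECT
    `test`,
    CONVERT(BINARY `test` USING utf8)  AS test_bytes_as_utf8,   -- reinterpret the stored bytes as utf8
    CONVERT(`test` USING utf8)         AS test_transcoded,      -- transcode latin1 -> utf8
    `test1`,
    CONVERT(BINARY `test1` USING utf8) AS test1_bytes_as_utf8,
    CONVERT(`test1` USING utf8)        AS test1_transcoded
FROM mytable;
```

For real latin1 data the BINARY variant stops at the first byte that is not valid UTF-8, which is why `test1` comes back as just "Landr".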

I'm now looking for an expression which can differentiate between the two situations, if possible without having to look for all possible combinations of the encoded data.


--
Kind regards / met vriendelijke groet,

Jigal van Hemert.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org
