Thanks for that link! That's another subtle issue I had not noted.

There are so many combinations that it is hard to say "do this":
* Incoming bytes are latin1 / utf8 / Microsquish control characters.
* You do/don't have SET NAMES (or equivalent).
* The database/table/column is declared latin1/utf8/other.
* The problem is on ingestion / on retrieval.
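To make the first bullet concrete, here is a minimal Python sketch of how one and the same byte reads under different assumed encodings. The sample string is hypothetical (modelled on the quoted article text), and the byte 0x92 is assumed to be a Windows-1252 "smart" apostrophe; none of this touches MySQL itself, it only shows why the guess about the incoming encoding matters.

```python
# Hypothetical sample: 0x92 is the right single quote in Windows-1252.
raw = b"observatory\x92s control room"


def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, or return a marker string if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return "<invalid " + encoding + ">"


as_cp1252 = try_decode(raw, "cp1252")   # 0x92 becomes U+2019 (curly apostrophe)
as_latin1 = try_decode(raw, "latin-1")  # 0x92 becomes the C1 control U+0092
as_utf8 = try_decode(raw, "utf-8")      # a lone 0x92 is not valid UTF-8

# Forcing the correctly decoded text down to ASCII replaces the apostrophe
# with "?", which resembles the question mark reported in the thread.
to_ascii = as_cp1252.encode("ascii", errors="replace")

for label, value in [("cp1252", as_cp1252), ("latin-1", as_latin1),
                     ("utf-8", as_utf8), ("ascii", to_ascii)]:
    print(label, repr(value))
```

Only the cp1252 reading yields a real apostrophe; the other readings give a control character, a decoding error, or a question mark.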
The thing mentioned involved 2 steps:

    ALTER TABLE ... MODIFY COLUMN BINARY (or BLOB);
        -- to forget any charset knowledge
    ALTER TABLE ... MODIFY COLUMN CHARACTER SET ...;
        -- coming from BINARY, this does not check the encoding.

(sorry, don't have the link handy)

> -----Original Message-----
> From: h...@tbbs.net [mailto:h...@tbbs.net]
> Sent: Thursday, September 27, 2012 2:24 PM
> To: Mark Phillips
> Cc: Mysql List
> Subject: Re: Need Help Converting Character Sets
>
> >>>> 2012/09/24 16:28 -0700, Mark Phillips >>>>
> I have a table, Articles, of news articles (in English) with three text
> columns for the intro, body, and caption. The data came from a web
> page, and the content was cut and pasted from other sources. I am
> finding that there are some non utf-8 characters in these three text
> columns. I would like to (1) convert these text fields to be strict
> utf-8 and then (2) fix the input page to keep all new submissions
> utf-8.
>
> (1) For the first step, fixing the current database, I tried:
>
> update Articles set body = CONVERT(body USING ASCII);
>
> However, when I checked one of the articles I found an apostrophe had
> been converted into a question mark. (FWIW, the apostrophe was one of
> those offending non utf-8 characters):
>
> Before conversion: "I stepped into the observatory?s control room ..."
>
> After conversion: "I stepped into the observatory?s control room..."
>
> Is there a better way to accomplish my first goal, without reading each
> article and manually making the changes?
> <<<<<<<<
> I do not remember where on the MySQL website this is, but there was an
> article about converting from character sets in version 4 to those in
> version 5, when UTF-8 first was supported. It sounds to me that maybe
> the tricks shown there would be useful to you, since, in effect,
> through MySQL MySQL was fooled into accepting for UTF-8 that which was
> not. Conversion to binary string was mentioned.
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql