I've got some years-old MySQL databases mostly in 4.0, but one server
running 3.23 that are all using the default encoding.

I want to update all their data to 4.1 with UTF-8 encoding.

Anyone done this kind of dump-and-update?    Any advice to share or
good URLs you've seen with others' advice about this?



Hi! We have been going through this process since January... the learning curve was steep and the resources hard to find. At the end of the day it's quite simple unless there are weird things already in your database... here's the process we use:

On the original server using 4.0.n server and tools to match we run this:

mysqldump --tab=/var/tmp/database database

Then we move the directory /var/tmp/database to /var/tmp on the new server with 4.1 running... note this has the 4.1.n server AND the 4.1.n tools (such as mysql, mysqldump, mysqlimport and so on). Also note we have this line (amongst others) in our 4.1 my.cnf file:

[mysqld]
default-character-set=utf8

The upshot of this is that by default all new databases and tables will automatically create themselves with utf8 as the default character set unless told otherwise. For us this was important because we have MANY databases with MANY tables with MANY columns, and going through and setting the character set for each and every database/table/column was prohibitively expensive. HOWEVER... give some thought to this, the reality is for our setup there are maybe two or three columns in two or three tables in each database that really need to be UTF8 - most of the data doesn't need to be encoded this way, and there are some overheads to having everything encoded in utf8. But in the interests of time and quick conversion, we did it this way, we are reassessing it and may change things before we convert the rest of the databases. If you do decide to do this you may want to setup your mysql database using latin1 before setting the default for everything on the server to utf8... things like username/ hostname/password with 16 character varchar column type when converted to utf8 allows 16 bytes, and not 16 characters, and since utf8 allows multibyte characters you may only get 5 characters in your usernames etc... so there are little gotchas to setting the default character set for the whole server to utf8, and if you do, configure the mysql database separately on it's own.

OK, having set the default character set for everything on the server to utf8 we go ahead and import our data into the server using this sequence of commands (we use a shell script, so that's what you get here). Call the shell script by giving it the database name as a flag (eg ./import database) - watch for differences in line breaks caused by email clients here, there are three lines of commands after setting DB=$1.

#!/bin/sh
#
# LiveWorld's MySQL Import Script
# Use for converting 4.0 databases to 4.1 UTF8 databases
# Suitable for LiveWorld Servers only, use at your own risk
#

DB=$1

mysql --socket=/tmp/mysql.sock -e "CREATE DATABASE $DB;"
cat /var/tmp/$DB/*sql | mysql --socket=/tmp/mysql.sock $DB
mysqlimport --default-character-set=utf8 --socket=/tmp/mysql.sock $DB /var/tmp/$DB/*txt


If you choose (probably wisely) not to set the default character set for the server to utf8 you can achieve the same result by making the first execution line of the above script to look like this:

mysql --socket=/tmp/mysql.sock -e "CREATE DATABASE $DB default character set utf8;"

Which will still have the affect of making your newly imported database use utf8 everywhere....

This process has worked for us taking our latin1 4.0 databases and turning them into utf8 4.1 databases. UTF8 data we had already put in our 4.0 database despite it's latin1 encoding was correctly exported out of 4.0 and correctly converted on it's way in to 4.1, we don't loose anything along the way. Just again though I need to restate.. things like:

username varchar(75) binary NOT NULL default ''

take on a new meaning under utf8, it's no longer 75 characters, but 75 bytes, and utf8 encoded data takes more bytes. As well as potential data issues where you expect something to be 8 characters and it's really 24 bytes so having a varchar(8) may break new data inserts. There are also disk space issues that come out of this, and of course if you triple your disk usage there may also be new performance issues. Our recommendation is to do the import as above (this way you are sure to get your utf8 data in to the database the right way) and then go through (by script potentially) and convert the tables and columns that don't really need to be utf8 back to latin1... which is what we are looking at doing.

Hope this has been a little helpful :-)

Best Regards, Bruce

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to