Bug#438970: [debian-mysql] Bug#438970: Bug#438970: UTF8 default charcterset for mysql-server package
On 2007-12-05 Cristóbal Palmer wrote:
> On Tue, Dec 04, 2007 at 11:18:38PM +0100, Norbert Tretkowski wrote:
> > I've read the whole bugreport a few times now, and I think I have to
> > agree with Sean, we should switch the default charset for new databases
> > to utf8, but shouldn't touch existing ones.
>
> Agreed. I think the easiest way to accomplish this (from the
> sysadmin's perspective) would be to have a separate package, maybe
> mysql-server-utf8? But again, I'm not an experienced packager. I'm
> sure there are great debianish minds that will choose the best path.

Separate packages make a lot of work on our (now Norbert's *g*) side!

Putting default_character_set and default_collation in /etc/mysql/my.cnf should work, and it has the advantage that my.cnf is treated as a config file: our new version is safely installed as my.cnf.dpkg-dist if the admin wants to preserve his old configuration.

> (1) A previous comment seemed to indicate that changes to my.cnf would
> cover everything. If it's possible to get mysql to do utf8 across the
> board by default (server, db, client, conn) by only adjusting the
> my.cnf under debian, would someone please attach such a my.cnf? I was
> unsuccessful in my attempts to utf8-ify that way.

Remember to put "default-character-set" into the [client] part of my.cnf, too. (If that also does not work, please describe exactly what goes wrong.)

> (2) What are the consequences of changing the default collation?

This is an "other people's problem". We ship a default config for new installations and a big fat warning in the NEWS file. If the admin decides to overwrite his my.cnf with ours, he is responsible; if he starts from scratch with our UTF-8 config, there is no problem :)

> It would be nice if everybody understood encodings thoroughly and
> played nice, but doing a little poking turns up tons of examples of
> webapps behaving badly, and for a variety of reasons.

But I agree with that, and Debian would be glad if people with knowledge tested our "ideas" :)

bye,

-christian-
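For anyone who wants to try the my.cnf route discussed above, here is a minimal sketch, written as a small Python script that renders such a fragment. The option names are assumptions, not something the thread confirms for the Debian package: `character-set-server` and `collation-server` under [mysqld], and `default-character-set` under [client], are the spellings I know from the MySQL manual. The function name `render_utf8_fragment` is made up for illustration.

```python
import configparser
import io


def render_utf8_fragment() -> str:
    """Render a my.cnf-style fragment asking for utf8 on server and client."""
    cfg = configparser.ConfigParser()
    # Server-side defaults for newly created databases (assumed option names).
    cfg["mysqld"] = {
        "character-set-server": "utf8",
        "collation-server": "utf8_general_ci",
    }
    # Client programs that read the [client] group then default to utf8 too.
    cfg["client"] = {"default-character-set": "utf8"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_utf8_fragment())
```

The output would be merged into /etc/mysql/my.cnf by hand. Existing tables keep whatever charset they were created with, so this only changes the defaults for new objects and new connections.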
Bug#438970: [debian-mysql] Bug#438970: UTF8 default charcterset for mysql-server package
On Tue, Dec 04, 2007 at 11:18:38PM +0100, Norbert Tretkowski wrote:
> I've read the whole bugreport a few times now, and I think I have to
> agree with Sean, we should switch the default charset for new databases
> to utf8, but shouldn't touch existing ones.

Agreed. I think the easiest way to accomplish this (from the sysadmin's perspective) would be to have a separate package, maybe mysql-server-utf8? But again, I'm not an experienced packager. I'm sure there are great debianish minds that will choose the best path.

As for why they didn't go with utf8 as the default in the past, I recommend this article:

http://dev.mysql.com/tech-resources/articles/4.1/unicode.html

Actually, I *really* recommend that article. It made the difference for me in terms of understanding what otherwise seemed like a nonsensical charset forest with mysql.

> Having some testers with MySQL and utf8 experience would be great,
> thanks for your offer. Expect a package in experimental if we really
> decide to switch the default charset.

I look forward to it! I'm confident that together we can get something packaged that will make a lot of people's lives significantly easier in the long run.

I did want to chime in with a few more things:

(1) A previous comment seemed to indicate that changes to my.cnf would cover everything. If it's possible to get mysql to do utf8 across the board by default (server, db, client, conn) by only adjusting the my.cnf under debian, would someone please attach such a my.cnf? I was unsuccessful in my attempts to utf8-ify that way.

(2) What are the consequences of changing the default collation? The default collation for utf8 is utf8_general_ci (try 'show character set;'), but the default for latin1 is latin1_swedish_ci (‽‽). I'll tell you now that the consequences could be painful ☮✈⚔⚠☫⚛☠±♫♥ for users of some webapps, including (in the past) drupal:

http://drupal.org/node/66333

In case that wasn't clear, I mean that it can break things.

Note that the drupal example above was specifically a collation issue (http://drupal.org/node/66333#comment-412577), and I feel sorry for the reporter, who got a "won't fix" and "When it does occur, it is relatively easy to fix by hand," which is--with all due respect to the fine drupal people--bogus, imho. The problem is that you can't always anticipate when/where/how charset conversion or collation problems will be happening. Here's a horrible example of how NOT to do latin1 -> utf8:

http://lists.wikimedia.org/pipermail/mediawiki-l/2004-November/002245.html

It would be nice if everybody understood encodings thoroughly and played nice, but doing a little poking turns up tons of examples of webapps behaving badly, and for a variety of reasons. Or maybe it's a clash of expectations/preferences? My personal non-database favorite encoding hobby horse is mailman lists and their archives. Perhaps it's irrational of me to think that I shouldn't have to change browser settings to view things correctly. Try visiting:

http://lists.ibiblio.org/pipermail/cc-jp/

for example. I promise it's not broken.* Mostly. :)

The more we consolidate on utf8, the better things get, but along the way there will be painful moments. That's life. UTF8 by default is a change that should happen, but carefully, and there will likely need to be legacy support for the old defaults for some time.

Cheers,
--
Cristóbal Palmer
ibiblio.org systems administrator

* Hint: View -> Character Encodings -> More Encodings -> East Asian -> EUC-JP

Bonus points if you can tell me why some pages, eg.

http://lists.ibiblio.org/pipermail/cc-jp/2004-March/000128.html

look broken. Mailing list archives are fun, see?
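To make the "how NOT to do latin1 -> utf8" point concrete, here is a small Python sketch of the usual failure: UTF-8 bytes get read as if they were latin1, and every non-ASCII character turns into two. The helper names `misread_as_latin1` and `repair` are hypothetical, and Python's "latin-1" codec stands in for whatever the database or webapp actually does, which is an assumption on my part.

```python
def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as latin1."""
    return text.encode("utf-8").decode("latin-1")


def repair(mojibake: str) -> str:
    """Undo one round of the mistake; may raise if the input was not mojibake."""
    return mojibake.encode("latin-1").decode("utf-8")


name = "Cristóbal"
once = misread_as_latin1(name)   # one bad conversion: 'CristÃ³bal'
twice = misread_as_latin1(once)  # a second bad conversion on top of the first

if __name__ == "__main__":
    print(once)
    print(repair(once) == name)
    print(repair(repair(twice)) == name)
```

The repair only works when you know how many times the data went through the wrong conversion, which is exactly the part you usually can't anticipate.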
Bug#438970: [debian-mysql] Bug#438970: UTF8 default charcterset for mysql-server package
Hi,

I've read the whole bugreport a few times now, and I think I have to agree with Sean, we should switch the default charset for new databases to utf8, but shouldn't touch existing ones.

Maybe I just don't see the real reason why MySQL decided not to use utf8 as the default charset. I'll ask on the MySQL packagers list before finally switching the default charset.

On Tuesday, 04.12.2007, at 15:37 -0500, Cristóbal Palmer wrote:
> If I can be of assistance in testing, I'd be happy to spin up another
> VM to do so. It's definitely worth my time since it will significantly
> reduce headaches moving forward.

Having some testers with MySQL and utf8 experience would be great, thanks for your offer. Expect a package in experimental if we really decide to switch the default charset.

Norbert
Bug#438970: UTF8 default charcterset for mysql-server package
Hi,

Cristóbal with ibiblio.org here. We recently handed off DocSouth (http://docsouth.unc.edu/) from our servers to UNC libraries'. Because of such changes and growth, we've been repeatedly exposed to encoding problems such as the one the original reporter describes. UNC Library systems has managed to make the transition to UTF8 *enforced* across the board for their mysql, which for them (on RHEL) has involved maintaining their own package. So UTF8 in mysql is something we're rather familiar with, and something we'd LOVE to see by default from debian.

If I can be of assistance in testing, I'd be happy to spin up another VM to do so. It's definitely worth my time since it will significantly reduce headaches moving forward.

Last night I built mysql from the debian source package under etch. That and a change to the /etc/init.d/mysql script got me this status; output:

mysql Ver 14.12 Distrib 5.0.32, for pc-linux-gnu (i686) using EditLine wrapper

Connection id:          10
Current database:
Current user:           [EMAIL PROTECTED]
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.0.32-Debian_7etch3-log Debian etch distribution
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 7 min 39 sec

Threads: 1  Questions: 164  Slow queries: 0  Opens: 154  Flush tables: 1  Open tables: 36  Queries per second avg: 0.357

So again, please let me know how I can help with this transition. I'm not an experienced packager, but I'm happy to test or dive in and learn whatever is needed for me to be up to speed with helping on this in any capacity that would be appreciated.

Cheers,
--
Cristóbal Palmer
ibiblio.org systems administrator
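For testers who want to check the four characterset lines of a status report without eyeballing them, a rough Python sketch follows. It assumes the report has been saved as text in the layout shown in this thread; the `charsets` and `all_utf8` helpers and the `SAMPLE` string are made up for illustration and are not part of any MySQL tool.

```python
import re

# Sample text in the layout of the status report quoted in this thread.
SAMPLE = """\
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
"""

# Matches e.g. "Server characterset: utf8", with or without padding.
_LINE = re.compile(
    r"^\s*(Server|Db|Client|Conn\.)\s+characterset:\s*(\S+)", re.MULTILINE
)


def charsets(status_text: str) -> dict:
    """Map 'Server', 'Db', 'Client', 'Conn.' to the charset each line reports."""
    return {who: cs for who, cs in _LINE.findall(status_text)}


def all_utf8(status_text: str) -> bool:
    """True only if all four characterset lines are present and say utf8."""
    found = charsets(status_text)
    return len(found) == 4 and all(cs == "utf8" for cs in found.values())


if __name__ == "__main__":
    print(charsets(SAMPLE))
    print(all_utf8(SAMPLE))
```

A report where, say, only the client line still reads latin1 makes `all_utf8` return False, which is the half-converted state the my.cnf questions in this thread are about.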
Bug#438970: [debian-mysql] Bug#438970: UTF8 default charcterset for mysql-server package
reopen 438970
thanks

hey guys,

guess i missed the bug and initial discussion, i must have been automatically unsubscribed from pkg-mysql-maint when my mail server went down a few weeks ago.

anyway, first things first, i'm reopening the bug since even if it will never be implemented it would be more proper to tag it wontfix and leave it open as opposed to closing it.

however, i would argue that we *should* switch default charsets, and as far as where we are in the release cycle i think approximately now is a really good time to do so. after all, wasn't pervasive utf8 support a release goal of debian >= etch?

of course, the devil is in the details, since we would need to make sure that any such switch didn't cause problems for existing users. assuming that it doesn't introduce any problems, i'd propose something like the following:

- make whatever modifications are necessary to the initial bootstrapping code for first-time installs so that default databases/tables are utf8
- have my.cnf default to utf8 (this shouldn't affect existing tables/databases right?)
- not attempt to convert anything from existing installs
- place lots of scary warnings in the changelog and NEWS.Debian
- make sure the release notes for lenny address this

sean
Bug#438970: UTF8 default charcterset for mysql-server package
Package: mysql-server
Version: 5.0.32-7etch1

The default character set for the mysql-server package is currently latin1. See the server status notice after installing the package:

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
mysql> status;
--
mysql Ver 14.12 Distrib 5.0.38, for pc-linux-gnu (i486) using readline 5.2

Connection id:          1
Current database:
Current user:           [EMAIL PROTECTED]
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.0.32-Debian_7etch1-log Debian etch distribution
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    latin1
Conn.  characterset:    latin1
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 32 sec
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

This behaviour can cause problems in any number of scenarios. I personally noticed it when I was changing forum software on an Ubuntu forum. Since the two forum applications had different ways of handling character sets I had to convert the forum database from latin1 to utf8 for the forum application to correctly pick up the characters. Of course, I was quite frustrated with the inability of the forum packages to deal with the character sets correctly but I also realised that the problem may have been non-existent if the database originally had been utf8.

The suggested solution is to always compile the mysql package with utf8 as default character set. (./configure --with-charset=utf8)

I am using Ubuntu 7.04, 2.6.20-16-generic kernel and libc6 2.5-0ubuntu14. The reason I am reporting this with Debian bugs and not with Ubuntu bugs / Launchpad is because I noticed it was - as I believe - as incorrect in the Debian package as it is in the Ubuntu package. I also have a suspicion that the Ubuntu package is supplied directly from the Debian package without modification - I may, of course, be wrong about that.
Thanks and Best Regards,
Johan Ramm-Ericson