Bug#438970: [debian-mysql] Bug#438970: Bug#438970: UTF8 default character set for mysql-server package

2007-12-05 Thread Christian Hammers


On 2007-12-05 Cristóbal Palmer wrote:
> On Tue, Dec 04, 2007 at 11:18:38PM +0100, Norbert Tretkowski wrote:
> > I've read the whole bug report a few times now, and I think I have to
> > agree with Sean, we should switch the default charset for new databases
> > to utf8, but shouldn't touch existing ones.
> 
> Agreed. I think the easiest way to accomplish this (from the
> sysadmin's perspective) would be to have a separate package, maybe
> mysql-server-utf8? But again, I'm not an experienced packager. I'm
> sure there are great debianish minds that will choose the best path.

Separate packages make a lot of work on our (now Norbert's *g*) side!

Putting default_character_set and default_collation
in /etc/mysql/my.cnf should work and has the advantage that my.cnf is
treated as a conffile, which means that our new version is safely installed
as my.cnf.dpkg-dist if the admin chooses to keep his old configuration.

> (1) A previous comment seemed to indicate that changes to my.cnf would
> cover everything. If it's possible to get mysql to do utf8 across the
> board by default (server, db, client, conn) by only adjusting the
> my.cnf under debian, would someone please attach such a my.cnf? I was
> unsuccessful in my attempts to utf8-ify that way.

Remember to put "default-character-set" into the [client] section of my.cnf, too.
(If that also does not work, please describe exactly what goes wrong.)
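As a rough illustration of what Christian describes, the sketch below embeds a hypothetical my.cnf fragment that sets utf8 in both the [mysqld] and [client] sections, plus a small Python check that both sections ask for it. It assumes the MySQL 5.0 option names mentioned in this thread (default-character-set, default-collation) and uses Python's configparser, which only approximates MySQL's own option-file parser; the helper names are made up for this example.

```python
import configparser

# Hypothetical my.cnf fragment. Option names follow MySQL 5.0: mysqld accepts
# default-character-set / default-collation (also spelled
# character-set-server / collation-server), and the client programs accept
# default-character-set in [client].
EXAMPLE_MY_CNF = """
[client]
default-character-set = utf8

[mysqld]
default-character-set = utf8
default-collation     = utf8_general_ci
"""


def normalize(name: str) -> str:
    # MySQL treats '-' and '_' in option names as equivalent.
    return name.replace("_", "-").lower()


def charset_settings(text: str) -> dict:
    """Parse an option file into {section: {normalized option: value}}."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # bare flags such as skip-external-locking
        interpolation=None,   # MySQL has no %(name)s interpolation
        strict=False,         # tolerate repeated sections/options
    )
    parser.read_string(text)
    return {
        section: {normalize(key): value for key, value in parser.items(section)}
        for section in parser.sections()
    }


def missing_utf8(text: str) -> list:
    """Return the sections that do not set default-character-set to utf8.

    This sketch only checks the default-character-set spelling.
    """
    settings = charset_settings(text)
    problems = []
    for section in ("client", "mysqld"):
        value = settings.get(section, {}).get("default-character-set") or ""
        if value.lower() != "utf8":
            problems.append(section)
    return problems


if __name__ == "__main__":
    print(missing_utf8(EXAMPLE_MY_CNF))  # []
```

Feeding the text of a real /etc/mysql/my.cnf to missing_utf8() would list the sections that still lack the setting; MySQL-specific directives such as !includedir are not followed.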
 
> (2) What are the consequences of changing the default collation?

This is an "other people's problem". We ship a default config for new
installations and a big fat warning in the NEWS file. If the admin
decides to overwrite his my.cnf with ours, he is responsible; if he
starts from scratch with our UTF-8 config, there is no problem :)

> It would be nice if everybody understood encodings thoroughly and
> played nice, but doing a little poking turns up tons of examples of
> webapps behaving badly, and for a variety of reasons.

But I agree with that, and Debian would be glad if knowledgeable people
tested our "ideas" :)

bye,

-christian-




Bug#438970: [debian-mysql] Bug#438970: UTF8 default character set for mysql-server package

2007-12-05 Thread Cristóbal Palmer
On Tue, Dec 04, 2007 at 11:18:38PM +0100, Norbert Tretkowski wrote:
> I've read the whole bug report a few times now, and I think I have to
> agree with Sean, we should switch the default charset for new databases
> to utf8, but shouldn't touch existing ones.

Agreed. I think the easiest way to accomplish this (from the
sysadmin's perspective) would be to have a separate package, maybe
mysql-server-utf8? But again, I'm not an experienced packager. I'm
sure there are great debianish minds that will choose the best path.

As for why they didn't go with utf8 as the default in the past, I
recommend this article:

http://dev.mysql.com/tech-resources/articles/4.1/unicode.html

Actually, I *really* recommend that article. It made the difference
for me in terms of understanding what otherwise seemed like a
nonsensical charset forest with mysql.

> Having some testers with MySQL and utf8 experience would be great,
> thanks for your offer. Expect a package in experimental if we really
> decide to switch the default charset.

I look forward to it! I'm confident that together we can get something
packaged that will make a lot of people's lives significantly easier
in the long run.

I did want to chime in with a few more things:

(1) A previous comment seemed to indicate that changes to my.cnf would
cover everything. If it's possible to get mysql to do utf8 across the
board by default (server, db, client, conn) by only adjusting the
my.cnf under debian, would someone please attach such a my.cnf? I was
unsuccessful in my attempts to utf8-ify that way.

(2) What are the consequences of changing the default collation? The
default collation for utf8 is utf8_general_ci (try 'show character
set;'), but the default for latin1 is latin1_swedish_ci (‽‽). I'll
tell you now that the consequences could be painful ☮✈⚔⚠☫⚛☠±♫♥ for
users of some webapps, including (in the past) drupal:

http://drupal.org/node/66333

In case that wasn't clear, I mean that it can break things. Note that
the drupal example above was specifically a collation issue
(http://drupal.org/node/66333#comment-412577), and I feel sorry for
the reporter, who got a "won't fix" and "When it does occur, it is
relatively easy to fix by hand," which is--with all due respect to the
fine drupal people--bogus, imho.

The problem is that you can't always anticipate when/where/how charset
conversion or collation problems will be happening. Here's a horrible
example of how NOT to do latin1 -> utf8:

http://lists.wikimedia.org/pipermail/mediawiki-l/2004-November/002245.html
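One common way a latin1 -> utf8 migration goes wrong (not necessarily the exact failure in the linked post) is double encoding: bytes that are already UTF-8 get treated as latin1 and converted again. A minimal Python sketch of that mistake and its reversal, using a name from this thread as test data. It uses Python's ISO-8859-1 codec as a stand-in for MySQL's latin1, which is actually closer to Windows cp1252; the two differ only for bytes 0x80-0x9F, which this example does not touch. The function names are made up for illustration.

```python
def misread_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being wrongly interpreted as latin1."""
    # ISO-8859-1 maps every byte to one character, so this never fails,
    # which is exactly why the mistake tends to go unnoticed.
    return text.encode("utf-8").decode("latin-1")


def undo_misread(garbled: str) -> str:
    """Reverse one round of misreading (only valid if it really happened)."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    name = "Cristóbal"
    garbled = misread_as_latin1(name)
    print(garbled)                # CristÃ³bal
    print(undo_misread(garbled))  # Cristóbal
```

Each extra wrong conversion adds another layer, so a repair has to undo exactly as many passes as were applied; running undo_misread() on text that was never garbled will usually fail with a Unicode error rather than fix anything.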

It would be nice if everybody understood encodings thoroughly and
played nice, but doing a little poking turns up tons of examples of
webapps behaving badly, and for a variety of reasons. Or maybe it's a
clash of expectations/preferences? My personal non-database favorite
encoding hobby horse is mailman lists and their archives. Perhaps it's
irrational of me to think that I shouldn't have to change browser
settings to view things correctly. Try visiting:

http://lists.ibiblio.org/pipermail/cc-jp/

for example. I promise it's not broken.* Mostly. :)
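According to the footnote's hint, the cc-jp archive pages are EUC-JP, and the hint works because bytes do not carry their encoding: the same data is readable or gibberish depending on which codec the reader (here, the browser) picks. A minimal Python sketch with made-up sample text:

```python
sample = "日本語"                 # made-up sample text
raw = sample.encode("euc_jp")    # the bytes as they sit in the archive

right = raw.decode("euc_jp")     # what you see after choosing EUC-JP
wrong = raw.decode("latin-1")    # what a Western (latin1/cp1252) default shows

print(right)  # 日本語
print(wrong)  # six Latin-1 characters, not Japanese
```

Nothing in the bytes is damaged; only the choice of decoder differs.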

The more we consolidate on utf8, the better things get, but along the
way there will be painful moments. That's life.

UTF8 by default is a change that should happen, but carefully, and
there will likely need to be legacy support for the old defaults for
some time.

Cheers,
-- 
Cristóbal Palmer
ibiblio.org systems administrator

* Hint: View -> Character Encodings -> More Encodings -> East Asian -> EUC-JP

Bonus points if you can tell me why some pages, eg.

http://lists.ibiblio.org/pipermail/cc-jp/2004-March/000128.html

look broken. Mailing list archives are fun, see?




Bug#438970: [debian-mysql] Bug#438970: UTF8 default character set for mysql-server package

2007-12-04 Thread Norbert Tretkowski
Hi,

I've read the whole bug report a few times now, and I think I have to
agree with Sean, we should switch the default charset for new databases
to utf8, but shouldn't touch existing ones.

Maybe I just don't see the real reason why MySQL decided not to use utf8
as the default charset; I'll ask on the MySQL packagers list before finally
switching the default charset.

On Tuesday, 4 December 2007, at 15:37 -0500, Cristóbal Palmer wrote:
> If I can be of assistance in testing, I'd be happy to spin up another
> VM to do so. It's definitely worth my time since it will significantly
> reduce headaches moving forward.

Having some testers with MySQL and utf8 experience would be great,
thanks for your offer. Expect a package in experimental if we really
decide to switch the default charset.

Norbert





Bug#438970: UTF8 default character set for mysql-server package

2007-12-04 Thread Cristóbal Palmer
Hi,

Cristóbal with ibiblio.org here. We recently handed off DocSouth
(http://docsouth.unc.edu/) from our servers to those of UNC Libraries. Because
of such changes and growth, we've been repeatedly exposed to encoding
problems such as the one the original reporter describes. UNC Library
systems has managed to make the transition to UTF8 *enforced* across
the board for their mysql, which for them (on RHEL) has involved
maintaining their own package. So UTF8 in mysql is something we're
rather familiar with, and something we'd LOVE to see by default from
debian.

If I can be of assistance in testing, I'd be happy to spin up another
VM to do so. It's definitely worth my time since it will significantly
reduce headaches moving forward.

Last night I built mysql from the Debian source package under
etch. That, plus a change to the /etc/init.d/mysql script, got me this
output from the mysql client's "status;" command:

mysql  Ver 14.12 Distrib 5.0.32, for pc-linux-gnu (i686) using  EditLine wrapper

Connection id:          10
Current database:
Current user:           [EMAIL PROTECTED]
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.0.32-Debian_7etch3-log Debian etch distribution
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db characterset:        utf8
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 7 min 39 sec

Threads: 1  Questions: 164  Slow queries: 0  Opens: 154  Flush tables: 1
Open tables: 36  Queries per second avg: 0.357

So again, please let me know how I can help with this transition. I'm
not an experienced packager, but I'm happy to test or dive in and
learn whatever is needed for me to be up to speed with helping on
this in any capacity that would be appreciated.

Cheers,
-- 
Cristóbal Palmer
ibiblio.org systems administrator




Bug#438970: [debian-mysql] Bug#438970: UTF8 default character set for mysql-server package

2007-08-21 Thread sean finney
reopen 438970
thanks

hey guys,

guess i missed the bug and initial discussion, i must have been automatically 
unsubscribed from pkg-mysql-maint when my mail server went down a few weeks 
ago.

anyway, first things first, i'm reopening the bug since even if it will never 
be implemented it would be more proper to tag it wontfix and leave it open as 
opposed to closing it.

however, i would argue that we *should* switch default charsets, and given
where we are in the release cycle, i think now is a really good time to do
so.  after all, wasn't pervasive utf8 support a release goal of
debian >= etch?

of course, the devil is in the details, since we would need to make sure that 
any such switch didn't cause problems for existing users.   assuming that it 
doesn't introduce any problems, i'd propose something like the following:

- make whatever modifications are necessary to the initial bootstrapping code 
for first-time installs so that default databases/tables are utf8
- have my.cnf default to utf8 (this shouldn't affect existing tables/databases,
right?)
- not attempt to convert anything from existing installs
- place lots of scary warnings in the changelog and NEWS.Debian
- make sure the release notes for lenny address this
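On sean's question whether a utf8 default in my.cnf would affect existing databases: MySQL resolves a database's character set once, at CREATE DATABASE time (recording it in that database's db.opt), and a table created without an explicit charset inherits its database's default. The toy Python model below, with made-up database and table names, shows why existing data stays latin1, and also a less obvious consequence: new tables created later inside an old latin1 database are still latin1. It deliberately leaves out connection character sets, which a [client] change in my.cnf does affect for clients that read that section.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Database:
    name: str
    charset: str  # fixed when the database is created
    tables: Dict[str, str] = field(default_factory=dict)

    def create_table(self, name: str, charset: Optional[str] = None) -> str:
        # A table without an explicit charset inherits the *database* default.
        self.tables[name] = charset or self.charset
        return self.tables[name]


@dataclass
class Server:
    default_charset: str  # roughly what default-character-set in my.cnf sets
    databases: Dict[str, Database] = field(default_factory=dict)

    def create_database(self, name: str, charset: Optional[str] = None) -> Database:
        # The server default is copied once, at CREATE DATABASE time.
        db = Database(name, charset or self.default_charset)
        self.databases[name] = db
        return db


server = Server(default_charset="latin1")
forum = server.create_database("forum")      # hypothetical existing install
forum.create_table("posts")

server.default_charset = "utf8"              # new my.cnf + restart
wiki = server.create_database("wiki")        # hypothetical new database

print(forum.charset, forum.tables["posts"])  # latin1 latin1
print(wiki.charset)                          # utf8
print(forum.create_table("comments"))        # latin1
```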


sean




Bug#438970: UTF8 default character set for mysql-server package

2007-08-20 Thread Johan Ramm-Ericson
Package: mysql-server
Version: 5.0.32-7etch1

The default character set for the mysql-server package is currently
latin1. See the output of the status command after installing the package:

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
mysql> status;
--
mysql  Ver 14.12 Distrib 5.0.38, for pc-linux-gnu (i486) using readline 5.2

Connection id:          1
Current database:
Current user:           [EMAIL PROTECTED]
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.0.32-Debian_7etch1-log Debian etch distribution
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db characterset:        latin1
Client characterset:    latin1
Conn.  characterset:    latin1
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 32 sec

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

This behaviour can cause problems in any number of scenarios. I personally
noticed it when I was changing forum software on an Ubuntu forum. Since
the two forum applications had different ways of handling character sets I
had to convert the forum database from latin1 to utf8 for the forum
application to correctly pick up the characters. Of course, I was quite
frustrated with the inability of the forum packages to deal with the
character sets correctly, but I also realised that the problem might not
have arisen at all if the database had been utf8 from the start.

The suggested solution is to always compile the mysql package with utf8 as
the default character set (./configure --with-charset=utf8).

I am using Ubuntu 7.04, 2.6.20-16-generic kernel and libc6 2.5-0ubuntu14.

The reason I am reporting this to Debian rather than to Ubuntu bugs /
Launchpad is that, as far as I can tell, the Debian package is just as
incorrect as the Ubuntu package. I also suspect that the Ubuntu package is
taken directly from the Debian package without modification, though I may,
of course, be wrong about that.

Thanks and Best Regards,
Johan Ramm-Ericson


