Hi there,

Thank you for all your patience with me. I was completely off-track. I
read all the mails again, and everything is starting to make sense now.
This is going to be a lengthy email about #1356 and #3370, but please do
read until the end. Short executive summary: it really is a bug, and the
patch is not bad, but incomplete.

First, contrary to my former opinion, #3370 is a bug in the newforms
module: it passes unicode to the database API, which is not ready for
it and will break as soon as you leave ASCII. #3370 is independent
of #952.
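To make "breaks as soon as you leave ASCII" concrete, here is a minimal
sketch. It assumes the failure is an implicit ASCII encode somewhere
between newforms and the driver; the helper name is hypothetical and is
not taken from Django or MySQLdb. It is written for present-day Python,
where str is unicode text and bytes is a bytestring.

```python
def implicit_ascii_encode(text):
    """Stand-in for an implicit ASCII conversion of unicode text.

    Assumption: with no charset configured, unicode ends up being
    encoded as ASCII on its way to the database.
    """
    return text.encode("ascii")


# Pure ASCII survives the conversion.
ascii_result = implicit_ascii_encode("hello")

# Anything outside ASCII raises UnicodeEncodeError.
try:
    implicit_ascii_encode("Gr\u00fc\u00dfe")
    non_ascii_failed = False
except UnicodeEncodeError:
    non_ascii_failed = True
```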


I see three ways to fix the problem in #3370:

a) newforms stops passing unicode strings to the Database API and uses
bytestrings.

b) the database wrapper in Django sets connection.charset (but needs to
translate the charset name since the databases don't understand all
charset name variants, see ticket #952 here). This is the approach of
the patches in tickets #1356 and #3370.

c) the database wrapper in Django checks whether it gets unicode and,
if so, encodes it into a bytestring.
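Variant (c) could look roughly like the sketch below. The function names
are hypothetical and this is not the code of any existing patch; it only
shows the idea of encoding text parameters and leaving everything else
alone. In present-day Python, str plays the role of unicode and bytes
the role of a bytestring.

```python
def to_bytestring(value, encoding="utf-8"):
    """Encode text to bytes; pass bytes and non-string values through."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


def encode_params(params, encoding="utf-8"):
    """Apply to_bytestring() to every query parameter."""
    return tuple(to_bytestring(p, encoding) for p in params)


# Text is encoded, existing bytestrings and numbers are left untouched.
params = encode_params(("caf\u00e9", b"raw", 3))
```

The encoding argument is where the question of the next paragraph comes
in: it has to match what the server was told to expect.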


With all three variants, what encoding should be used? We currently
issue (without #952) a 'SET NAMES utf8' at the beginning of each
connection, so the database server expects to receive utf8. So,
shouldn't we currently always use utf8 encoding, regardless of what is
in settings.DEFAULT_CHARSET? This point has caused a lot of confusion.

Ivan wrote:

> I'm -1 on setting MySQL connection to 'utf8' in #3370. It *will* make
> sense when we will have newforms ready and models containing unicode.
> But now most of Django is a byte string country. A bright example are
> generic views that take data from web and store it to models without any
> conversions. This patch will feed 'windows-1251' or 'iso-8859-1' to
> MySQL saying that "it's utf-8" and MySQL will try to convert it and most
> certainly will store just strings of '????'.

Well, the current patch in #3370 (I still ignore __repr__) only changes
the charset attribute of a connection, and this attribute is used only
to encode unicode strings when sending data to the database, or to
decode bytestrings received from the database when MySQLdb is configured
to produce unicode ('use_unicode'). Here's what the documentation in
MySQLdb-1.2.2b2 says:

         use_unicode
            If True, CHAR and VARCHAR and TEXT columns are returned as
            Unicode strings, using the configured character set. It is
            best to set the default encoding in the server
            configuration, or client configuration (read with
==>         read_default_file).  If you change the character set after
==>         connecting (MySQL-4.1 and later), you'll need to put the
==>         correct character set name in connection.charset.

            If False, text-like columns are returned as normal strings,
            but you can always write Unicode strings.

            *This must be a keyword parameter.*

(Note, however, that the charset attribute is also used when you pass in
unicode without setting use_unicode.)

python-MySQLdb-1.2.1p2 is similar, except that there use_unicode does
not have to be a keyword parameter. There's an interesting difference
between 1.2.1p2 and 1.2.2b2: for 1.2.1p2, you have to change the charset
attribute of the existing connection. If you try this on 1.2.2b2, it
won't work. For 1.2.2b2, you either have to pass a 'charset' parameter
when you create the connection, or you can call the method
set_character_set(). Neither of these works for 1.2.1p2, of course :-(

So, the APIs of the two python-MySQLdb versions are incompatible with
each other (within a minor version change!). This explains the
differences between #1356 and #3370. We need a patch that plays well
with both versions of python-MySQLdb.
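One way such a patch could cope with both versions is to probe for
set_character_set() and fall back to assigning the charset attribute.
This is only a sketch: apply_charset() is a hypothetical helper, and the
two classes are stand-ins that mimic the behaviour described above so
the example runs without MySQLdb installed.

```python
def apply_charset(connection, charset):
    """Set the connection charset in whichever way the driver supports."""
    setter = getattr(connection, "set_character_set", None)
    if callable(setter):
        # 1.2.2b2 style: a dedicated method exists.
        setter(charset)
    else:
        # 1.2.1p2 style: assign the attribute on the live connection.
        connection.charset = charset


class OldStyleConnection:
    """Stand-in for a 1.2.1p2 connection (attribute only)."""
    charset = "latin1"


class NewStyleConnection:
    """Stand-in for a 1.2.2b2 connection (method available)."""

    def __init__(self):
        self.active_charset = "latin1"

    def set_character_set(self, charset):
        self.active_charset = charset


old_conn = OldStyleConnection()
new_conn = NewStyleConnection()
apply_charset(old_conn, "utf8")
apply_charset(new_conn, "utf8")
```

Probing for the method rather than comparing version numbers keeps the
check independent of how each release reports its version.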

I don't see a problem with the generic views: they pass bytestrings
to the database wrapper, these reach MySQLdb as bytestrings, and for
bytestrings the charset attribute is not used at all.

Of course, as soon as #952 has been applied, we need to use the encoding
from settings.DEFAULT_CHARSET.
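Taking the encoding from the settings brings back the name translation
mentioned under (b), since the server does not accept every spelling of
a charset name. A minimal sketch, assuming a small hand-written table:
PYTHON_TO_MYSQL and mysql_charset_name() are hypothetical names, and the
table is deliberately incomplete.

```python
import codecs

# Canonical Python codec name -> MySQL character set name.
# Only a few entries for illustration; a real table would need more.
PYTHON_TO_MYSQL = {
    "utf-8": "utf8",
    "iso8859-1": "latin1",  # closest match; MySQL's latin1 is really cp1252
    "cp1251": "cp1251",
}


def mysql_charset_name(name):
    """Translate a charset name as used in the settings to a MySQL name."""
    # codecs.lookup() folds aliases such as 'UTF8' or 'windows-1251'
    # into one canonical name, and raises LookupError for unknown codecs.
    canonical = codecs.lookup(name).name
    try:
        return PYTHON_TO_MYSQL[canonical]
    except KeyError:
        raise ValueError("no MySQL charset known for %r" % name)
```

For example, mysql_charset_name("windows-1251") yields the name to hand
to the server instead of the spelling used in the settings.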


Michael


P.S.:

If you set the charset parameter in 1.2.2b2's Connection.__init__(), the
default for use_unicode will be True, and python-MySQLdb will return
unicode strings.


