Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users

Michael Radziej Wed, 31 Jan 2007 07:39:48 -0800

Hi,

A few days ago, I wrote:
> I see three ways to fix the problem in #3370:
> 
> a) newforms stops passing unicode strings to the Database API and uses
> bytestrings.
> 
> b) the database wrapper in Django sets connection.charset (but needs to
> translate the charset name since the databases don't understand all
> charset name variants, see ticket #952 here). This is the approach of
> the patches in tickets #1356 and #3370.
> 
> c) the database wrapper in Django must check whether it gets unicode. In
> this case, it needs to encode it into a bytestring.
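
Option (c) boils down to a small conversion step in the wrapper. The sketch below is hypothetical (encode_params is not a Django function) and uses Python 3 spelling, where `str` is unicode and `bytes` is a bytestring. DEFAULT_CHARSET stands in for settings.DEFAULT_CHARSET:

```python
# Hypothetical sketch of option (c): the database wrapper encodes any
# unicode (str) parameter into a bytestring before it reaches the backend.
DEFAULT_CHARSET = "utf-8"  # stand-in for settings.DEFAULT_CHARSET


def encode_params(params, charset=DEFAULT_CHARSET):
    """Return params with every str encoded to bytes; other values unchanged."""
    return [p.encode(charset) if isinstance(p, str) else p for p in params]


print(encode_params(["caf\u00e9", b"already bytes", 42]))
# [b'caf\xc3\xa9', b'already bytes', 42]
```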


I now see a fourth way that would resolve #952 at the same time:

d) make the database wrapper accept both unicode and bytestrings in
the models, but always pass unicode strings to the database backend.

Details:

For #952 to work, the name of the character encoding has to be
translated from Python naming conventions to those of the backend
in use, which would need a huge table (see the ticket). It looks
easy, but it's a major annoyance.
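
To show why this gets tedious, here is a deliberately tiny, hypothetical version of such a table. The helper name and table layout are made up for illustration; codecs.lookup() is the standard-library call that folds Python's aliases into one canonical name. The backend spellings (UTF8/LATIN1 for PostgreSQL, utf8/latin1 for MySQL) are real, but a complete table would need many more entries:

```python
import codecs

# Hypothetical, deliberately tiny table. Keys are Python's canonical codec
# names (as returned by codecs.lookup); values are what each backend expects.
# Note that MySQL's "latin1" is really cp1252, one reason such a table is fiddly.
BACKEND_CHARSET_NAMES = {
    "postgresql": {"utf-8": "UTF8", "iso8859-1": "LATIN1"},
    "mysql": {"utf-8": "utf8", "iso8859-1": "latin1"},
}


def backend_charset(python_name, backend):
    """Translate a Python encoding name into the given backend's spelling."""
    # codecs.lookup folds aliases such as "UTF8" or "latin-1" into one name.
    canonical = codecs.lookup(python_name).name
    try:
        return BACKEND_CHARSET_NAMES[backend][canonical]
    except KeyError:
        raise ValueError(f"no {backend} name known for {python_name!r}") from None


print(backend_charset("UTF8", "postgresql"))  # UTF8
print(backend_charset("latin-1", "mysql"))    # latin1
```

Every charset a site might use needs a row per backend, and any missing row is a runtime error.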

Now, instead of doing this, how about modifying the database wrapper
so that it checks whether it gets unicode or bytestrings and, for
bytestrings, decodes them to unicode using settings.DEFAULT_CHARSET
as the encoding? Then it could always use unicode to talk to its
backend. As far as I can see, both psycopg2 and python-MySQLdb can
handle unicode.
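
A minimal sketch of that check, assuming the wrapper sees the query parameters as a flat sequence. The function name is hypothetical, and the charset argument stands in for the configured charset (settings.DEFAULT_CHARSET in Django). Python 3 spelling again: `bytes` for bytestrings, `str` for unicode:

```python
# Hypothetical helper for approach (d): accept either bytes or str from the
# model layer and always hand str (unicode) to the backend.
def to_unicode_params(params, charset="utf-8"):
    """Decode every bytes parameter with charset; leave everything else alone."""
    return [p.decode(charset) if isinstance(p, bytes) else p for p in params]


# A legacy cp1252 bytestring and a unicode string end up identical.
print(to_unicode_params([b"5 \x80", "5 \u20ac", 5], "cp1252"))
# ['5 €', '5 €', 5]
```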

This is different from the proposal in the thread 'Unicode or
Strings in Models', as I'd still accept both forms in the model and
deal with them only when sending them to the database. 'Only unicode
in models' would be a major change touching many scattered pieces.
My proposal is meant for a transition phase: it supports piecewise
conversion to Unicode without breaking everything along the way (as
newforms currently does by passing unicode to the database API).
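
To illustrate the transition idea end to end, here is a hypothetical cursor wrapper that normalizes parameters before delegating to the real cursor. It uses the standard-library sqlite3 module purely as a stand-in backend (not psycopg2 or MySQLdb); sqlite3 conveniently shows the difference, since it stores raw bytes as a BLOB but unicode as TEXT. The class name and CHARSET constant are assumptions for this example:

```python
import sqlite3

CHARSET = "utf-8"  # stand-in for settings.DEFAULT_CHARSET


class UnicodeCursorWrapper:
    """Hypothetical wrapper: decode bytes params, then delegate to the real cursor."""

    def __init__(self, cursor, charset=CHARSET):
        self.cursor = cursor
        self.charset = charset

    def execute(self, sql, params=()):
        # Models may still hand us bytestrings; the backend only ever sees str.
        params = [p.decode(self.charset) if isinstance(p, bytes) else p for p in params]
        return self.cursor.execute(sql, params)

    def __getattr__(self, name):
        # Anything else (fetchall, rowcount, ...) goes straight to the real cursor.
        return getattr(self.cursor, name)


conn = sqlite3.connect(":memory:")
cur = UnicodeCursorWrapper(conn.cursor())
cur.execute("CREATE TABLE item (name TEXT)")
# One value arrives as a legacy bytestring, one as unicode.
cur.execute("INSERT INTO item VALUES (?)", [b"caf\xc3\xa9"])
cur.execute("INSERT INTO item VALUES (?)", ["caf\u00e9"])
cur.execute("SELECT name, typeof(name) FROM item ORDER BY rowid")
print(cur.fetchall())
# [('café', 'text'), ('café', 'text')]
```

Both rows come out as the same unicode text, so model code can be converted piece by piece without the database layer caring which form it gets.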

Disadvantage: The backend will probably encode it again to get it
across the wire, as either UTF-8 or settings.DEFAULT_CHARSET (or
something else), which adds overhead to the database communication.
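
The extra round trip looks like this for the Euro sign. The example assumes a cp1252 site charset and a UTF-8 database connection; which encoding a real driver uses on the wire depends on its connection settings:

```python
# The Euro sign as a legacy bytestring in cp1252 (a common Windows charset).
legacy = b"\x80"

# Step 1 (proposed wrapper): decode bytes to unicode with the site charset.
text = legacy.decode("cp1252")  # U+20AC EURO SIGN

# Step 2 (backend driver): encode unicode again for the wire, here as UTF-8.
wire = text.encode("utf-8")

print(repr(text), wire)
# '€' b'\xe2\x82\xac'
```

The cost is one decode plus one encode per string parameter. Because the two charsets can differ, the round trip is not a no-op: it is exactly what makes the bytes on the wire match what the connection expects.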

I think this is a necessary transition from bytestrings to the Great
Unicodification of Everything. As soon as there's unicode
everywhere, the code that deals with bytestrings can be removed and
the solution will fit in perfectly.


What do you think?

Michael


-- 
noris network AG - Deutschherrnstraße 15-19 - D-90429 Nürnberg -
Tel +49-911-9352-0 - Fax +49-911-9352-100

http://www.noris.de - The IT-Outsourcing Company

You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en