CharField with utf-8 handling

hironobu Tue, 23 May 2006 21:47:40 -0700

Hello, Django users and developers.

For the past several days I have been trying to handle non-ASCII strings
(such as Japanese text) from a MySQL database, on version 0.95
"post-magic-removal".  Django loads strings into memory as raw byte
strings and saves them the same way, so string data that I can see
directly in the database as:
    u'\u3042\u3044\u3046\u3048\u304a'
    (This is a "hiragana" sequence, just like ABC... in English)
appears after loading in Django like this:
    '\xe3\x81\x82\xe3\x81\x84\xe3\x81\x86\xe3\x81\x88\xe3\x81\x8a'


This happens because the "utf-8" byte sequence is never decoded. I want
to treat it as a unicode string, using "unicode()" or "decode()":
    >>> '\xe3\x81\x82\xe3\x81\x84\xe3\x81\x86\xe3\x81\x88\xe3\x81\x8a'.decode('utf-8')
    u'\u3042\u3044\u3046\u3048\u304a'
but Django loads the value directly into the attributes of the "models"
object and treats it as a "CharField" string, so it takes no care of the
string encoding.
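
To make the round trip concrete, here is a small sketch. It is written
in Python 3 syntax as an assumption, so the raw byte string carries a
"b" prefix and the unicode string is the plain "str" type; the variable
names are only for illustration.

```python
# Raw bytes as they come back from the database (UTF-8 encoded text).
raw = b'\xe3\x81\x82\xe3\x81\x84\xe3\x81\x86\xe3\x81\x88\xe3\x81\x8a'

# Decode once on load to get a real text string of five characters.
text = raw.decode('utf-8')

# Encode again on save; the round trip reproduces the original bytes.
saved = text.encode('utf-8')

print(len(raw), len(text), saved == raw)  # prints: 15 5 True
```

The byte string has 15 elements because each hiragana character takes
three bytes in utf-8, while the decoded string has the 5 characters I
actually stored.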

If possible, I would like to propose something like a "UTF8StringField"
for utf-8 strings. It would decode the raw byte sequence from the
database as "utf-8", hold it internally as a "unicode" string, and save
it back to the database as a "utf-8" byte sequence. I made a very simple
patch to do this. I may not have read and understood every part of
Django, so please excuse me if I missed something. ;-) But if this
feature is not implemented yet, please use this patch.
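
The conversion such a field would perform can be sketched roughly as
below. This is not the patch itself and it does not use any Django API:
the class "UTF8StringConverter" and its methods "to_python" and "to_db"
are hypothetical names chosen for illustration, and the code assumes
Python 3 syntax ("bytes" for the raw sequence, "str" for unicode).

```python
class UTF8StringConverter:
    """Hypothetical helper; not part of Django and not the actual patch."""

    encoding = 'utf-8'

    def to_python(self, value):
        # Value coming from the database: decode raw bytes into text.
        if value is None:
            return None
        if isinstance(value, bytes):
            return value.decode(self.encoding)
        return value  # already text, leave untouched

    def to_db(self, value):
        # Value going to the database: encode text into raw bytes.
        if value is None:
            return None
        if isinstance(value, str):
            return value.encode(self.encoding)
        return value  # already bytes, leave untouched


converter = UTF8StringConverter()
loaded = converter.to_python(b'\xe3\x81\x82\xe3\x81\x84')
stored = converter.to_db(loaded)
```

Passing already-converted values through unchanged is a design choice in
this sketch, so that calling either method twice does no harm.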

Thanks.


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
For more options, visit this group at 
http://groups.google.com/group/django-users
-~----------~----~----~----~------~----~------~--~---