Hello, Django users and developers.
I've spent the last several days trying to handle non-ASCII strings
(such as Japanese text) from a MySQL database, on version 0.95
"post-magic-removal". Django loads strings into memory as raw byte
strings and saves them the same way, so string data that I can see
directly in the database as:
u'\u3042\u3044\u3046\u3048\u304a'
(This is a "hiragana" sequence, just like ABC... in English)
appears after loading in Django like this:
'\xe3\x81\x82\xe3\x81\x84\xe3\x81\x86\xe3\x81\x88\xe3\x81\x8a'
This happens because the "utf-8" byte sequence is never decoded. I want
to treat this value as a unicode string, using "unicode()" or "decode()":
>>> '\xe3\x81\x82\xe3\x81\x84\xe3\x81\x86\xe3\x81\x88\xe3\x81\x8a'.decode('utf-8')
u'\u3042\u3044\u3046\u3048\u304a'
But Django loads the value directly into the attributes of the "models"
object and treats it as a plain "CharField" string; it doesn't take
care of the string encoding.
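To show why this matters in practice, here is a small sketch using the
same five characters. The "b" prefix is only there so the example also
runs on newer Python versions; the variable names are mine, just for
illustration.

```python
# The same five hiragana characters in both representations.
raw = b'\xe3\x81\x82\xe3\x81\x84\xe3\x81\x86\xe3\x81\x88\xe3\x81\x8a'  # byte string, as loaded
text = raw.decode('utf-8')                                            # unicode string

byte_count = len(raw)    # len() on the byte string counts bytes
char_count = len(text)   # len() on the unicode string counts characters

first_bytes = raw[:2]    # slicing bytes cuts the first character apart
first_chars = text[:2]   # slicing unicode gives the first two characters

print(byte_count, char_count)
```

So length checks and slicing on the raw byte string work on bytes, not
on the characters the user typed.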
If possible, I'd like to propose something like a "UTF8StringField" for
UTF-8 strings. It decodes the raw byte sequence of the string as
"utf-8", holds the value internally as a "unicode" string, and saves it
to the database as a "utf-8" byte sequence. I made a very simple patch
to do this.
I may not have read and understood every part of Django yet, so please
excuse me if I've missed something. ;-) But if this feature is not
implemented yet, please use this patch.
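The idea of the field, separate from the patch itself, can be sketched
in plain Python. This is only an illustration: the class below does not
use Django's field API, and the method names "from_db" and "to_db" are
hypothetical names I chose for this example.

```python
class UTF8StringField(object):
    """Hypothetical stand-in for the proposed field (not Django's API)."""

    encoding = 'utf-8'

    def from_db(self, raw):
        # Loading: decode the raw byte string read from the database.
        if isinstance(raw, bytes):
            return raw.decode(self.encoding)
        return raw  # None or an already-decoded unicode string

    def to_db(self, value):
        # Saving: encode the unicode string back into a byte sequence.
        if value is None or isinstance(value, bytes):
            return value
        return value.encode(self.encoding)


field = UTF8StringField()
raw = b'\xe3\x81\x82\xe3\x81\x84\xe3\x81\x86\xe3\x81\x88\xe3\x81\x8a'
text = field.from_db(raw)       # unicode string held in memory
round_trip = field.to_db(text)  # byte sequence written back
```

Decoding on load and encoding on save keeps the bytes in the database
unchanged, while the application code only ever sees unicode strings.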
Thanks.