Re: unicode in filters

Maciej Bliziński Mon, 19 Feb 2007 08:48:45 -0800

On Mon, 2007-02-19 at 13:01 +0000, omat * gezgin.com wrote:
> I am trying to match a utf-8 character with a filter. Within the
> python prompt, "u'ç'.encode('utf-8')" returns "\xc3\xa7" correctly but
> when I use this inside a filter like:
> 
> (name__startswith = u'ç'.encode('utf-8'))
> 
> I get a syntax error:
> Non-ASCII character '\xc3' in file .../views.py on line 24, but no
> encoding declared...


When you type something like 'ç' or u'ç', Python reads your source file
as bytes, so it matters whether it knows which encoding those bytes are
in. The solution you saw in previous posts was to declare the encoding
of the Python source file (a first or second line such as
"# -*- coding: utf-8 -*-"), which is the right thing to do.
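For anyone curious what the declaration changes, here is a small sketch. It uses Python 3 (your question is about Python 2, which assumes ASCII source when nothing is declared, whereas Python 3 assumes UTF-8). It compiles the same UTF-8 bytes once under a correct coding line and once under a deliberately wrong latin-1 one. The name load_prefix is made up for the example, and "views.py" is only a label for error messages, not a real file.

```python
# The bytes a UTF-8 editor would save for this line of source code.
body = "name_prefix = u'ç'\n".encode("utf-8")


def load_prefix(coding):
    """Compile the source bytes under a PEP 263 coding line and return
    the value of name_prefix that Python ends up with."""
    source = ("# -*- coding: %s -*-\n" % coding).encode("ascii") + body
    namespace = {}
    exec(compile(source, "views.py", "exec"), namespace)
    return namespace["name_prefix"]


# Correct declaration: the two bytes \xc3\xa7 are read as one character.
print(repr(load_prefix("utf-8")))    # 'ç'
# Wrong declaration: the same two bytes are read as two Latin-1 characters.
print(repr(load_prefix("latin-1")))  # 'Ã§'
```

So the declaration has to match what your editor actually saves; it tells Python how to read the bytes, it doesn't convert anything.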

Just to let you know what happens here: there's something called a
Unicode object. Python can create a Unicode object by _decoding_ a
string. To do that, assuming that your source is encoded in UTF-8, you
write:

unicode_obj = 'ç'.decode('utf-8')
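In Python 3 the same step is spelled with a bytes object, because there a plain 'ç' is already a Unicode string and only bytes have .decode(). A minimal sketch (the variable name raw is just for illustration) showing that decoding turns two bytes into one character:

```python
# The two bytes a UTF-8 encoded source file stores for 'ç'.
raw = b"\xc3\xa7"

# Decoding turns bytes into text; the codec must match how the bytes were written.
unicode_obj = raw.decode("utf-8")

print(len(raw), len(unicode_obj))   # 2 1
print("U+%04X" % ord(unicode_obj))  # U+00E7
```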

This returns a Unicode object, which can later be _encoded_ back into
a string:

utf_8_encoded_string = unicode_obj.encode('utf-8')
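The reverse direction, again as a Python 3 sketch where the encoded result is a bytes object. One Unicode character can become different byte sequences depending on the codec, and decoding with the same codec restores it; the latin-1 line is only there for contrast.

```python
unicode_obj = "\u00e7"  # the single character 'ç'

utf_8_encoded_string = unicode_obj.encode("utf-8")
latin_1_encoded = unicode_obj.encode("latin-1")

print(utf_8_encoded_string)  # b'\xc3\xa7'  (the same bytes your prompt showed)
print(latin_1_encoded)       # b'\xe7'

# Decoding with the same codec gives back the original text.
assert utf_8_encoded_string.decode("utf-8") == unicode_obj
```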

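Back to your example: you typed u'ç'.encode('utf-8'), which tells
Python: take this Unicode object and encode it in UTF-8. But to build
that Unicode object in the first place, Python has to decode the 'ç'
between the quotes, and in your file that is just a couple of bytes.
How are those bytes encoded? Dunno, the source doesn't say. In such a
case we assume ASCII. But hey, '\xc3' is not an ASCII character! I'm
going to complain! This happens while the file is being read, before
.encode() ever runs, which is why you get a syntax error.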
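The complaint can be reproduced by hand: ask for the UTF-8 bytes of u'ç' to be decoded as ASCII, which is roughly what Python 2 attempted when views.py had no coding line. A Python 3 sketch; the helper name first_non_ascii_byte is made up for the example.

```python
def first_non_ascii_byte(data):
    """Return (position, byte) of the first byte ASCII cannot decode, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        # exc.start indexes the first offending byte in the input.
        return exc.start, data[exc.start:exc.start + 1]
    return None


print(first_non_ascii_byte(b"startswith"))   # None
print(first_non_ascii_byte(b"u'\xc3\xa7'"))  # (2, b'\xc3')
```

The offending byte is the same '\xc3' named in your error message.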

I hope this helps you in the future, now that you know what decoding
and encoding mean.

Cheers,
Maciej

-- 
Maciej Bliziński
http://automatthias.wordpress.com


You received this message because you are subscribed to the Google Groups 
"Django users" group.
