Re: Unicode + memcache = bug

Malcolm Tredinnick Thu, 12 Jul 2007 06:34:14 -0700

On Thu, 2007-07-12 at 05:34 -0500, Jeremy Dunck wrote:
> When using the low-level cache and memcache as the backend, you're
> likely to run into this stack trace:
> 
> ...
> File "/pegasus/code/current/django/core/cache/backends/memcached.py" in set
>   48. self._cache.set(key, value, timeout or self.default_timeout)
> File "/usr/lib/python2.5/site-packages/memcache.py" in set
>   305. return self._set("set", key, val, time)
> File "/usr/lib/python2.5/site-packages/memcache.py" in _set
>   328. fullcmd = "%s %s %d %d %d\r\n%s" % (cmd, key, flags, time, len(val), val)
> 
>   UnicodeDecodeError at /
>   'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
> 
> What's going on here is that the memcache.py library does this with
> the passed parameters:
> 
> fullcmd = "%s %s %d %d %d\r\n%s" % (cmd, key, flags, time, len(val), val)
> 
> Since "key" is often a unicode string, it infects, as it were, the
> rest of the line, forcing "val" to be encoded, then decoded.


I thought I understood the problem until I read this sentence. Now my
brain hurts. I fully understand that the whole string is treated as
Unicode as soon as one argument is Unicode. Why is "val" the problem
here then? What sort of object is "val" and why doesn't unicode(val)
work (aah ... is it going via str(val) and val is non-ASCII? That could
do it).

The error in the traceback suggests it is trying to treat something
*not* as Unicode. I'm a little fuzzy on what's going on.
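
One possible reading of the traceback, as a rough sketch rather than a
confirmed diagnosis: once "key" is a unicode string, Python 2 promotes the
whole format operation to unicode and decodes every byte-string argument
with the ascii codec. The snippet below spells that decode out explicitly
(it is written for Python 3, where nothing is decoded implicitly). The
helper name and the pickled value are assumptions for illustration; pickle
protocol 2 is used only because its output starts with byte 0x80, which
matches the error message.

```python
import pickle


def implicit_ascii_decode(byte_value):
    """Hypothetical helper: emulate Python 2 promoting a byte string
    to unicode with the default (ascii) codec."""
    return byte_value.decode("ascii")


# Assumption: the cached value was serialized with pickle protocol 2,
# whose output always begins with the byte 0x80.
val = pickle.dumps({"title": "caf\u00e9"}, protocol=2)

try:
    implicit_ascii_decode(val)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, the value is never "encoded, then decoded"; it
is only decoded, and the decode fails on the first non-ASCII byte.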

> It may be that only the memcache backend has this problem, but the
> general solution I'd suggest is to use smart_str on the key given to
> each low-level cache's backend set method.  Works-for-me.

It hasn't actually occurred to me to check previously: can memcache handle
non-ASCII data there? Even converting to UTF-8 is going to give byte
values that the ascii codec cannot always decode.
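
To make the concern concrete, here is a small sketch (Python 3 syntax,
with a made-up key) showing that a UTF-8 encoded key is a plain byte
string but still contains bytes outside the ASCII range. Whether
memcache accepts such bytes in a key is a separate question that this
snippet does not answer.

```python
# Hypothetical cache key containing one non-ASCII character (e-acute).
key = "caf\u00e9:42"

# Roughly what a smart_str-style conversion would hand to the backend:
# the UTF-8 bytes of the key.
encoded = key.encode("utf-8")

# Collect the bytes that fall outside the 7-bit ASCII range.
non_ascii = [b for b in encoded if b >= 0x80]

# The ascii codec still cannot decode the encoded key.
try:
    encoded.decode("ascii")
    ascii_safe = True
except UnicodeDecodeError:
    ascii_safe = False

print(encoded, non_ascii, ascii_safe)
```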

> It may also make sense to run on the value, but I imagine that has a
> significant overhead, and I haven't had a problem with it yet....

Assuming the missing key part of this sentence is force_unicode(), it
should not really be worse than running smart_str() (about one extra
function call), at first glance. However, as indicated above, I'll
admit to still being sketchy about the real problem.

If you can guarantee that str(val) will always make sense and be encoded
as UTF-8, then your proposed solution sounds fine. The encoding of
str(val) is important, because we have to be able to understand it when we
pull it out from the cache again later.
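
A minimal sketch of why the stored encoding matters, using a plain dict
as a stand-in for the cache (cache_set and cache_get are hypothetical
helpers, not Django or memcache APIs). Text written as UTF-8 comes back
intact only if it is read back as UTF-8.

```python
# Hypothetical in-memory stand-in for a byte-oriented cache.
cache = {}


def cache_set(key, text):
    """Store text under key, both encoded as UTF-8 bytes."""
    cache[key.encode("utf-8")] = text.encode("utf-8")


def cache_get(key):
    """Read the bytes back and decode them with the same encoding."""
    return cache[key.encode("utf-8")].decode("utf-8")


cache_set("greeting", "na\u00efve")

# Same encoding on both sides: the text round-trips unchanged.
round_tripped = cache_get("greeting")

# A reader that guesses a different encoding gets different text.
misread = cache[b"greeting"].decode("latin-1")

print(round_tripped == "na\u00efve", misread == "na\u00efve")
```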

Regards,
Malcolm

-- 
Works better when plugged in. 
http://www.pointy-stick.com/blog/


