On 7/12/07, Malcolm Tredinnick <[EMAIL PROTECTED]> wrote:
>
> On Thu, 2007-07-12 at 05:34 -0500, Jeremy Dunck wrote:
...
> > What's going on here is that the memcache.py library does this with
> > the passed parameters:
> >
> > fullcmd = "%s %s %d %d %d\r\n%s" % (cmd, key, flags, time, len(val), val)
> >
> > Since "key" is often a unicode string, it infects, as it were, the
> > rest of the line, forcing "val" to be encoded, then decoded.
>
> I thought I understood the problem until I read this sentence. Now my
> brain hurts. I fully understand that the whole string is treated as
> Unicode as soon as one argument is Unicode. Why is "val" the problem
> here then? What sort of object is "val" and why doesn't unicode(val)
> work (aah ... is it going via str(val) and val is non-ASCII? That could
> do it).

Sorry for not giving more context.

In that quoted line, cmd is a str (created by the library itself), key
is whatever the low-level Django API passes in (very likely a unicode
object), and val is a pickled object (that is, arbitrary binary).

When key is unicode, Python coerces the whole formatted string to
unicode, implicitly decoding cmd and val with the default (ASCII)
codec. That fails for val, since it's arbitrary binary.
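
In Python 2 that coercion happens silently inside the % operator.
Python 3 no longer mixes text and bytes implicitly, but the same failure
can be reproduced explicitly by decoding a pickle with the ASCII codec,
which is what the implicit coercion amounted to. A minimal sketch; the
helper name and the sample value are hypothetical:

```python
import pickle

# Any pickled value will do: protocol 2 output starts with the PROTO
# opcode b"\x80\x02", so it is never pure ASCII.
val = pickle.dumps({"answer": 42}, 2)


def coerce_like_python2(data):
    """Mimic Python 2's implicit str -> unicode coercion, which used the
    default codec (normally ASCII)."""
    return data.decode("ascii")


try:
    coerce_like_python2(val)
    coercion_failed = False
except UnicodeDecodeError:
    coercion_failed = True

print(val[:2], coercion_failed)
```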

At least, I'm pretty darn sure.  I *think* I understand this bit-pushing.  :)

> Hasn't actually occurred to me to check previously: can memcache handle
> non-ASCII data there, because even converting to UTF-8 is going to give
> values that are not always understandable to the ascii codec.
>

/me checks python-memcache code.

python-memcache assumes a str key with no control characters or spaces
(ord(c) >= 33 for every character) and len(key) < 250.
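
Those key rules can be sketched as a small validator. This is a
hypothetical helper that applies the constraints exactly as stated
above (in Python 3 terms, a Python 2 str is bytes); python-memcache's
own check may differ in detail:

```python
MAX_KEY_LENGTH = 250  # limit as stated in this message; an assumption


def check_key(key):
    """Hypothetical validator for the key rules described above."""
    if not isinstance(key, bytes):
        raise TypeError("key must be bytes, got %s" % type(key).__name__)
    if len(key) >= MAX_KEY_LENGTH:
        raise ValueError("key is %d bytes; must be under %d"
                         % (len(key), MAX_KEY_LENGTH))
    for byte in key:  # iterating over bytes yields ints
        if byte < 33:  # rejects control characters and the space
            raise ValueError("illegal character %r in key" % bytes([byte]))
    return key
```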

The stored value can be any object, but strings, ints, and longs are
special-cased; everything else is pickled. This is how the marshalling
is done:

if isinstance(val, types.StringTypes):
   pass
elif isinstance(val, int):
   flags |= Client._FLAG_INTEGER
   val = "%d" % val
elif isinstance(val, long):
   flags |= Client._FLAG_LONG
   val = "%d" % val
else:
   flags |= Client._FLAG_PICKLE
   val = pickle.dumps(val, 2)
fullcmd = "%s %s %d %d %d\r\n%s" % (cmd, key, flags, time, len(val), val)

The result, fullcmd, is then sent over the wire.
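
For comparison, here is a hypothetical Python 3 rendering of that
marshalling in which the whole command is built as bytes, so nothing is
ever implicitly decoded. The flag values and the build_command name are
illustrative assumptions, not python-memcache's actual constants or API:

```python
import pickle

# Illustrative flag values only; python-memcache defines its own
# constants on the Client class, which may not match these.
FLAG_PICKLE = 1 << 0
FLAG_INTEGER = 1 << 1


def build_command(cmd, key, val, time=0):
    """Hypothetical Python 3 version of the marshalling above.

    cmd and key must already be bytes.
    """
    flags = 0
    if isinstance(val, bytes):
        pass  # raw bytes are stored as-is
    elif isinstance(val, int):
        flags |= FLAG_INTEGER  # Python 3 has no separate long type
        val = b"%d" % val
    else:
        flags |= FLAG_PICKLE  # everything else, text included, is pickled
        val = pickle.dumps(val, 2)
    return b"%s %s %d %d %d\r\n%s" % (cmd, key, flags, time, len(val), val)


print(build_command(b"set", b"answer", 42))
```

Unlike the original, text values are pickled here rather than passed
through, which sidesteps the question of how a unicode val would be
sent.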

So, my assertion is that key is the only possible unicode value, and
that it had better be coercible to str using sys.getdefaultencoding(),
because otherwise the string format will die.

cmd is always a str; flags, time, and len(val) are ints formatted with
%d; and val ends up as either str or unicode (it's odd that they accept
types.StringTypes there, when they clearly don't handle a unicode value
in the general sense).

My understanding is that smart_str forces a unicode value to str using
encoding='utf-8', and is a no-op when passed a str.
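
A minimal sketch of that behaviour, as a hypothetical stand-in rather
than Django's actual smart_str (in Python 3 terms, "str" here maps to
bytes and "unicode" to str):

```python
def to_utf8_bytes(s, encoding="utf-8", errors="strict"):
    """Hypothetical helper mirroring the smart_str behaviour described
    above: text is encoded, byte strings pass through unchanged."""
    if isinstance(s, bytes):
        return s  # already a byte string: no-op
    if not isinstance(s, str):
        s = str(s)  # non-string objects: take their text form first
    return s.encode(encoding, errors)
```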

I want to make sure that all parameters there are str; I'm pretty
confident "key" is the only non-str object.

> The encoding of
> str(val) is important, because we have to be able to understand it when we
> pull it out from the cache again later.

I agree, but I don't want to touch val; I only want to force "key" to
be encoded to a str before it goes into the format string.
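
A minimal Python 3 sketch of that idea: encode only the key, and build
the command entirely from byte strings so val is never decoded. The
encode_key helper, the flag value 1, and the sample data are all
hypothetical; Python 3's bytes/str split stands in for Python 2's
str/unicode:

```python
import pickle


def encode_key(key):
    """Hypothetical: encode a text key to UTF-8, leave byte keys alone."""
    return key.encode("utf-8") if isinstance(key, str) else key


key = "caf\u00e9:42"  # a text key, as the cache API would typically pass
val = pickle.dumps({"name": "Zo\u00eb"}, 2)  # arbitrary binary, untouched

# 1 stands in for a "pickled" flag; 0 is the expiry time.
fullcmd = b"%s %s %d %d %d\r\n%s" % (b"set", encode_key(key), 1, 0,
                                     len(val), val)

# Without encoding the key, Python 3 refuses to mix text into bytes.
try:
    b"%s" % key
    mixing_rejected = False
except TypeError:
    mixing_rejected = True
```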

Clearer?

You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com