[Bruce Christensen]
> We seem to have stumbled upon some strange behavior in cPickle's memo
> use when pickling instances.
>
> Here's the repro:
>
> [mymodule.py]
> class C:
>     def __getstate__(self): return ('s1', 's2', 's3')
>
> [interactive interpreter]
> Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import cPickle
> >>> import mymodule
> >>> class C:
> ...     def __getstate__(self): return ('s1', 's2', 's3')
> ...
> >>> for x in mymodule.C(), C(): cPickle.dumps(x)
> ...
> "(imymodule\nC\np1\n(S's1'\nS's2'\np2\nS's3'\np3\ntp4\nb."
> "(i__main__\nC\np1\n(S's1'\nS's2'\nS's3'\ntp2\nb."
> >>>
>
> Note that the second and third strings in the instance's state are
> memoized in the first case, but not in the second. Any idea why this
> occurs (and why the first element is never memoized)?

Ideally, a pickle would never contain a `PUT i` unless i was
referenced by a `GET i` later. So, ideally, there would be no PUT
opcodes in either of these pickles.

cPickle is a little bit smarter than pickle.py here, in that cPickle
suppresses a PUT if the reference count on the object is less than 2
(in which case the structure being pickled can't possibly reference
the sub-object a second time, so it's impossible that a later GET will
want to reference the same sub-object). So all you're seeing here is
refcount accidents, complicated by accidents concerning exactly which
strings get interned.

Use pickle.py instead (which doesn't do this refcount
micro-optimization), and you'll see the same number of PUTs in both.
They're all correct. What would be incorrect is seeing a `GET i`
without a preceding `PUT i` using the same `i`.
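
A minimal sketch of both points, assuming a later Python than the 2.4.3 in
the report: pickletools.optimize() only arrived in Python 2.6, and Python 3
has no separate cPickle module, so this is not expected to reproduce the
refcount accidents above. The helper names memo_ids and gets_follow_puts are
made up for illustration. The sketch counts PUT and GET opcodes in protocol 0
pickles, strips the PUTs that no GET uses, and checks that every `GET i`
follows a `PUT i`.

```python
import pickle
import pickletools


def memo_ids(data):
    """Return (put_ids, get_ids), in stream order, for a protocol 0 pickle."""
    puts, gets = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == 'PUT':
            puts.append(arg)
        elif opcode.name == 'GET':
            gets.append(arg)
    return puts, gets


def gets_follow_puts(data):
    """True if every `GET i` is preceded by a `PUT i` with the same i."""
    seen = set()
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == 'PUT':
            seen.add(arg)
        elif opcode.name == 'GET' and arg not in seen:
            return False
    return True


# Nothing here is referenced twice, so no GET is emitted and every PUT
# in the pickle is unused.
state = ('s1', 's2', 's3')
unshared = pickle.dumps(state, protocol=0)
unshared_min = pickletools.optimize(unshared)

# The inner list is referenced twice, so its PUT is needed by a later GET.
inner = ['x']
shared = pickle.dumps([inner, inner], protocol=0)
shared_min = pickletools.optimize(shared)

for label, data in [('unshared', unshared),
                    ('unshared, optimized', unshared_min),
                    ('shared', shared),
                    ('shared, optimized', shared_min)]:
    puts, gets = memo_ids(data)
    print(label, 'PUTs:', len(puts), 'GETs:', len(gets),
          'valid:', gets_follow_puts(data))
```

The optimized pickles load to the same values as the originals; only the
PUTs that no GET refers to are dropped.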
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev