Eugene Toder added the comment:
To recap, the issue is that pickle doesn't handle recursion through the reduce
arguments (i.e. the arguments to the constructor function, returned as the 2nd
element of the tuple from __reduce__). This leads to two kinds of effects. Consider:
class C:
    def __init__(self, x=None):
        self.x = x if x is not None else self
    def __reduce__(self):
        return C, (self.x,)
A. Recursion error:
>>> pickle.dumps(C())
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    pickle.dumps(C())
RuntimeError: maximum recursion depth exceeded while calling a Python object
This cannot be helped within the current reduce protocol: the constructor
arguments have to be pickled before the object itself can be created, so a
cycle that runs through them cannot be broken. The error message could be
improved, but that's about it.
B. Duplication of object when unpickling:
>>> c = C([])
>>> c.x.append(c)
>>> c.x[0] is c
True
>>> c2 = pickle.loads(pickle.dumps(c))
>>> c2.x[0] is c2
False
This happens because the list (or another recursion-friendly type) inside the
problematic object handles the recursion, but we still end up building the
outer object a second time, identical to the inner one.
This can be solved the same way as it is for tuples:
>>> t=([],1,2)
>>> t[0].append(t)
>>> t2 = pickle.loads(pickle.dumps(t))
>>> t2[0][0] is t2
True
>>> pickletools.dis(pickle.dumps(t))
    0: \x80 PROTO      3
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: h    BINGET     0
    7: K    BININT1    1
    9: K    BININT1    2
   11: \x87 TUPLE3
   12: q    BINPUT     1
   14: a    APPEND
   15: K    BININT1    1
   17: K    BININT1    2
   19: 0    POP
   20: 0    POP
   21: 0    POP
   22: h    BINGET     1
   24: .    STOP
After pickling its elements, the tuple checks whether it has ended up in the
memo. If it has, that means it was already pickled by one of its elements, so
it POPs all the elements off the stack and fetches itself via GET instead. This
is somewhat inefficient, but probably the best it can do.
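For reference, a rough sketch of that check, loosely paraphrasing what
Pickler.save_tuple does for short tuples under protocol 2 (the function name,
helper names and memo layout are simplified here, so this is not the exact
CPython source):

def save_short_tuple(self, obj):   # simplified paraphrase, not real code
    for element in obj:
        self.save(element)
    if id(obj) in self.memo:
        # One of the elements already pickled this tuple (recursion):
        # discard the elements just pushed and fetch the memoized tuple.
        self.write(POP * len(obj))
        self.write(self.get(self.memo[id(obj)][0]))
    else:
        self.write(_tuplesize2code[len(obj)])
        self.memoize(obj)

Item 3 below proposes applying the same kind of check in save_reduce.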
I suggest we do three things:
1. Improve the documentation for the __reduce__ function. It should mention that
any state that a) can potentially point back to the object and b) is not strictly
necessary for the constructor function should be passed via the 3rd element of
the __reduce__ tuple (aka state) instead of the 2nd element, and applied by
__setstate__. This handles recursion in a robust and optimal way (see the
sketch after this list).
2. Check that all built-in/standard types follow this advice. I see that Stefan
Mihaila already fixed sets.
3. To fix case B above, add the memo check from save_tuple to save_reduce. While
we are at it, we can consider checking after pickling every element instead of
after pickling all the elements, so we reduce the number of POPs and the waste
they cause.
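To illustrate item 1, here is a minimal sketch of the recommended pattern
applied to the C class from above; the dict-based state and the explicit
__setstate__ are just one possible way to spell it:

import pickle

class C:
    def __init__(self, x=None):
        self.x = x if x is not None else self
    def __reduce__(self):
        # Nothing that can point back to the object goes into the
        # constructor arguments; it is passed as state instead.
        return C, (), {'x': self.x}
    def __setstate__(self, state):
        self.x = state['x']

c = C([])
c.x.append(c)
c2 = pickle.loads(pickle.dumps(c))
assert c2.x[0] is c2          # the cycle survives a round trip (case B)

d = pickle.loads(pickle.dumps(C()))
assert d.x is d               # no recursion error either (case A)

Because save_reduce memoizes the newly created object before pickling the
state, both the recursion error (case A) and the duplication (case B) go away.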
----------
nosy: +eltoder, pitrou
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue1062277>
_______________________________________