[issue17810] Implement PEP 3154 (pickle protocol 4)
Stefan Mihaila added the comment:

On 6/3/2013 9:33 PM, Alexandre Vassalotti wrote:
> Alexandre Vassalotti added the comment:
>
> Stefan, could you address my review comments soon? The improved support for
> globals is the only big piece missing from the implementation of PEP, which I
> would like to get done and submitted by the end of the month.
>
> --
>
> ___
> Python tracker
> <http://bugs.python.org/issue17810>
> ___
>

Yes, I apologize for the delay again. Today is my last exam this semester, so I'll do my best to get it done as soon as possible (hopefully this weekend).

--
___
Python tracker <http://bugs.python.org/issue17810>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15642] Integrate pickle protocol version 4 GSoC work by Stefan Mihaila
Changes by Stefan Mihaila:

Added file: http://bugs.python.org/file30216/d0c3a8d4947a.diff
[issue17810] Implement PEP 3154 (pickle protocol 4)
Stefan Mihaila added the comment:

On 5/10/2013 11:46 PM, Stefan Mihaila wrote:
> Changes by Stefan Mihaila:
>
> --
> nosy: +mstefanro

Hello. I worked on implementing PEP 3154 as part of GSoC 2012. My work is available in a repo at [1]. The blog I used to report my work is at [2] and contains some useful information.

Here is a list of features that were implemented as part of GSoC:

* Pickling of very large bytes and strings
* Better pickling of small strings and bytes (+ tests)
* Native pickling of sets and frozensets (+ tests)
* Self-referential sets and frozensets (+ tests)
* Implicit memoization (BINPUT is implicit for certain opcodes)
  - The argument against this was that pickletools.optimize would not be able to prevent memoization of objects that are not referred to later. For such situations, a special flag could be added at the beginning, indicating whether implicit BINPUT is enabled. This flag could be one of the higher-order bits of the protocol version. For instance, PROTO \x04 + BINUNICODE ".." and PROTO \x84 + BINUNICODE ".." + BINPUT 1 would be equivalent. Then pickletools.optimize could choose whether it wants implicit BINPUT or not. Sure, this would complicate matters and it's not for me to decide whether it's worth it. In my midterm report at [3] there are some examples of what a pickled string looks like in v4 without implicit memoization, and some size comparisons to v3.
* Pickling of nested globals, methods etc. (+ tests)
* Pickling calls to __new__ with keyword args (+ tests)
* A BAIL_OUT opcode is always output when pickling fails, so that the Pickler and Unpickler can both be run at once on different ends of a stream. The Pickler can guarantee that it always sends a correct pickle on the stream, and the Unpickler never ends up hanging when pickling fails mid-work.
  - At the time, Alexandre suggested this would probably not be a great idea, because it should be the responsibility of the protocol used to ensure some consistency. However, this does not appear to be a trivial task to achieve. The size of the pickle is not known in advance, and waiting for the Pickler to complete before sending the data via the stream is not as efficient, because the Unpickler would not be able to run at the same time. The write and read methods of the stream would have to be wrapped and some escape sequence used. This would probably increase the size of the pickled string in the worst case of the escape sequence. My thought was that it would be beneficial for the average user to have the guarantee that the Pickler always outputs a correct pickle to a stream, even if it raises an exception.
* Other minor changes that I can't really remember.

Although I'm sure Alexandre had good reasons to start the work from scratch, it would be a shame to waste all this work. The features mentioned above are working, and although the implementation may not be ideal (I don't have the cpython experience of a regular dev), I'm sure useful bits can be extracted from it.

Alexandre suggested that I extract bits and post patches, so I have attached, for now, support for pickling methods and nested globals (+ tests). I'm willing to do so for some or all of the remaining features, should this be requested and should I have the necessary time.
[1] https://bitbucket.org/mstefanro/pickle4/
[2] https://pypickle4.wordpress.com/
[3] https://gist.github.com/mstefanro/3086647

--
Added file: http://bugs.python.org/file30213/methods.patch

diff -r 780722877a3e Lib/pickle.py
--- a/Lib/pickle.py Wed May 01 13:16:11 2013 -0700
+++ b/Lib/pickle.py Sat May 11 03:06:28 2013 +0300
@@ -23,7 +23,7 @@
 """
-from types import FunctionType, BuiltinFunctionType
+from types import FunctionType, BuiltinFunctionType, MethodType, ModuleType
 from copyreg import dispatch_table
 from copyreg import _extension_registry, _inverted_registry, _extension_cache
 from itertools import islice
@@ -34,10 +34,44 @@
 import io
 import codecs
 import _compat_pickle
+import builtins
+from inspect import ismodule, isclass
 __all__ = ["PickleError", "PicklingError", "UnpicklingError", "Pickler",
            "Unpickler", "dump", "dumps", "load", "loads"]
+# Issue 15397: Unbinding of methods
+# Adds the possibility to unbind methods as well as a few definitio
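The flag proposed in the implicit-memoization item can be illustrated with a small sketch. This is only an illustration of the idea in the message: the constant `IMPLICIT_MEMO_OFF` and the helper `split_proto_byte` are hypothetical names, and no such flag is claimed to exist in any released pickle protocol.

```python
# Hypothetical sketch: this flag was only proposed in the message above;
# it is not claimed to be part of any released pickle format.
IMPLICIT_MEMO_OFF = 0x80  # assumed meaning of the high-order bit


def split_proto_byte(value):
    """Split a PROTO argument into (version, implicit_memoization)."""
    version = value & ~IMPLICIT_MEMO_OFF & 0xFF   # low seven bits
    implicit = not (value & IMPLICIT_MEMO_OFF)    # bit clear: implicit BINPUT
    return version, implicit


print(split_proto_byte(0x04))  # version 4, implicit BINPUT enabled
print(split_proto_byte(0x84))  # version 4, explicit BINPUT expected
```

Under this reading, both bytes name protocol 4, and only the high bit tells the unpickler whether to memoize on its own or to wait for explicit BINPUT opcodes.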
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Stefan Mihaila:

Removed file: http://bugs.python.org/file30211/780722877a3e.diff
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Stefan Mihaila:

Added file: http://bugs.python.org/file30211/780722877a3e.diff
[issue17810] Implement PEP 3154 (pickle protocol 4)
Changes by Stefan Mihaila:

--
nosy: +mstefanro
[issue15642] Integrate pickle protocol version 4 GSoC work by Stefan Mihaila
Stefan Mihaila added the comment:

Hello. I apologize once again for not finalizing my work, but since I started my final year at the faculty and a job, I have been busy pretty much all the time.

I would really like to finish this, as I've really enjoyed working on it, and everything in PEP 3154 and some other stuff has already been implemented. The only remaining part was finalizing the code review and fixing some memory leaks that gave me some headaches at the time.

I would really appreciate it if you could give me a few more days before deciding to start a new implementation from scratch. I'll get to fixing those memory leaks in the next couple of days, and then the code review can be finalized. Would this be acceptable to you?
[issue15773] `is' operator returns False on classmethods
Changes by Stefan Mihaila:

--
type: -> behavior
[issue15773] `is' operator returns False on classmethods
New submission from Stefan Mihaila:

Here are a few counter-intuitive outputs:

>>> dict.fromkeys is dict.fromkeys
False
>>> id(dict.fromkeys) == id(dict.fromkeys)
True
>>> x=dict.fromkeys; id(x) == id(x)
True
>>> x=dict.fromkeys; id(x) == id(dict.fromkeys)
False
>>> x=dict.fromkeys; y=dict.fromkeys; id(x),id(y),id(dict.fromkeys)
(3924, 39064632, 39065144)
>>> a=id(dict.fromkeys); x=dict.fromkeys; b=id(dict.fromkeys); a,b
(3924, 39480568)

Attached is a failing test.

--
files: is_on_classmethods.py
messages: 168967
nosy: mstefanro
priority: normal
severity: normal
status: open
title: `is' operator returns False on classmethods
versions: Python 3.3
Added file: http://bugs.python.org/file26978/is_on_classmethods.py
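A likely explanation for the outputs above, at least on CPython, is that every attribute access on a classmethod builds a fresh bound object, so identity only holds for a single stored reference, while equality compares the underlying function and its `__self__`. The matching `id()` values would then come from a temporary being freed and its address reused. The sketch below shows the identity versus equality distinction; the class `A` is a made-up example and not part of the attached test.

```python
class A:
    @classmethod
    def f(cls):
        return cls


# Each access creates a new bound-method object on CPython.
x = A.f
y = A.f
print(x is y)                      # identity of two live objects
print(x == y)                      # same function, same __self__
print(x.__self__ is A, x.__func__ is y.__func__)

# The builtin classmethod from the report can be compared the same way.
bx = dict.fromkeys
by = dict.fromkeys
print(bx == by, bx.__self__ is dict)
```

Comparing with `==` (or comparing `__self__` and `__func__` where they exist) avoids depending on object identity.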
[issue15642] Integrate pickle protocol version 4 GSoC work by Stefan Mihaila
Stefan Mihaila added the comment:

Are there also some known techniques for tracking down memory leaks? I've played around with sys.gettotalrefcount to narrow down the place where the leaks occur, but they seem to occur only in v4, i.e. pickle.dumps(3.0+1j, 4) leaks but pickle.dumps(3.0+1j, 3) does not. However, there appears to be no difference between the code that gets executed in v3 and the code executed in v4.
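The sys.gettotalrefcount approach mentioned above can be sketched as a small helper that measures how much the interpreter-wide reference count grows over repeated calls. This is a rough sketch, not the method actually used in this issue: the helper name `total_refcount_growth` and its parameters are made up here, and `sys.gettotalrefcount` only exists in debug builds of CPython, so the helper returns None elsewhere.

```python
import pickle
import sys


def total_refcount_growth(func, warmup=5, repeats=100):
    """Return the change in the interpreter-wide refcount, or None."""
    get_total = getattr(sys, "gettotalrefcount", None)
    if get_total is None:      # release build: the counter does not exist
        return None
    for _ in range(warmup):    # let caches and interned objects settle
        func()
    before = get_total()
    for _ in range(repeats):
        func()
    return get_total() - before


for proto in (3, 4):
    growth = total_refcount_growth(lambda: pickle.dumps(3.0 + 1j, proto))
    print(proto, growth)
```

A growth that scales with `repeats` for one protocol but not the other would point at a path taken only for that protocol.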
[issue15642] Integrate pickle protocol version 4 GSoC work by Stefan Mihaila
Stefan Mihaila added the comment:

> - I don't really like the idea of changing the semantics of the PUT and GET
> opcodes. I would prefer new opcodes if possible.

Well, the semantics of PUT and GET haven't really changed. It's just that the PUT opcode is not generated anymore and memoization is done "in agreement" (i.e. both the pickler and the unpickler know when to memoize, so their memo tables stay in sync). So, in fact, it's the semantics of the other opcodes that have slightly changed.

> - I would like to see benchmarks for this change.

I've tried the following two snippets with timeit:

    ./python3.3 -m timeit \
        -s 'from pickle import dumps' \
        -s 'd=["a"]*100' 'dumps(d,3)'  # replace 3 with 4 for comparison

    ./python3.3 -m timeit \
        -s 'from pickle import dumps' \
        -s 'd=list(map(chr,range(0,256)))' \
        'dumps(d,3)'  # replace 3 with 4 for comparison
    # you can also use loads(dumps(d,3)) here to benchmark both
    # operations at once

The first one generates 99 BINGET opcodes. It generates 1 BINPUT opcode in pickle3 and no BINPUT opcodes in pickle4. There appears to be no noticeable speed difference.

The second one generates no BINGET opcodes. It generates no BINPUT opcodes in v4 and 256 BINPUT opcodes in v3. It appears the v4 one is slightly faster, but I have a hard time comparing correctly, given that the measurements seem to have a very large standard deviation (v4 gets times somewhere between 32.3 and 44.2, whereas v3 gets times between 37.7 and 52.2).

I'm not sure this is the best way to benchmark, so let me know what is usually used.
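The opcode counts quoted for the two benchmark inputs can be checked with pickletools. The sketch below is a minimal helper (the name `opcode_counts` is made up here) that tallies opcodes with `pickletools.genops`. It runs against whatever pickle module the interpreter ships, so the totals may differ from the GSoC branch discussed in this thread: the protocol 4 that was eventually released uses an explicit MEMOIZE opcode, and the container itself is memoized as well.

```python
import pickle
import pickletools
from collections import Counter


def opcode_counts(obj, protocol):
    """Count opcode names in the pickle of obj at the given protocol."""
    data = pickle.dumps(obj, protocol)
    return Counter(op.name for op, arg, pos in pickletools.genops(data))


# The two inputs from the timeit snippets above.
for sample in (["a"] * 100, list(map(chr, range(256)))):
    for proto in (3, 4):
        counts = opcode_counts(sample, proto)
        print(proto, {name: counts[name]
                      for name in ("BINPUT", "BINGET", "MEMOIZE")})
```

Counting opcodes this way separates the size effect of memoization from the timing noise seen in the measurements.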
[issue15642] Integrate pickle protocol version 4 GSoC work by Stefan Mihaila
Stefan Mihaila added the comment:

There are still some upcoming changes.
[issue15642] Integrate pickle protocol version 4 GSoC work by Stefan Mihaila
Stefan Mihaila added the comment:

Maybe you can set this issue as the superseder of issue9269, because the patches there have already been applied here.
[issue15642] Integrate pickle protocol version 4 GSoC work by Stefan Mihaila
Stefan Mihaila added the comment:

Maybe we could postpone the review process for a few days until I fix some known issues.
[issue633930] Nested class __name__
Stefan Mihaila added the comment:

Only an issue in Python2.

>>> A.B.__qualname__
'A.B'
>>> repr(A.B)
"<class '__main__.A.B'>"

--
nosy: +mstefanro
versions: +Python 2.6, Python 2.7
[issue9269] Cannot pickle self-referencing sets
Stefan Mihaila added the comment:

Attaching patch for fixing a test and adding better testing of sets.

--
Added file: http://bugs.python.org/file26539/sets-test.patch
[issue1062277] Pickle breakage with reduction of recursive structures
Changes by Stefan Mihaila:

--
nosy: +mstefanro
[issue9269] Cannot pickle self-referencing sets
Stefan Mihaila added the comment:

I have attached a fix to this issue (and implicitly issue1062277).

This patch allows pickling self-referential sets by implementing a set.__reduce__ which uses states as opposed to ctor parameters.

Before:

>>> s=set([1,2,3])
>>> s.__reduce__()
(<class 'set'>, ([1, 2, 3],), None)
>>> len(pickle.dumps(s,1))
38

After:

>>> s=set([1,2,3])
>>> s.__reduce__()
(<class 'set'>, (), [1, 2, 3])
>>> len(pickle.dumps(s,1))
36

Basically what this does is: instead of unpickling the set by doing set([1,2,3]), it does s=set(); s.__setstate__([1,2,3]). States are supported in all versions of pickle, so this shouldn't break anything.

Creating empty data structures and then filling them is the way pickle does it for all mutable containers in order to allow self-references (with the exception of sets, of course). Since memoization is performed after the object is created but before its state is set, the pickled state of an object can contain references to the object itself.

    class A: pass
    a=A()
    s=set([a])
    a.s=s
    s_=loads(dumps(s,1))
    next(iter(s_)).s is s_ # True

Note that this fix only applies to sets, not frozensets. Frozensets are a different matter, because their immutability makes it impossible to set their state. Self-referential frozensets are currently supported in my implementation of pickle4 using a trick similar to what tuples use. But the trick works more easily there because frozensets have their own opcodes, like tuples.

Also note that applying this patch makes Lib/test/pickletester.py:test_pickle_to_2x fail (DATA3 and DATA6 there contain pickled data of sets, which naturally have changed). I'll upload a patch fixing this, as well as adding one or more tests for sets, soon.

--
keywords: +patch
nosy: +mstefanro
Added file: http://bugs.python.org/file26533/self_referential-sets.patch
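The "empty object plus state" shape described above can be tried without patching the built-in set. The sketch below assumes a hypothetical subclass, `StateSet`, whose `__reduce__` returns the `(callable, (), state)` form and whose `__setstate__` refills the set. `Node` is likewise a made-up helper class. It mirrors the idea of the patch and is not the patch itself.

```python
import pickle


class StateSet(set):
    """Hypothetical set subclass pickled as 'empty object + state'."""

    def __reduce__(self):
        # (callable, args, state): build an empty set, then restore items.
        return (StateSet, (), list(self))

    def __setstate__(self, state):
        self.update(state)


class Node:
    """Hashable holder that points back at the set containing it."""


node = Node()
s = StateSet([node])
node.s = s  # the set now refers to itself through its only element

# The empty set is memoized before its state is pickled, so the
# back-reference inside the state resolves to the same object on load.
restored = pickle.loads(pickle.dumps(s, 1))
print(next(iter(restored)).s is restored)
```

Because the state is restored only after the empty object exists, the element's back-reference and the top-level object end up being the same set.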
[issue15397] Unbinding of methods
Stefan Mihaila added the comment:

Andrew, thanks for creating a separate issue (the refleak was very rare and I thought I'd put it in the same place, but now I realize it was a bad idea).

Richard, actually, the isinstance(self, type) check I mentioned earlier would have to come before the hasattr(f, '__func__') check, because Python classmethods provide a __func__ too:

    def unbind(f):
        self = getattr(f, '__self__', None)
        if self is not None and not isinstance(self, types.ModuleType) \
                            and not isinstance(self, type):
            if hasattr(f, '__func__'):
                return f.__func__
            return getattr(type(f.__self__), f.__name__)
        raise TypeError('not a bound method')

Anyway, I'm not convinced this is worth adding anymore. As Antoine Pitrou suggested on the ml, it would probably be a better idea if I implemented __reduce__ for builtin methods as well as Python methods, rather than having a separate opcode for pickling methods.
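For reference, here is a self-contained version of the unbind function from the message, with the `types` import it relies on and a few calls showing which objects it accepts. The class `C` and the sample calls are illustrative additions. The function body follows the message.

```python
import types


def unbind(f):
    """Return the unbound function or descriptor behind a bound method."""
    self = getattr(f, '__self__', None)
    if (self is not None
            and not isinstance(self, types.ModuleType)
            and not isinstance(self, type)):
        if hasattr(f, '__func__'):
            return f.__func__                     # Python method
        return getattr(type(self), f.__name__)    # builtin method
    raise TypeError('not a bound method')


class C:
    def m(self):
        return 'm'

    @classmethod
    def cm(cls):
        return 'cm'


print(unbind(C().m) is C.m)
print(unbind([].append) is list.append)

# Module-level builtins, classmethods and plain functions are rejected.
for not_bound in (len, C.cm, dict.fromkeys, C.m):
    try:
        unbind(not_bound)
    except TypeError as exc:
        print(type(exc).__name__, exc)
```

The type check on `__self__` is what keeps classmethods out, since they carry a `__func__` just like instance methods do.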
[issue15397] Unbinding of methods
Stefan Mihaila added the comment:

Richard, yes, I think that would work, I didn't think of using f.__self__'s type.

You might want to replace

    if self is not None and not isinstance(self, types.ModuleType):

with

    if self is not None and not isinstance(self, types.ModuleType) \
                        and not isinstance(self, type):

to correctly raise an exception when called on a classmethod too.
[issue15397] Unbinding of methods
Stefan Mihaila added the comment:

Doesn't the definition I've added at the end of methodobject.c suffice? (http://codereview.appspot.com/6425052/patch/1/10) Or should the macro be removed altogether?
[issue15397] Unbinding of methods
Stefan Mihaila added the comment:

Yes, the patch is at http://codereview.appspot.com/6425052/

The code there also contains some tests I've written for functools.unbind.

--
Added file: http://bugs.python.org/file26439/unbind_test.patch
[issue15397] Unbinding of methods
New submission from Stefan Mihaila:

In order to implement pickling of instance methods, a means of separating the object and the unbound method is necessary. This is easily done for Python methods (f.__self__ and f.__func__), but not all builtins support __func__. Moreover, there currently appears to be no good way to distinguish functions from bound methods.

As a first step in solving this issue, I have attached a patch which:

1) adds __func__ for all function types
2) adds a few new definitions in the types module (AllFunctionTypes etc.)
3) adds isanyfunction(), isanyboundfunction(), isanyunboundfunction() in inspect (admittedly these are bad names)
4) adds functools.unbind

In case applying this patch is being considered, serious review is necessary, as I'm not knowledgeable about cpython internals.

--
components: Library (Lib)
files: func.patch
keywords: patch
messages: 165845
nosy: mstefanro
priority: normal
severity: normal
status: open
title: Unbinding of methods
type: enhancement
versions: Python 3.3
Added file: http://bugs.python.org/file26438/func.patch
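The gap described in this submission can be seen on an unpatched CPython with a short inspection. The sketch below uses a made-up class `C`. It shows that a Python bound method exposes both `__self__` and `__func__`, while a bound builtin method exposes only `__self__` and shares its type with a module-level builtin function such as `len`.

```python
import types


class C:
    def m(self):
        pass


py_bound = C().m
builtin_bound = [].append

# Python methods carry both halves needed for unbinding.
print(type(py_bound) is types.MethodType, hasattr(py_bound, '__func__'))

# Builtin methods have __self__ but no __func__ (without the patch).
print(type(builtin_bound) is types.BuiltinMethodType,
      hasattr(builtin_bound, '__func__'))

# A bound builtin method and a plain builtin function share one type;
# only __self__ (an instance versus the builtins module) tells them apart.
print(type(len) is type(builtin_bound),
      isinstance(len.__self__, types.ModuleType))
```

This is why the type alone cannot separate functions from bound methods for builtins, and why the later comments on this issue inspect `__self__` instead.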
[issue15397] Unbinding of methods
Changes by Stefan Mihaila:

--
nosy: +alexandre.vassalotti, ncoghlan, rhettinger