Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
+1 for not adding in-pickle compression as it is already very easy to handle compression externally (for instance by passing a compressing file object as an argument to the pickler). Furthermore, as PEP 574 makes it possible to stream the buffer bytes directly to the file-object without any temporary memory copy I don't see any benefit in including the compression into the pickle protocol. However adding lz4.LZ4File to the standard library in addition to gzip.GzipFile and lzma.LZMAFile is probably a good idea as LZ4 is really fast compared to zlib/gzip. But this is not related to PEP 574. -- Olivier ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
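A minimal sketch of what "handling compression externally" can look like, using only the standard library. The payload and the in-memory `sink` are illustrative assumptions; a real program would pass an open file or socket wrapper instead.

```python
import gzip
import io
import pickle

payload = {"name": "example", "values": list(range(1000))}  # illustrative data

# Pickle straight into a compressing file object; pickle itself is unaware
# that the bytes it writes are being gzip-compressed on the way out.
sink = io.BytesIO()  # stands in for a real file or socket
with gzip.GzipFile(fileobj=sink, mode="wb", compresslevel=1) as fh:
    pickle.dump(payload, fh, protocol=pickle.HIGHEST_PROTOCOL)

# Reading back mirrors the write: wrap the source, then unpickle from it.
sink.seek(0)
with gzip.GzipFile(fileobj=sink, mode="rb") as fh:
    restored = pickle.load(fh)
```

Swapping `gzip.GzipFile` for `lzma.LZMAFile` or `bz2.BZ2File` changes the codec without touching the pickling code, which is the separation of layers argued for above.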
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
Hi all,

I agree that compression is often a good idea when moving serialized objects around on a network, but for what it's worth, as a library author I would always set compress=False and then handle it myself as a separate step. There are a few reasons for this:

1. Bandwidth is often pretty good, especially intra-node, on high-performance networks, or on decent modern disks (NVMe).
2. I often use different compression technologies in different situations. LZ4 is a great all-around default, but often Snappy, Blosc, or Zstandard are better suited. This depends strongly on the characteristics of the data.
3. Very often data isn't compressible, or is already in some compressed form, such as images, and so compressing only hurts you.

In general, my thought is that compression is a complex topic with enough intricacies that setting a single sane default that works 70+% of the time probably isn't possible (at least not with the applications that I get exposed to).

Instead of baking a particular method into pickle.dumps, I would recommend trying to solve this problem through documentation, pointing users to the various compression libraries within the broader Python ecosystem, and perhaps pointing to one of the many blog posts that discuss their strengths and weaknesses.

Best,
-matt
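To make the third point concrete, here is a small sketch of compression as a separate, optional step. The helper names `maybe_compress` and `restore` are hypothetical, and zlib stands in for whichever codec suits the data; the idea is simply to keep the compressed form only when it is actually smaller.

```python
import os
import pickle
import zlib


def maybe_compress(raw, level=1):
    """Return (is_compressed, data), keeping the compressed form only if smaller."""
    packed = zlib.compress(raw, level)
    if len(packed) < len(raw):
        return True, packed
    return False, raw


def restore(is_compressed, data):
    """Undo maybe_compress()."""
    return zlib.decompress(data) if is_compressed else data


# Highly repetitive data compresses well ...
repetitive = pickle.dumps([0] * 100_000, protocol=4)
# ... whereas random bytes (a stand-in for already-compressed data) do not.
random_like = pickle.dumps(os.urandom(100_000), protocol=4)

flag_rep, data_rep = maybe_compress(repetitive)
flag_rnd, data_rnd = maybe_compress(random_like)
```

The caller has to carry the `is_compressed` flag alongside the data, which is exactly the kind of framing decision that differs from one application to the next.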
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
Antoine Pitrou wrote on 25.05.2018 at 23:11:
> On Fri, 25 May 2018 14:50:57 -0600
> Neil Schemenauer wrote:
>> On 2018-05-25, Antoine Pitrou wrote:
>>> Do you have something specific in mind?
>>
>> I think compressed by default is a good idea. My quick proposal:
>>
>> - Use fast compression like lz4 or zlib with Z_BEST_SPEED
>>
>> - Add a 'compress' keyword argument with a default of None. For
>>   protocol 5, None means to compress. Providing 'compress' != None
>>   for older protocols will raise an error.
>
> The question is what purpose does it serve for pickle to do it rather
> than for the user to compress the pickle themselves. You're basically
> saving one line of code. Am I missing some other advantage?

Regarding the pickling side, if the pickle is large, then it can save memory to compress while pickling, rather than compressing after pickling. But that can also be done with file-like objects, so the advantage is small here.

I think a major advantage is on the unpickling side rather than the pickling side. Sure, users can compress a pickle after the fact, but if there's a (set of) standard algorithms that unpickle can handle automatically, then it's enough to pass "something pickled" into unpickle, rather than having to know (or figure out) if and how that pickle was originally compressed, and build up the decompression pipeline for it to get everything uncompressed efficiently without accidentally wasting memory or processing time.

Obviously, auto-decompression opens up a gate for compression bombs, but then, unpickling data from untrusted sources is discouraged anyway, so...

Stefan
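For comparison, this is roughly what "figuring out if and how that pickle was compressed" looks like when done by hand today. The function `loads_any` is a hypothetical helper, not a pickle API; it assumes the input is either a plain pickle or one wrapped in a gzip, bzip2, or xz container, and it identifies the container by its magic bytes.

```python
import bz2
import gzip
import lzma
import pickle

# Magic prefixes of the container formats produced by the stdlib modules.
_DECOMPRESSORS = (
    (b"\x1f\x8b", gzip.decompress),      # gzip
    (b"BZh", bz2.decompress),            # bzip2
    (b"\xfd7zXZ\x00", lzma.decompress),  # xz
)


def loads_any(blob):
    """Unpickle `blob`, transparently undoing a recognised compression layer."""
    for magic, decompress in _DECOMPRESSORS:
        if blob.startswith(magic):
            blob = decompress(blob)
            break
    # Anything else is assumed to be an uncompressed pickle.
    return pickle.loads(blob)


sample = {"answer": 42, "items": list(range(100))}
raw = pickle.dumps(sample, protocol=4)
results = [
    loads_any(raw),
    loads_any(gzip.compress(raw)),
    loads_any(bz2.compress(raw)),
    loads_any(lzma.compress(raw)),
]
```

This sketch decompresses the whole input in memory and puts no limit on the output size, so the compression-bomb caveat applies in full; like `pickle.loads` itself, it should only be given trusted data.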
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
On Fri, May 25, 2018 at 3:35 PM, Neil Schemenauer wrote:
> This discussion can easily lead into bikeshedding (e.g. relative
> merits of different compression schemes). Since I'm not
> volunteering to implement anything, I will stop responding at this
> point. ;-)

I think the bikeshedding -- or more to the point, the fact that there's a wide variety of options for compressing pickles, and none of them are appropriate in all circumstances -- means that this is something that should remain a separate layer.

Even super-fast algorithms like lz4 are inefficient when you're transmitting pickles between two processes on the same system – they still add extra memory copies. And that's a very common use case.

-n

--
Nathaniel J. Smith -- https://vorpus.org
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
On 2018-05-25, Antoine Pitrou wrote:
> The question is what purpose does it serve for pickle to do it rather
> than for the user to compress the pickle themselves. You're basically
> saving one line of code.

It's one line of code everywhere pickling or unpickling happens. And you probably need to import a compression module, so at least two lines. Then maybe you need to figure out if the pickle is compressed and what kind of compression is used. So, add a few more lines.

It seems logical to me that users of pickle want it to be fast and produce small pickles. Compressing by default seems the right choice, even though it complicates the implementation. Ivan brings up a valid point that compressed pickles are harder to debug. However, I think that's much less important than being small.

> it requires us to ship the lz4 library with Python

Yeah, that's not so great. I think zlib with Z_BEST_SPEED would be fine. However, some people might worry it is too slow or doesn't compress enough. Having lz4 as a battery included seems like a good idea anyhow. I understand that it is pretty well established as a useful compression method. Obviously requiring a new C library to be included expands the effort of implementation a lot.

This discussion can easily lead into bikeshedding (e.g. relative merits of different compression schemes). Since I'm not volunteering to implement anything, I will stop responding at this point. ;-)

Regards,
Neil
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
On Fri, 25 May 2018 14:50:57 -0600
Neil Schemenauer wrote:
> On 2018-05-25, Antoine Pitrou wrote:
> > Do you have something specific in mind?
>
> I think compressed by default is a good idea. My quick proposal:
>
> - Use fast compression like lz4 or zlib with Z_BEST_SPEED
>
> - Add a 'compress' keyword argument with a default of None. For
>   protocol 5, None means to compress. Providing 'compress' != None
>   for older protocols will raise an error.

The question is what purpose does it serve for pickle to do it rather than for the user to compress the pickle themselves. You're basically saving one line of code. Am I missing some other advantage?

(also note that it requires us to ship the lz4 library with Python, or another modern compression library such as zstd; zlib's performance characteristics are outdated)

Regards

Antoine.
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
On 2018-05-25, Antoine Pitrou wrote:
> Do you have something specific in mind?

I think compressed by default is a good idea. My quick proposal:

- Use fast compression like lz4 or zlib with Z_BEST_SPEED

- Add a 'compress' keyword argument with a default of None. For protocol 5, None means to compress. Providing 'compress' != None for older protocols will raise an error.

The compression overhead will be small compared to the pickle/unpickle costs. If someone wants to apply their own (e.g. better) compression, they can set compress=False.

An alternative idea is to have two different protocol formats. E.g. 5 and 6. One is "pickle 5" with compression, one without compression. I don't like that as much since it breaks the idea that higher protocol numbers are "better".

Regards,
Neil
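The proposed semantics can be sketched as a thin wrapper around the existing functions. Everything here is hypothetical: neither PEP 574 nor the `pickle` module defines a `compress` argument, and the one-byte framing tags are invented for this illustration. zlib with `Z_BEST_SPEED` is used because it is already in the standard library; the sketch assumes a Python with protocol 5 (3.8 or later).

```python
import pickle
import zlib

_PLAIN, _ZLIB = b"\x00", b"\x01"  # made-up framing tags, not part of any spec


def dumps(obj, protocol=5, compress=None):
    """Hypothetical dumps() following the proposal's 'compress' rules."""
    if protocol < 5:
        if compress is not None:
            raise ValueError("'compress' requires protocol 5 or higher")
        return pickle.dumps(obj, protocol=protocol)
    if compress is None:  # for protocol 5, None means "compress"
        compress = True
    raw = pickle.dumps(obj, protocol=protocol)
    if compress:
        return _ZLIB + zlib.compress(raw, zlib.Z_BEST_SPEED)
    return _PLAIN + raw


def loads(blob):
    """Inverse of dumps(); untagged input is treated as an ordinary pickle."""
    tag = blob[:1]
    if tag == _ZLIB:
        return pickle.loads(zlib.decompress(blob[1:]))
    if tag == _PLAIN:
        return pickle.loads(blob[1:])
    return pickle.loads(blob)


sample = {"key": "value" * 100}
default_blob = dumps(sample)                  # protocol 5, compressed
plain_blob = dumps(sample, compress=False)    # protocol 5, caller opts out
legacy_blob = dumps(sample, protocol=4)       # older protocol, never compressed
```

The tag bytes do not collide with real pickles because no pickle stream starts with `\x00` or `\x01`, but the need for such framing is part of what a real specification would have to settle.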
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
On 25.05.2018 20:36, Raymond Hettinger wrote:
>> On May 24, 2018, at 10:57 AM, Antoine Pitrou wrote:
>>
>> While PEP 574 (pickle protocol 5 with out-of-band data) is still in
>> draft status, I've made available an implementation in branch "pickle5"
>> in my GitHub fork of CPython:
>> https://github.com/pitrou/cpython/tree/pickle5
>>
>> Also I've published an experimental backport on PyPI, for Python 3.6
>> and 3.7. This should help people play with the new API and features
>> without having to compile Python:
>> https://pypi.org/project/pickle5/
>>
>> Any feedback is welcome.
>
> Thanks for doing this.
>
> Hope it isn't too late, but I would like to suggest that protocol 5 support
> fast compression by default. We normally pickle objects so that they can be
> transported (saved to a file or sent over a socket). Transport costs (reading
> and writing a file or socket) are generally proportional to size, so
> compression is likely to be a net win (much as it was for header compression
> in HTTP/2).
>
> The PEP lists compression as a possible refinement only for large objects,
> but I expect it will be a win for most pickles to compress them in their
> entirety.

I would advise against that. Pickle format is unreadable as it is; compression will make it literally impossible to diagnose problems. Python supports transparent compression, e.g. with the 'zlib' codec.

> Raymond

--
Regards,
Ivan
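A short sketch of both halves of this argument, using only standard-library calls: the 'zlib' bytes-to-bytes codec applied as a separate step, and `pickletools.dis` as the usual way to inspect a pickle once it is back in uncompressed form. The `record` value is illustrative.

```python
import codecs
import io
import pickle
import pickletools

record = {"id": 7, "tags": ["a", "b"]}
raw = pickle.dumps(record, protocol=4)

# Compression as an independent layer, via the 'zlib' bytes-to-bytes codec.
packed = codecs.encode(raw, "zlib")
unpacked = codecs.decode(packed, "zlib")

# The uncompressed pickle can still be disassembled for debugging;
# the compressed bytes cannot be read this way until they are decoded.
listing = io.StringIO()
pickletools.dis(unpacked, out=listing)
opcode_dump = listing.getvalue()
```

Printing `opcode_dump` shows one opcode per line, which is usually the first thing to look at when a pickle fails to load.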
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
On Fri, 25 May 2018 10:36:08 -0700
Raymond Hettinger wrote:
> > On May 24, 2018, at 10:57 AM, Antoine Pitrou wrote:
> >
> > While PEP 574 (pickle protocol 5 with out-of-band data) is still in
> > draft status, I've made available an implementation in branch "pickle5"
> > in my GitHub fork of CPython:
> > https://github.com/pitrou/cpython/tree/pickle5
> >
> > Also I've published an experimental backport on PyPI, for Python 3.6
> > and 3.7. This should help people play with the new API and features
> > without having to compile Python:
> > https://pypi.org/project/pickle5/
> >
> > Any feedback is welcome.
>
> Thanks for doing this.
>
> Hope it isn't too late, but I would like to suggest that protocol 5 support
> fast compression by default. We normally pickle objects so that they can be
> transported (saved to a file or sent over a socket). Transport costs (reading
> and writing a file or socket) are generally proportional to size, so
> compression is likely to be a net win (much as it was for header compression
> in HTTP/2).
>
> The PEP lists compression as a possible refinement only for large objects,
> but I expect it will be a win for most pickles to compress them in their
> entirety.

It's not too late (the PEP is still a draft, and there's a lot of time before 3.8), but I wonder what would be the benefit of making it a part of the pickle specification, rather than compressing independently.

Whether and how to compress is generally a compromise between transmission (or storage) speed and computation speed. Also, there are specialized compressors for higher efficiency (for example, Blosc has datatype-specific compression for Numpy arrays). Such knowledge can be embodied in domain-specific libraries such as Dask/distributed, but it cannot really be incorporated in pickle itself.

Do you have something specific in mind?

Regards

Antoine.
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
> On May 24, 2018, at 10:57 AM, Antoine Pitrou wrote:
>
> While PEP 574 (pickle protocol 5 with out-of-band data) is still in
> draft status, I've made available an implementation in branch "pickle5"
> in my GitHub fork of CPython:
> https://github.com/pitrou/cpython/tree/pickle5
>
> Also I've published an experimental backport on PyPI, for Python 3.6
> and 3.7. This should help people play with the new API and features
> without having to compile Python:
> https://pypi.org/project/pickle5/
>
> Any feedback is welcome.

Thanks for doing this.

Hope it isn't too late, but I would like to suggest that protocol 5 support fast compression by default. We normally pickle objects so that they can be transported (saved to a file or sent over a socket). Transport costs (reading and writing a file or socket) are generally proportional to size, so compression is likely to be a net win (much as it was for header compression in HTTP/2).

The PEP lists compression as a possible refinement only for large objects, but I expect it will be a win for most pickles to compress them in their entirety.

Raymond
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
I tried this implementation to add no-copy pickling for large numpy arrays and it seems to work as expected (for a simple contiguous array). I took some notes on the numpy tracker to advertise this PEP to the numpy developers:

https://github.com/numpy/numpy/issues/11161

-- Olivier
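The mechanism behind no-copy pickling can be shown without numpy. In this sketch a plain `bytearray` stands in for an array's memory (an assumption made to keep the example standard-library only), and the API is the one that shipped with protocol 5 in Python 3.8: `pickle.PickleBuffer`, the `buffer_callback` argument when dumping, and the `buffers` argument when loading.

```python
import pickle

data = bytearray(b"x" * 1_000_000)  # stand-in for a large contiguous array

# In-band: the payload is copied into the pickle stream itself.
in_band = pickle.dumps(data, protocol=5)

# Out-of-band: the stream only records "next buffer goes here"; the actual
# memory is handed to buffer_callback as a PickleBuffer, without copying.
buffers = []
meta = pickle.dumps(
    pickle.PickleBuffer(data), protocol=5, buffer_callback=buffers.append
)

# The consumer supplies the buffers separately when loading.
restored = pickle.loads(meta, buffers=buffers)
same_memory = memoryview(restored).obj is data  # still the original bytearray
```

Here `meta` is only a few bytes long, and how `buffers` travels to the consumer (shared memory, a separate socket write, and so on) is left entirely to the application.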
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
Link to the PEP:

"PEP 574 -- Pickle protocol 5 with out-of-band data"
https://www.python.org/dev/peps/pep-0574/

Victor

2018-05-24 19:57 GMT+02:00 Antoine Pitrou:
>
> Hi,
>
> While PEP 574 (pickle protocol 5 with out-of-band data) is still in
> draft status, I've made available an implementation in branch "pickle5"
> in my GitHub fork of CPython:
> https://github.com/pitrou/cpython/tree/pickle5
>
> Also I've published an experimental backport on PyPI, for Python 3.6
> and 3.7. This should help people play with the new API and features
> without having to compile Python:
> https://pypi.org/project/pickle5/
>
> Any feedback is welcome.
>
> Regards
>
> Antoine.
[Python-Dev] PEP 574 (pickle 5) implementation and backport available
Hi,

While PEP 574 (pickle protocol 5 with out-of-band data) is still in draft status, I've made available an implementation in branch "pickle5" in my GitHub fork of CPython:
https://github.com/pitrou/cpython/tree/pickle5

Also I've published an experimental backport on PyPI, for Python 3.6 and 3.7. This should help people play with the new API and features without having to compile Python:
https://pypi.org/project/pickle5/

Any feedback is welcome.

Regards

Antoine.
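One way to try the backport while staying compatible with later interpreters is a guarded import. This is a sketch under the assumption that the backport's import name matches its PyPI project name, `pickle5`; on Python 3.8 and later the standard `pickle` module already provides protocol 5, so the fallback branch is taken when the backport is not installed.

```python
try:
    import pickle5 as pickle  # PyPI backport, intended for Python 3.6 / 3.7
except ImportError:
    import pickle  # Python 3.8+ includes protocol 5 in the standard library

# Use protocol 5 when available, otherwise the best the interpreter offers.
protocol = min(5, pickle.HIGHEST_PROTOCOL)
blob = pickle.dumps({"hello": "world"}, protocol=protocol)
roundtrip = pickle.loads(blob)
```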