On Mon, Jul 2, 2018 at 5:16 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
> On Mon, Jul 2, 2018 at 3:03 PM, Antoine Pitrou <anto...@python.org> wrote:
>>
>> Hello,
>>
>> Some of you might know that I've been working on a PEP in order to
>> improve pickling performance of large (or huge) data. The PEP,
>> numbered 574 and titled "Pickle protocol 5 with out-of-band data",
>> allows participating data types to be pickled without any memory copy.
>> https://www.python.org/dev/peps/pep-0574/
>>
>> The PEP already has an implementation, which is backported as an
>> independent PyPI package under the name "pickle5".
>> https://pypi.org/project/pickle5/
>>
>> I also have a working patch updating PyArrow to use the PEP-defined
>> extensions to allow for zero-copy pickling of Arrow arrays - without
>> breaking compatibility with existing usage:
>> https://github.com/apache/arrow/pull/2161
>>
>> Still, it is obvious one of the primary targets of PEP 574 is Numpy
>> arrays, as the most prevalent datatype in the Python scientific
>> ecosystem. I'm personally satisfied with the current state of the PEP,
>> but I'd like to have feedback from Numpy core maintainers. I haven't
>> tried (yet?) to draft a Numpy patch to add PEP 574 support, since that's
>> likely to be more involved due to the complexity of Numpy and due to
>> the core being written in C. Therefore I would like some help
>> evaluating whether the PEP is likely to be a good fit for Numpy.
>>
> Maybe somewhat off topic, but we have had trouble with a 2 GiB limit on
> file writes on OS X. See https://github.com/numpy/numpy/issues/3858.
> Does your implementation work around that?

ISTR that some parallel processing applications sent pickled arrays around
to different processes; I don't know if that is still the case, but if so,
no copy might be a big gain for them.

Chuck
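[For readers following along: a minimal sketch of what the PEP 574 user-side API looks like, assuming a Python with protocol 5 available (stdlib `pickle` on 3.8+, or the `pickle5` backport on earlier versions) and a NumPy that participates in the protocol. The `buffer_callback` / `buffers` round trip below is the mechanism the PEP defines for keeping large payloads out of the pickle stream.]

```python
import pickle
import numpy as np

arr = np.ones(10, dtype=np.float64)

# Collect out-of-band buffers instead of embedding them in the stream.
buffers = []
data = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# The array's payload now travels in `buffers` (as PickleBuffer views),
# so a transport can move it between processes without an extra copy.
arr2 = pickle.loads(data, buffers=buffers)
assert np.array_equal(arr, arr2)
```

A parallel-processing framework would ship `data` and the raw buffers separately (e.g. over shared memory), which is where the zero-copy gain mentioned above would come from.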
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion