Hello,

Some of you might know that I've been working on a PEP in order to
improve pickling performance of large (or huge) data.  The PEP,
numbered 574 and titled "Pickle protocol 5 with out-of-band data",
allows participating data types to be pickled without any memory copy.
https://www.python.org/dev/peps/pep-0574/
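
For those who have not read the PEP yet, here is a minimal sketch of
what "participating" means for a data type.  It assumes an interpreter
whose pickle module implements protocol 5 (the "pickle5" backport
imported under the same name should behave the same way).  The
ZeroCopyByteArray class is a hypothetical toy type made up for
illustration, not something taken from Numpy or PyArrow.

```python
import pickle


class ZeroCopyByteArray(bytearray):
    """Hypothetical toy type that opts into out-of-band pickling."""

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Expose the underlying memory as a PickleBuffer instead of
            # handing the pickler a copy of the bytes.
            return type(self)._reconstruct, (pickle.PickleBuffer(self),), None
        # PickleBuffer is not allowed with protocols <= 4, so fall back
        # to an ordinary in-band copy.
        return type(self)._reconstruct, (bytearray(self),)

    @classmethod
    def _reconstruct(cls, obj):
        with memoryview(obj) as m:
            # Recover the object that actually owns the memory.
            owner = m.obj
            if type(owner) is cls:
                # The consumer passed our original object back: reuse it.
                return owner
            return cls(owner)


original = ZeroCopyByteArray(b"some large payload")

# Out-of-band: buffers are handed to the callback, not written to the stream.
buffers = []
stream = pickle.dumps(original, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(stream, buffers=buffers)

# In-band: without a callback the data is copied into the pickle stream.
copied = pickle.loads(pickle.dumps(original, protocol=5))
```

In the out-of-band case the consumer decides how the collected buffers
travel (shared memory, a socket, and so on), and the unpickled object
can reuse the original memory.  Without a buffer_callback the same type
still pickles normally, only with the usual copy.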

The PEP already has an implementation, which is backported as an
independent PyPI package under the name "pickle5".
https://pypi.org/project/pickle5/

I also have a working patch updating PyArrow to use the PEP-defined
extensions to allow for zero-copy pickling of Arrow arrays - without
breaking compatibility with existing usage:
https://github.com/apache/arrow/pull/2161

Still, it is obvious that one of the primary targets of PEP 574 is NumPy
arrays, as the most prevalent datatype in the Python scientific
ecosystem.  I'm personally satisfied with the current state of the PEP,
but I'd like to have feedback from NumPy core maintainers.  I haven't
tried (yet?) to draft a NumPy patch to add PEP 574 support, since that's
likely to be more involved due to the complexity of NumPy and due to
the core being written in C.  Therefore I would like some help
evaluating whether the PEP is likely to be a good fit for NumPy.

Regards

Antoine.