On Mon, Jul 2, 2018 at 5:16 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
> On Mon, Jul 2, 2018 at 3:03 PM, Antoine Pitrou <anto...@python.org> wrote:
>>
>> Hello,
>>
>> Some of you might know that I've been working on a PEP in order to
>> improve pickling performance of large (or huge) data. The PEP,
>> numbered 574 and titled "Pickle protocol 5 with out-of-band data",
>> allows participating data types to be pickled without any memory copy.
>> https://www.python.org/dev/peps/pep-0574/
>>
>> The PEP already has an implementation, which is backported as an
>> independent PyPI package under the name "pickle5".
>> https://pypi.org/project/pickle5/
>>
>> I also have a working patch updating PyArrow to use the PEP-defined
>> extensions to allow for zero-copy pickling of Arrow arrays - without
>> breaking compatibility with existing usage:
>> https://github.com/apache/arrow/pull/2161
>>
>> Still, it is obvious one of the primary targets of PEP 574 is Numpy
>> arrays, as the most prevalent datatype in the Python scientific
>> ecosystem. I'm personally satisfied with the current state of the PEP,
>> but I'd like to have feedback from Numpy core maintainers. I haven't
>> tried (yet?) to draft a Numpy patch to add PEP 574 support, since that's
>> likely to be more involved due to the complexity of Numpy and due to
>> the core being written in C. Therefore I would like some help
>> evaluating whether the PEP is likely to be a good fit for Numpy.
>>
> Maybe somewhat off topic, but we have had trouble with a 2 GiB limit on
> file writes on OS X. See https://github.com/numpy/numpy/issues/3858.
> Does your implementation work around that?

ISTR that some parallel processing applications sent pickled arrays around
to different processes; I don't know if that is still the case, but if so,
no copy might be a big gain for them.

Chuck
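[For readers following along: a minimal sketch of what the PEP 574 user-side API looks like, assuming a Python with protocol 5 available (stdlib `pickle` on 3.8+, or the `pickle5` backport on earlier versions) and a NumPy that participates in the protocol. The `buffer_callback` / `buffers` round trip below is the mechanism the PEP defines for keeping large payloads out of the pickle stream.]

```python
import pickle
import numpy as np

arr = np.ones(10, dtype=np.float64)

# Collect out-of-band buffers instead of embedding them in the stream.
buffers = []
data = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# The array's payload now travels in `buffers` (as PickleBuffer views),
# so a transport can move it between processes without an extra copy.
arr2 = pickle.loads(data, buffers=buffers)
assert np.array_equal(arr, arr2)
```

A parallel-processing framework would ship `data` and the raw buffers separately (e.g. over shared memory), which is where the zero-copy gain mentioned above would come from.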
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion