On Mon, Oct 9, 2023 at 12:57 AM Aaron Meurer <asmeu...@gmail.com> wrote:

> Is it possible to convert a NumPy 1 pickle file into a generic pickle
> file that works in both NumPy 1 and 2? As far as I understand, pickle
> is Turing complete, so I imagine it should be theoretically possible,
> but I don't know how easy it would be to actually do this or how it
> would affect the pickle file size.
>

Hi Aaron,

The issue is that the pickle protocol needs a reference to a reconstructor
to recreate numpy types. For ndarray, that function is currently
`numpy.core.multiarray._reconstruct`, and in numpy 2 it becomes
`numpy._core.multiarray._reconstruct`. For a pickle file containing only an
ndarray, this is the first thing in the pickle file and the import happens
inside of the pickle implementation. I am not aware of a hook that Python
gives us to intercept that path before Python imports it.
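
To make the embedded path concrete, here is a small sketch that lists the
module paths a pickle stream refers to. It uses only the standard library.
`global_references` is a helper name I made up for illustration, and
`fractions.Fraction` is just a stand-in that pickles through a by-path
reference in a similar way. The sketch forces protocol 2, where each
reference is a single GLOBAL opcode; newer protocols push the module and
name as separate strings followed by STACK_GLOBAL, which it does not handle.

```python
import pickle
import pickletools
from fractions import Fraction


def global_references(data):
    """Return the (module, name) pairs named by GLOBAL opcodes in a pickle.

    Hypothetical helper for illustration; only handles protocols <= 3,
    where references are stored in a single GLOBAL opcode.
    """
    refs = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name".
            module, name = arg.split(" ", 1)
            refs.append((module, name))
    return refs


# Stand-in object: the stream stores the import path of the callable
# needed to rebuild it, not the callable itself.
data = pickle.dumps(Fraction(1, 2), protocol=2)
for module, name in global_references(data):
    print(module, name)
```

I would expect the same helper, run on a protocol 2 pickle of an ndarray
written by numpy 1.x, to list `numpy.core.multiarray` with `_reconstruct`.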

So, even if there is a way to correct subsequent paths in the pickle file,
we won't be able to fix the most problematic path that will occur in any
pickle that includes a numpy array. That means some user-visible pain no
matter what. If we can't avoid that, I'd prefer to offer a solution that
will allow people to continue loading old pickle files indefinitely (albeit
with a minor code change). This also gives us a place to put compatibility
fixes for future changes that impact old pickle files.
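
As a rough sketch of what such a "minor code change" could look like: a
plain `pickle.load` call offers no place to intercept the import, but a
`pickle.Unpickler` subclass can override `find_class` and remap the module
path before it is imported. This is not the NumpyUnpickler implementation
from the pull request. The module names `legacy_core_demo` and
`private_core_demo`, the class `RemappingUnpickler`, and the helper
functions are all hypothetical stand-ins so that the example runs without
numpy installed.

```python
import io
import pickle
import sys
import types

OLD_MODULE = "legacy_core_demo"   # hypothetical stand-in for numpy.core
NEW_MODULE = "private_core_demo"  # hypothetical stand-in for numpy._core


def _reconstruct(values):
    """Stand-in reconstructor; pickles refer to it by module path and name."""
    return list(values)


class Payload:
    """Toy object whose pickle references `_reconstruct` by path."""

    def __init__(self, values):
        self.values = values

    def __reduce__(self):
        return (_reconstruct, (tuple(self.values),))


def make_old_pickle():
    """Pickle a Payload while `_reconstruct` lives under the old path."""
    old = types.ModuleType(OLD_MODULE)
    old._reconstruct = _reconstruct
    _reconstruct.__module__ = OLD_MODULE
    sys.modules[OLD_MODULE] = old
    try:
        return pickle.dumps(Payload([1, 2, 3]))
    finally:
        # Simulate the rename: the old module path no longer exists.
        del sys.modules[OLD_MODULE]


def install_new_module():
    """Expose `_reconstruct` under the new path only."""
    new = types.ModuleType(NEW_MODULE)
    new._reconstruct = _reconstruct
    sys.modules[NEW_MODULE] = new


class RemappingUnpickler(pickle.Unpickler):
    """Rewrite the old module path to the new one before importing."""

    def find_class(self, module, name):
        if module == OLD_MODULE:
            module = NEW_MODULE
        return super().find_class(module, name)


def load_remapped(data):
    return RemappingUnpickler(io.BytesIO(data)).load()


data = make_old_pickle()
install_new_module()
print(load_remapped(data))
```

With the old module gone, `pickle.loads(data)` fails on the import, while
`load_remapped(data)` rebuilds the object. The same `find_class` override
would also be a natural place for later compatibility fixes.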

-Nathan



>
> Aaron Meurer
>
> On Fri, Oct 6, 2023 at 10:17 AM Nathan <nathan.goldb...@gmail.com> wrote:
> >
> > Hi all,
> >
> > As part of the ongoing work on NEP 52 we are getting close to merging
> > the pull request that changes numpy.core to numpy._core.
> >
> > While working on this we realized that numpy pickle files include paths
> > to np.core in the pickle data. If we do nothing, switching np.core to
> > np._core will generate deprecation warnings when loading pickle files
> > generated by NumPy 1.x in NumPy 2.x, and NumPy 1.x will be unable to read
> > NumPy 2.x pickle files. Eventually, when NumPy 2.x completely removes the
> > stub np.core module, loading old pickle files will break.
> >
> > The fix we have come up with is to add a new public NumpyUnpickler class
> > to both the main branch and the NumPy 1.26 maintenance branch. This allows
> > loading pickle files that were generated by NumPy 1.x and 2.x in either
> > version without any warnings or errors. Users who are loading old pickle
> > files will need to update their code to use NumpyUnpickler or create new
> > pickle files, and users who generate pickles with NumPy 2.x will need to
> > use NumpyUnpickler to read the files in NumPy 1.x.
> >
> > We are using NumpyUnpickler internally for loading files in the npy file
> > format. Users with pickle data saved in npy files won't see warnings. Only
> > users who are storing data in pickle files directly and who want pickle
> > files written in one numpy version to load correctly in another numpy
> > version will run into trouble. The I/O docs already explicitly discourage
> > using pickles to share data files between people and organizations like
> > this.
> >
> > An alternate approach which would require less work for users would be
> > to leave a limited subset of functionality in `np.core` needed for loading
> > pickle files undeprecated. We would prefer to avoid doing this, both
> > because it would leave behind a publicly visible `np.core` module in
> > NumPy's public API and because we're not sure we can come up with a
> > complete set of imports that should be allowed without warning from
> > `np.core`. If we miss some corner cases, users will see deprecation
> > warnings when loading pickles anyway.
> >
> > See https://github.com/numpy/numpy/pull/24866,
> > https://github.com/numpy/numpy/issues/24844, and the discussion in
> > https://github.com/numpy/numpy/pull/24634 for more context.
> >
> > -Nathan
> > _______________________________________________
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: asmeu...@gmail.com