On Mon, Oct 9, 2023 at 12:57 AM Aaron Meurer <asmeu...@gmail.com> wrote:
> Is it possible to convert a NumPy 1 pickle file into a generic pickle
> file that works in both NumPy 1 and 2? As far as I understand, pickle
> is Turing complete, so I imagine it should be theoretically possible,
> but I don't know how easy it would be to actually do this or how it
> would affect the pickle file size.
>

Hi Aaron,

The issue is that the pickle protocol needs a reference to a
reconstructor function to recreate numpy types. For ndarray, that
function is currently `numpy.core.multiarray._reconstruct`, and in
numpy 2 it becomes `numpy._core.multiarray._reconstruct`.

For a pickle file containing only an ndarray, this reference is the
first thing in the pickle file, and the import happens inside the
pickle implementation. I am not aware of a hook that Python gives us to
intercept that path before Python imports it. So, even if there is a
way to correct subsequent paths in the pickle file, we won't be able to
fix the most problematic path, which will occur in any pickle that
includes a numpy array.

That means some user-visible pain no matter what. If we can't avoid
that, I'd prefer to offer a solution that will allow people to continue
loading old pickle files indefinitely (albeit with a minor code
change). This also gives us a place to put compatibility fixes for
future changes that impact old pickle files.

-Nathan

> Aaron Meurer
>
> On Fri, Oct 6, 2023 at 10:17 AM Nathan <nathan.goldb...@gmail.com> wrote:
> >
> > Hi all,
> >
> > As part of the ongoing work on NEP 52 we are getting close to merging
> > the pull request that changes numpy.core to numpy._core.
> >
> > While working on this we realized that numpy pickle files include paths
> > to np.core in the pickle data. If we do nothing, switching np.core to
> > np._core will generate deprecation warnings when loading pickle files
> > generated by Numpy 1.x in Numpy 2.x and Numpy 1.x will be unable to read
> > Numpy 2.x pickle files. Eventually, when Numpy 2.x completely removes the
> > stub np.core module, loading old pickle files will break.
> >
> > The fix we have come up with is to add a new public NumpyUnpickler class
> > to both the main branch and the Numpy 1.26 maintenance branch. This allows
> > loading pickle files that were generated by Numpy 1.x and 2.x in either
> > version without any warnings or errors. Users who are loading old pickle
> > files will need to update their code to use NumpyUnpickler or create new
> > pickle files and users who generate pickles with numpy 2.x will need to use
> > NumpyUnpickler to read the files in numpy 1.x.
> >
> > We are using NumpyUnpickler internally for loading files in the npy file
> > format. Users with pickle data saved in npy files won't see warnings. Only
> > users who are storing data in pickle files directly and who want pickle
> > files written in one numpy version to load correctly in another numpy
> > version will run into trouble. The I/O docs already explicitly discourage
> > using pickles to share data files between people and organizations like
> > this.
> >
> > An alternate approach which would require less work for users would be
> > to leave a limited subset of functionality in `np.core` needed for loading
> > pickle files undeprecated. We would prefer to avoid doing this both because
> > it would leave behind a publicly visible `np.core` module in NumPy's public
> > API and because we're not sure if we can come up with a complete set of
> > imports that should be allowed without warning from `np.core` without
> > missing some corner cases and users will see deprecation warnings when
> > loading pickles anyway.
> >
> > See https://github.com/numpy/numpy/pull/24866,
> > https://github.com/numpy/numpy/issues/24844, and the discussion in
> > https://github.com/numpy/numpy/pull/24634 for more context.
> >
> > -Nathan
> > _______________________________________________
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: asmeu...@gmail.com