jorisvandenbossche commented on code in PR #36550: URL: https://github.com/apache/arrow/pull/36550#discussion_r1256052302
########## python/pyarrow/_dataset.pyx: ########## @@ -2988,7 +3017,7 @@ cdef class FileSystemFactoryOptions(_Weakrefable): c_factory = self.options.partitioning.factory() if c_factory.get() == nullptr: return None - return PartitioningFactory.wrap(c_factory) + return PartitioningFactory.wrap(c_factory, None, None) Review Comment: Yes, that's correct. But this is only if you first create an options object with a factory, and then decide the pickle the factory accessed from the options object instead of the factory you created to start with: ```python partitioning_factory = ds.HivePartitioning.discover(..) options = ds.FileSystemFactoryOptions(partitioning=partitioning_factory) pickle.dumps(options.partitioning_factory) # instead of pickle.dumps(partitioning_factory) ``` So in practice, if you run into this, you can always pickle the original factory object you created (as there is no other way to get such an object than creating it yourself, i.e. the options object only has it when you passed it manually). We _could_ support pickling those options-accessed factory objects (and ideally that would also be the case, for consistency / avoiding confusion), but the current code only stores the C++ factory object on the options class and recreates the python factory object from the C++ object, and then we don't know anymore how this C++ object was created. We could add some workaround to also store the python factory object, but I am not sure this is worth the effort, given that as a user you can always pickle the original object (explanation above) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org