jorisvandenbossche commented on code in PR #36550:
URL: https://github.com/apache/arrow/pull/36550#discussion_r1256052302


##########
python/pyarrow/_dataset.pyx:
##########
@@ -2988,7 +3017,7 @@ cdef class FileSystemFactoryOptions(_Weakrefable):
         c_factory = self.options.partitioning.factory()
         if c_factory.get() == nullptr:
             return None
-        return PartitioningFactory.wrap(c_factory)
+        return PartitioningFactory.wrap(c_factory, None, None)

Review Comment:
   Yes, that's correct. 
   
   But this is only if you first create an options object with a factory, and 
then decide the pickle the factory accessed from the options object instead of 
the factory you created to start with:
   
   ```python
   partitioning_factory = ds.HivePartitioning.discover(..)
   options = ds.FileSystemFactoryOptions(partitioning=partitioning_factory)
   pickle.dumps(options.partitioning_factory)  # instead of 
pickle.dumps(partitioning_factory)
   ```
   
   So in practice, if you run into this, you can always pickle the original 
factory object you created (as there is no other way to get such an object than 
creating it yourself, i.e. the options object only has it when you passed it 
manually). 
   
   We _could_ support pickling those options-accessed factory objects (and 
ideally that would also be the case, for consistency / avoiding confusion), but 
the current code only stores the C++ factory object on the options class and 
recreates the python factory object from the C++ object, and then we don't know 
anymore how this C++ object was created. We could add some workaround to also 
store the python factory object, but I am not sure this is worth the effort, 
given that as a user you can always pickle the original object (explanation 
above)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to