[GitHub] [arrow] westonpace commented on a change in pull request #11008: ARROW-13755: [Python] Allow writing datasets using a partitioning that only specifies field_names

GitBox Thu, 26 Aug 2021 11:55:16 -0700


westonpace commented on a change in pull request #11008:
URL: https://github.com/apache/arrow/pull/11008#discussion_r696898037




##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -1998,6 +1998,41 @@ cdef class PartitioningFactory(_Weakrefable):
     cdef inline shared_ptr[CPartitioningFactory] unwrap(self):
         return self.wrapped
 
+    @property
+    def type_name(self):
+        return frombytes(self.factory.type_name())
+
+    def create_with_schema(self, schema):

Review comment:
       Well... in C++ it is a multi-step process. The PartitioningFactory is 
created, filenames are inspected, and then it is finished. Thinking about this 
more, I am wondering if this is the correct approach. It seems very odd that a 
partitioning factory should be needed if you aren't actually inspecting 
any files. The purpose of a partitioning factory is to create a partitioning 
from a set of filenames while creating a dataset from that list of filenames. 
So the use case is...
   
   Create partitioning factory
   Run inspect on a dataset factory
   Dataset factory passes all filenames to the partitioning factory (while also keeping them to create the dataset)
   Finish is called on the partitioning factory to generate the partitioning (which is then added to the created dataset)
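
   The steps above can be sketched as a toy model in plain Python. This is only an illustration of the inspect/finish flow: the class names `ToyPartitioningFactory` and `ToyDatasetFactory`, the dict-based "dataset", and the int-vs-string type guess are all assumptions made for the example, not the actual Arrow C++ or pyarrow API.

```python
# Toy model of the inspect/finish flow described above. These classes are
# hypothetical stand-ins, not Arrow's PartitioningFactory or DatasetFactory.


class ToyPartitioningFactory:
    """Infers a directory partitioning from the paths it is shown."""

    def __init__(self, field_names):
        self.field_names = list(field_names)
        # Values observed so far for each partition field.
        self.seen = {name: set() for name in self.field_names}

    def inspect(self, paths):
        # Pair each leading directory segment with the field at that depth.
        for path in paths:
            segments = path.split("/")[:-1]  # drop the file name
            for name, value in zip(self.field_names, segments):
                self.seen[name].add(value)

    def finish(self):
        # Guess "int" only if at least one value was seen and all are digits;
        # with nothing inspected there is nothing to infer, so fall back.
        def infer(values):
            if values and all(v.isdigit() for v in values):
                return "int"
            return "string"

        return {name: infer(self.seen[name]) for name in self.field_names}


class ToyDatasetFactory:
    """Holds the file list and drives the partitioning factory."""

    def __init__(self, paths, partitioning_factory):
        self.paths = list(paths)
        self.partitioning_factory = partitioning_factory

    def inspect(self):
        # Pass every filename to the partitioning factory, keeping our copy.
        self.partitioning_factory.inspect(self.paths)

    def finish(self):
        # Finish the partitioning and attach it to the created "dataset".
        partitioning = self.partitioning_factory.finish()
        return {"files": self.paths, "partitioning": partitioning}


paths = ["2020/us/part-0.parquet", "2021/ca/part-0.parquet"]
dataset_factory = ToyDatasetFactory(paths, ToyPartitioningFactory(["year", "country"]))
dataset_factory.inspect()
dataset = dataset_factory.finish()
print(dataset["partitioning"])

# Finishing a factory that never inspected any files: nothing was inferred.
uninspected = ToyPartitioningFactory(["year", "country"]).finish()
print(uninspected)
```

   In this toy version, a factory that is finished without ever inspecting a file has no values to infer from and can only fall back to a default, which is the oddity raised above about using a factory on the write path.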




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
