lidavidm commented on code in PR #12977:
URL: https://github.com/apache/arrow/pull/12977#discussion_r861978402


##########
cpp/src/arrow/dataset/partition.h:
##########
@@ -310,8 +310,8 @@ class ARROW_DS_EXPORT FunctionPartitioning : public 
Partitioning {
 
   std::string type_name() const override { return name_; }
 
-  Result<compute::Expression> Parse(const std::string& path) const override {
-    return parse_impl_(path);
+  Result<compute::Expression> Parse(const PartitionPathFormat& path) const 
override {
+    return parse_impl_(path.directory);

Review Comment:
   Hmm, not for this PR but I would expect FunctionPartitioning to be "as 
powerful as" any other partitioning. But I don't think it's used much anyways.



##########
python/pyarrow/_dataset.pyx:
##########
@@ -1313,9 +1313,12 @@ cdef class Partitioning(_Weakrefable):
     cdef inline shared_ptr[CPartitioning] unwrap(self):
         return self.wrapped
 
-    def parse(self, path):
+    def parse(self, directory="", prefix=""):

Review Comment:
   nit but do both of these need a default? I can see prefix having a default 
because it's a new argument



##########
python/pyarrow/_dataset.pyx:
##########
@@ -1313,9 +1313,12 @@ cdef class Partitioning(_Weakrefable):
     cdef inline shared_ptr[CPartitioning] unwrap(self):
         return self.wrapped
 
-    def parse(self, path):
+    def parse(self, directory="", prefix=""):

Review Comment:
   Also IMO the parameter should be named "filename", and/or we should have a 
docstring



##########
cpp/src/arrow/dataset/file_parquet.cc:
##########
@@ -873,7 +874,8 @@ Result<std::vector<std::shared_ptr<Schema>>> 
ParquetDatasetFactory::InspectSchem
 
     size_t i = 0;
     for (const auto& e : paths_with_row_group_ids_) {
-      stripped[i++] = StripPrefixAndFilename(e.first, 
options_.partition_base_dir);
+      stripped[i++] =
+          StripPrefixAndFilename(e.first, 
options_.partition_base_dir).directory;

Review Comment:
   This still seems off, but I can't figure out how to hit this case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to