lidavidm commented on code in PR #12977: URL: https://github.com/apache/arrow/pull/12977#discussion_r861978402
########## cpp/src/arrow/dataset/partition.h: ########## @@ -310,8 +310,8 @@ class ARROW_DS_EXPORT FunctionPartitioning : public Partitioning { std::string type_name() const override { return name_; } - Result<compute::Expression> Parse(const std::string& path) const override { - return parse_impl_(path); + Result<compute::Expression> Parse(const PartitionPathFormat& path) const override { + return parse_impl_(path.directory); Review Comment: Hmm, not for this PR but I would expect FunctionPartitioning to be "as powerful as" any other partitioning. But I don't think it's used much anyways. ########## python/pyarrow/_dataset.pyx: ########## @@ -1313,9 +1313,12 @@ cdef class Partitioning(_Weakrefable): cdef inline shared_ptr[CPartitioning] unwrap(self): return self.wrapped - def parse(self, path): + def parse(self, directory="", prefix=""): Review Comment: nit but do both of these need a default? I can see prefix having a default because it's a new argument ########## python/pyarrow/_dataset.pyx: ########## @@ -1313,9 +1313,12 @@ cdef class Partitioning(_Weakrefable): cdef inline shared_ptr[CPartitioning] unwrap(self): return self.wrapped - def parse(self, path): + def parse(self, directory="", prefix=""): Review Comment: Also IMO the parameter should be named "filename", and/or we should have a docstring ########## cpp/src/arrow/dataset/file_parquet.cc: ########## @@ -873,7 +874,8 @@ Result<std::vector<std::shared_ptr<Schema>>> ParquetDatasetFactory::InspectSchem size_t i = 0; for (const auto& e : paths_with_row_group_ids_) { - stripped[i++] = StripPrefixAndFilename(e.first, options_.partition_base_dir); + stripped[i++] = + StripPrefixAndFilename(e.first, options_.partition_base_dir).directory; Review Comment: This still seems off, but I can't figure out how to hit this case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org