sanjibansg commented on code in PR #12977:
URL: https://github.com/apache/arrow/pull/12977#discussion_r859106787
##########
cpp/src/arrow/dataset/discovery.cc:
##########
@@ -278,8 +278,13 @@ Result<std::shared_ptr<Dataset>> FileSystemDatasetFactory::Finish(FinishOptions
   }
   std::vector<std::shared_ptr<FileFragment>> fragments;
+  std::string fixed_path;
   for (const auto& info : files_) {
-    auto fixed_path = StripPrefixAndFilename(info.path(), options_.partition_base_dir);
+    if (partitioning->type_name() == "filename") {
+      fixed_path = StripPrefix(info.path(), options_.partition_base_dir);
+    } else {
+      fixed_path = StripPrefixAndFilename(info.path(), options_.partition_base_dir);
+    }

Review Comment:
With the latest change, `StripPrefixAndFilename()` returns a `PartitionPathFormat` object that holds both the directory and the filename prefix. Its contents are then passed to `Parse()`, which now expects both the directory and the filename prefix.

We could also change `Parse()` to accept a `PartitionPathFormat` object, which would make it symmetrical to `Format()`. But then we would need to make similar changes in PyArrow, and I believe anyone calling `partitioning.parse()` in PyArrow would first have to construct a `PartitionPathFormat` object.
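To make the trade-off concrete, here is a minimal standalone sketch of the "symmetrical" variant, in which a single `PartitionPathFormat` value flows from the path-stripping helper into `Parse()`. This is not the Arrow implementation: the field names, the `Scheme` enum, the `Split()` helper, and the simplified prefix handling are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the struct discussed above (field names assumed).
struct PartitionPathFormat {
  std::string directory;  // directory part, relative to the base dir
  std::string filename;   // file name, which may carry a partition prefix
};

// Simplified stand-in for StripPrefixAndFilename(): drops `prefix` from the
// front of `path` (plain string match, no path normalization), then splits
// the remainder into directory and file name.
PartitionPathFormat StripPrefixAndFilename(const std::string& path,
                                           const std::string& prefix) {
  std::string rest = path;
  if (!prefix.empty() && rest.compare(0, prefix.size(), prefix) == 0) {
    rest.erase(0, prefix.size());
  }
  rest.erase(0, rest.find_first_not_of('/'));  // drop leading slashes
  const auto slash = rest.find_last_of('/');
  if (slash == std::string::npos) {
    return {"", rest};
  }
  return {rest.substr(0, slash), rest.substr(slash + 1)};
}

// Hypothetical helper: split on `sep`, skipping empty pieces.
std::vector<std::string> Split(const std::string& s, char sep) {
  std::vector<std::string> out;
  std::string cur;
  for (char c : s) {
    if (c == sep) {
      if (!cur.empty()) out.push_back(cur);
      cur.clear();
    } else {
      cur.push_back(c);
    }
  }
  if (!cur.empty()) out.push_back(cur);
  return out;
}

// Toy partitioning schemes (assumed, for illustration only).
enum class Scheme { kDirectory, kFilename };

// Toy Parse() taking the struct, mirroring what Format() would produce.
// Each scheme reads only the piece it cares about, so the caller no longer
// needs to branch on the partitioning type before stripping the path.
std::vector<std::string> Parse(Scheme scheme, const PartitionPathFormat& path) {
  if (scheme == Scheme::kDirectory) {
    return Split(path.directory, '/');
  }
  // Filename scheme: the segments are everything before the last '_'.
  const auto last = path.filename.find_last_of('_');
  if (last == std::string::npos) {
    return {};
  }
  return Split(path.filename.substr(0, last), '_');
}
```

With this shape, the call site would compute one `PartitionPathFormat` per file and hand it to `Parse()` regardless of the partitioning type. The cost, as noted above, is that a PyArrow caller would have to build the equivalent object before calling `partitioning.parse()`.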