sanjibansg commented on code in PR #12977:
URL: https://github.com/apache/arrow/pull/12977#discussion_r859106787


##########
cpp/src/arrow/dataset/discovery.cc:
##########
@@ -278,8 +278,13 @@ Result<std::shared_ptr<Dataset>> FileSystemDatasetFactory::Finish(FinishOptions
   }
 
   std::vector<std::shared_ptr<FileFragment>> fragments;
+  std::string fixed_path;
   for (const auto& info : files_) {
-    auto fixed_path = StripPrefixAndFilename(info.path(), options_.partition_base_dir);
+    if (partitioning->type_name() == "filename") {
+      fixed_path = StripPrefix(info.path(), options_.partition_base_dir);
+    } else {
+      fixed_path = StripPrefixAndFilename(info.path(), options_.partition_base_dir);
+    }

Review Comment:
   With the latest change, `StripPrefixAndFilename()` returns a 
`PartitionPathFormat` object that holds both the directory and the filename 
prefix. The two parts are then passed to `Parse()`, which now expects both the 
directory and the filename prefix. 
   
   We could also change `Parse()` to accept a `PartitionPathFormat` object, 
which would make it symmetrical with `Format()`. That would require matching 
changes in PyArrow, though: I believe a caller of `partitioning.parse()` in 
PyArrow would then have to construct a `PartitionPathFormat` object first. 
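
   To make the idea concrete, here is a minimal standalone sketch of the 
split described above. It does not use the Arrow headers: the struct and both 
helpers are hypothetical stand-ins that only borrow the names from this 
discussion, and the rule that the filename prefix ends at the last `_` in the 
basename is an assumption made for illustration, not a statement about what 
the PR implements. 

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for the PartitionPathFormat mentioned above:
// the directory part and the filename prefix of a path relative to the base dir.
struct PartitionPathFormat {
  std::string directory;
  std::string filename;
};

// Removes `prefix` (and the '/' that follows it) from the front of `path`.
// Returns `path` unchanged when `prefix` is not a whole-component prefix.
std::string StripPrefix(std::string_view path, std::string_view prefix) {
  while (!prefix.empty() && prefix.back() == '/') prefix.remove_suffix(1);
  if (path.substr(0, prefix.size()) != prefix) return std::string(path);
  std::string_view rest = path.substr(prefix.size());
  // Reject matches that end in the middle of a path component.
  if (!prefix.empty() && !rest.empty() && rest.front() != '/') {
    return std::string(path);
  }
  while (!rest.empty() && rest.front() == '/') rest.remove_prefix(1);
  return std::string(rest);
}

// Splits the path below `prefix` into a directory and a filename prefix.
// Assumption: the filename prefix is everything up to and including the
// last '_' in the basename; it is empty if the basename has no '_'.
PartitionPathFormat StripPrefixAndFilename(std::string_view path,
                                           std::string_view prefix) {
  const std::string rest = StripPrefix(path, prefix);
  const auto slash = rest.rfind('/');
  const std::string directory =
      slash == std::string::npos ? std::string() : rest.substr(0, slash);
  const std::string basename =
      slash == std::string::npos ? rest : rest.substr(slash + 1);
  const auto sep = basename.rfind('_');
  const std::string filename_prefix =
      sep == std::string::npos ? std::string() : basename.substr(0, sep + 1);
  return PartitionPathFormat{directory, filename_prefix};
}
```

   With a single return value like this, the loop in `Finish()` would no 
longer need to branch on `type_name()`: each partitioning could read whichever 
of the two fields it cares about. 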


