Re: [PR] GH-44765: [Parquet] ParquetDataset should expose partition_base_dir [arrow]

via GitHub Sat, 30 Nov 2024 22:28:21 -0800


kou commented on code in PR #44766:
URL: https://github.com/apache/arrow/pull/44766#discussion_r1864753423



##########
python/pyarrow/parquet/core.py:
##########
@@ -1169,7 +1169,13 @@ def _get_pandas_index_columns(keyvalues):
     assumes directory names with key=value pairs like "/year=2009/month=11".
     In addition, a scheme like "/2009/11" is also supported, in which case
     you need to specify the field names or a full schema. See the
-    ``pyarrow.dataset.partitioning()`` function for more details."""
+    ``pyarrow.dataset.partitioning()`` function for more details.
+partition_base_dir : str, optional
+        For the purposes of applying the partitioning, paths will be
+        stripped of the partition_base_dir. Files not matching the
+        partition_base_dir prefix will be skipped for partitioning discovery.
+        The ignored files will still be part of the Dataset, but will not
+        have partition information."""

Review Comment:
   ```suggestion
       For the purposes of applying the partitioning, paths will be
       stripped of the partition_base_dir. Files not matching the
       partition_base_dir prefix will be skipped for partitioning discovery.
       The ignored files will still be part of the Dataset, but will not
       have partition information."""
   ```
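
For readers following the docstring wording, the stripping behaviour it describes can be sketched in plain Python. This is only an illustration under stated assumptions, not pyarrow's implementation: `hive_partition_keys` is a hypothetical helper, the example paths are made up, and only hive-style `key=value` directory names are handled.

```python
from pathlib import PurePosixPath


def hive_partition_keys(path, partition_base_dir):
    """Hypothetical helper mirroring the documented behaviour (not pyarrow code)."""
    file_path = PurePosixPath(path)
    base = PurePosixPath(partition_base_dir)
    try:
        # Strip the partition_base_dir prefix before looking for partitions.
        relative = file_path.relative_to(base)
    except ValueError:
        # The path does not start with partition_base_dir: the file would
        # still belong to the dataset, but carries no partition information.
        return {}
    keys = {}
    # Only directory segments (not the file name) can hold key=value pairs.
    for segment in relative.parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep:
            keys[key] = value
    return keys


# Made-up example paths, assumed for illustration only.
paths = [
    "/data/root/year=2009/month=11/part-0.parquet",
    "/data/other/year=2010/part-0.parquet",
]
for p in paths:
    print(p, hive_partition_keys(p, "/data/root"))
```

The first path lies under the assumed base directory, so its `year` and `month` segments are parsed. The second path lies outside it, so it yields no partition keys, matching the "skipped for partitioning discovery" wording.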



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
