jorisvandenbossche commented on code in PR #12811:
URL: https://github.com/apache/arrow/pull/12811#discussion_r851085571


##########
python/pyarrow/parquet/__init__.py:
##########
@@ -3082,6 +3155,11 @@ def file_visitor(written_file):
             _mkdir_if_not_exists(fs, '/'.join([root_path, subdir]))
             if partition_filename_cb:
                 outfile = partition_filename_cb(keys)
+
+                # warn for deprecated keyword
+                warnings.warn(
+                    _DEPR_MSG.format("partition_filename_cb", msg3),
+                    FutureWarning, stacklevel=2)

Review Comment:
   Since this is in a for loop, the warning would be raised multiple times. I 
think you can check for the keyword and raise the warning once, before the 
actual writing code (e.g. where you now define `msg3`)
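The reviewer's suggestion could be sketched roughly as below. This is a hypothetical minimal rewrite, not the PR's actual code: `_DEPR_MSG` and `msg3` are stand-ins mirroring the names in the diff, and the partition loop body is a placeholder. The point is that the deprecation check moves out of the per-partition loop, so the `FutureWarning` fires exactly once per call.

```python
import warnings

# Stand-in deprecation template (the real _DEPR_MSG lives in pyarrow)
_DEPR_MSG = "'{}' is deprecated. {}"


def write_to_dataset(table, root_path, partition_filename_cb=None):
    # Warn once, up front, instead of inside the per-partition loop
    if partition_filename_cb is not None:
        msg3 = "Use 'basename_template' instead."  # stand-in message
        warnings.warn(
            _DEPR_MSG.format("partition_filename_cb", msg3),
            FutureWarning, stacklevel=2)
    for keys in [("a",), ("b",), ("c",)]:  # stand-in partition loop
        if partition_filename_cb:
            outfile = partition_filename_cb(keys)  # no warning here
```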



##########
python/pyarrow/parquet/__init__.py:
##########
@@ -2962,11 +2965,48 @@ def write_to_dataset(table, root_path, partition_cols=None,
         and allow you to override the partition filename. If nothing is
         passed, the filename will consist of a uuid.
     use_legacy_dataset : bool
-        Default is True unless a ``pyarrow.fs`` filesystem is passed.
-        Set to False to enable the new code path (experimental, using the
-        new Arrow Dataset API). This is more efficient when using partition
-        columns, but does not (yet) support `partition_filename_cb` and
-        `metadata_collector` keywords.
+        Default is False. Set to True to use the legacy behaviour
+        (this option is deprecated, and the legacy implementation will be
+        removed in a future version). The legacy implementation still
+        supports `partition_filename_cb` and `metadata_collector` keywords
+        but is less efficient when using partition columns.
+    format : FileFormat or str

Review Comment:
   I would leave out `format`, since this should always be "parquet" (so we 
can hardcode that value). We don't want users to be able to use 
`pyarrow.parquet.write_to_dataset` to write e.g. Feather files



##########
python/pyarrow/parquet/__init__.py:
##########
@@ -2962,11 +2965,48 @@ def write_to_dataset(table, root_path, partition_cols=None,
         and allow you to override the partition filename. If nothing is
         passed, the filename will consist of a uuid.
     use_legacy_dataset : bool
-        Default is True unless a ``pyarrow.fs`` filesystem is passed.
-        Set to False to enable the new code path (experimental, using the
-        new Arrow Dataset API). This is more efficient when using partition
-        columns, but does not (yet) support `partition_filename_cb` and
-        `metadata_collector` keywords.
+        Default is False. Set to True to use the legacy behaviour
+        (this option is deprecated, and the legacy implementation will be
+        removed in a future version). The legacy implementation still
+        supports `partition_filename_cb` and `metadata_collector` keywords
+        but is less efficient when using partition columns.
+    format : FileFormat or str

Review Comment:
   And it is actually still hardcoded in the code below, so passing this 
keyword currently wouldn't have any effect. Or is it only added to raise a 
warning in the legacy case?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
