jorisvandenbossche commented on code in PR #14574:
URL: https://github.com/apache/arrow/pull/14574#discussion_r1017705626


##########
python/pyarrow/parquet/core.py:
##########
@@ -3424,16 +3426,20 @@ def write_metadata(schema, where, metadata_collector=None, **kwargs):
     ...     table.schema, 'dataset_metadata/_metadata',
     ...     metadata_collector=metadata_collector)
     """
-    writer = ParquetWriter(where, schema, **kwargs)
+    writer = ParquetWriter(where, schema, filesystem, **kwargs)
     writer.close()
 
     if metadata_collector is not None:
         # ParquetWriter doesn't expose the metadata until it's written. Write
         # it and read it again.
-        metadata = read_metadata(where)
+        metadata = read_metadata(where, filesystem=filesystem)
         for m in metadata_collector:
             metadata.append_row_groups(m)
-        metadata.write_metadata_file(where)
+        if filesystem is not None and not hasattr(where, "write"):

Review Comment:
   ```suggestion
           if filesystem is not None:
   ```
   
   I would think we would already have run into an error from `ParquetWriter` / `read_metadata` if the user passed a filesystem in combination with a file-like object.



##########
python/pyarrow/tests/parquet/test_dataset.py:
##########
@@ -1300,6 +1301,32 @@ def _test_write_to_dataset_with_partitions(base_path,
     tm.assert_frame_equal(output_df, input_df)
 
 
+def test_write_metadata_with_without_filesystem(tempdir):

Review Comment:
   `write_metadata`-specific tests live in `parquet/test_metadata.py`, so maybe move this one there?



##########
python/pyarrow/parquet/core.py:
##########
@@ -3424,16 +3426,20 @@ def write_metadata(schema, where, metadata_collector=None, **kwargs):
     ...     table.schema, 'dataset_metadata/_metadata',
     ...     metadata_collector=metadata_collector)
     """
-    writer = ParquetWriter(where, schema, **kwargs)
+    writer = ParquetWriter(where, schema, filesystem, **kwargs)
     writer.close()
 
     if metadata_collector is not None:
         # ParquetWriter doesn't expose the metadata until it's written. Write
         # it and read it again.
-        metadata = read_metadata(where)
+        metadata = read_metadata(where, filesystem=filesystem)
         for m in metadata_collector:
             metadata.append_row_groups(m)
-        metadata.write_metadata_file(where)
+        if filesystem is not None and not hasattr(where, "write"):

Review Comment:
   But actually, `read_metadata` does this by calling `filesystem, where = _resolve_filesystem_and_path(where, filesystem)`.
   
   We should probably do that here as well, so that the capabilities are consistent. For example, that also supports writing to a URI string (you can test this with a local filesystem, using `Path.as_uri()` to get a `file://` URI).
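   As a quick, pyarrow-free illustration of the `Path.as_uri()` trick mentioned above:

```python
from pathlib import PurePosixPath

# PurePath.as_uri() turns an absolute path into a file:// URI, which is
# handy for exercising the URI code path against a local filesystem in a
# test (as_uri() raises ValueError for relative paths).
uri = PurePosixPath("/tmp/dataset/_metadata").as_uri()
print(uri)  # file:///tmp/dataset/_metadata
```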



##########
python/pyarrow/parquet/core.py:
##########
@@ -3392,6 +3392,8 @@ def write_metadata(schema, where, metadata_collector=None, **kwargs):
     where : string or pyarrow.NativeFile
     metadata_collector : list
         where to collect metadata information.
+    filesystem : FileSystem, default None
+        If passed, will be used in subsequent calls accepting a filesystem.

Review Comment:
   That it is being passed to subsequent calls is a bit of an implementation detail, I would say. ParquetWriter has this:
   
   ```
   filesystem : FileSystem, default None
       If nothing passed, will be inferred from `where` if path-like, else
       `where` is already a file-like object so no filesystem is needed.
   ```
   
   so maybe let's copy that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]