[GitHub] [arrow] lidavidm commented on a change in pull request #12106: ARROW-13269: Improve metadata docs for partitioned datasets

GitBox Wed, 12 Jan 2022 05:14:20 -0800


lidavidm commented on a change in pull request #12106:
URL: https://github.com/apache/arrow/pull/12106#discussion_r783062792




##########
File path: python/pyarrow/parquet.py
##########
@@ -2305,6 +2305,9 @@ def write_metadata(schema, where, metadata_collector=None, **kwargs):
     ...     table.schema, root_path / '_common_metadata', **writer_kwargs)
 
     Write the `_metadata` parquet file with row groups statistics.
+    
+    Note: Partition columns should be removed from the table schema before
+    writing `_metadata` for partitioned datasets.

Review comment:
   Thanks for this.

   Just a couple of things:

   1) This shouldn't go between the code example and the explanation of the code sample, since it interrupts the flow.
   2) Maybe this can go in its own Notes section below? https://numpydoc.readthedocs.io/en/latest/format.html#notes
      Alternatively, there could be another example that demonstrates (a) writing a dataset with partition columns and (b) removing those columns before writing metadata.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

