Mackenzie created ARROW-3915:
--------------------------------

             Summary: [Python] Support partition columns when incrementally writing
                 Key: ARROW-3915
                 URL: https://issues.apache.org/jira/browse/ARROW-3915
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
    Affects Versions: 0.11.1
            Reporter: Mackenzie

Currently, `partition_cols` support in pyarrow is implemented here:
https://github.com/apache/arrow/blob/69d207ff446c76f78fe27b960e7ebe89a607d992/python/pyarrow/parquet.py#L1205-L1235

However, there is no easy way to partition by column when writing a dataset incrementally via `ParquetWriter`. It would be very helpful if the column partitioning logic were made more modular and reused in `ParquetWriter`.

One option would be to support the `partition_cols` keyword argument in `ParquetWriter.write_table`. However, this would make it possible for successive calls to use different partition columns, leaving the files of one dataset partitioned inconsistently.

Perhaps the better approach would be to pass `partition_cols` as a keyword argument when constructing `ParquetWriter` and to manage it as a property whose setter raises an error if it is set while the writer is open.
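
To make the second proposal concrete, here is a minimal sketch of the intended behaviour. It is not pyarrow code and does not touch Parquet at all: `PartitionedWriterSketch`, `write_rows`, and the in-memory `partitions` mapping are hypothetical stand-ins, and rows are plain dicts rather than Arrow tables. The only things it aims to show are that `partition_cols` is fixed at construction, that the setter refuses changes while the writer is open, and that each incremental write is split into `column=value` directories as partitioned datasets are laid out.

```python
from collections import defaultdict


class PartitionedWriterSketch:
    """Hypothetical stand-in for a partition-aware ParquetWriter (not a pyarrow API)."""

    def __init__(self, root_path, partition_cols=None):
        self.root_path = root_path
        self._partition_cols = list(partition_cols or [])
        self._is_open = True
        # Maps a partition directory to the rows written there so far.
        # This dict stands in for the files a real writer would produce.
        self.partitions = defaultdict(list)

    @property
    def partition_cols(self):
        return list(self._partition_cols)

    @partition_cols.setter
    def partition_cols(self, cols):
        # Changing the columns mid-write would give inconsistent partitions.
        if self._is_open:
            raise ValueError("cannot set partition_cols while the writer is open")
        self._partition_cols = list(cols)

    def write_rows(self, rows):
        """Route each row to its partition directory, dropping the partition columns."""
        if not self._is_open:
            raise ValueError("writer is closed")
        for row in rows:
            subdir = "/".join(f"{col}={row[col]}" for col in self._partition_cols)
            path = f"{self.root_path}/{subdir}" if subdir else self.root_path
            data = {k: v for k, v in row.items() if k not in self._partition_cols}
            self.partitions[path].append(data)

    def close(self):
        self._is_open = False


# Two incremental writes land in the same, consistent set of partitions.
writer = PartitionedWriterSketch("dataset", partition_cols=["year"])
writer.write_rows([{"year": 2018, "value": 1}, {"year": 2019, "value": 2}])
writer.write_rows([{"year": 2018, "value": 3}])
```

In a real implementation the grouping step would presumably reuse the logic linked above rather than reimplement it, which is the modularity this issue asks for.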