[ https://issues.apache.org/jira/browse/ARROW-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney closed ARROW-3915.
-------------------------------
    Resolution: Not A Problem

> [Python] Support partition columns when incrementally writing
> -------------------------------------------------------------
>
>                 Key: ARROW-3915
>                 URL: https://issues.apache.org/jira/browse/ARROW-3915
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>    Affects Versions: 0.11.1
>            Reporter: Mackenzie
>            Priority: Major
>              Labels: parquet
>
> Currently `partition_cols` support in pyarrow is implemented in:
> [https://github.com/apache/arrow/blob/69d207ff446c76f78fe27b960e7ebe89a607d992/python/pyarrow/parquet.py#L1205-L1235.]
> However, there is no way to easily do column partitioning when writing
> datasets incrementally via `ParquetWriter`. It would be very helpful if the
> column partitioning logic was made more modular and re-used in
> `ParquetWriter`.
> One option would be to support the `partition_cols` keyword argument in
> `ParquetWriter.write_table`. However, this would introduce the potential to
> have inconsistent partition columns in subsequent files. Perhaps the better
> approach would be to pass as a kwarg when constructing `ParquetWriter` and
> manage it as a property whose setter would throw an error if attempting to
> set while the writer is open.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
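
For readers following the proposal in the description, here is a minimal pure-Python sketch of the second option: partition columns fixed at construction time, exposed as a property whose setter raises while the writer is open. The class `PartitionedWriterSketch`, its `write_rows` method, and the `buffers` attribute are hypothetical names invented for this illustration. None of this is pyarrow API, and no Parquet files are written; rows are only grouped in memory under hive-style `column=value` directory names.

```python
from collections import defaultdict


class PartitionedWriterSketch:
    """Hypothetical illustration of the proposal; not part of pyarrow."""

    def __init__(self, root_path, partition_cols):
        self.root_path = root_path
        self._partition_cols = tuple(partition_cols)
        self._is_open = True
        # Maps a partition directory to the rows buffered for it.
        self.buffers = defaultdict(list)

    @property
    def partition_cols(self):
        return self._partition_cols

    @partition_cols.setter
    def partition_cols(self, value):
        # Changing the layout mid-stream would give inconsistent partitions.
        if self._is_open:
            raise ValueError(
                "cannot change partition_cols while the writer is open"
            )
        self._partition_cols = tuple(value)

    def write_rows(self, rows):
        """Route each row (a dict) to its hive-style partition directory."""
        if not self._is_open:
            raise ValueError("writer is closed")
        for row in rows:
            parts = [f"{col}={row[col]}" for col in self._partition_cols]
            directory = "/".join([self.root_path, *parts])
            # Partition values live in the path, so drop them from the row.
            data = {
                k: v for k, v in row.items() if k not in self._partition_cols
            }
            self.buffers[directory].append(data)

    def close(self):
        self._is_open = False


# Incremental writes share one partition layout for the writer's lifetime.
writer = PartitionedWriterSketch("dataset", ["year"])
writer.write_rows([{"year": 2018, "value": 1.0}])
writer.write_rows([{"year": 2018, "value": 2.0}, {"year": 2019, "value": 3.0}])
```

Fixing the columns in the constructor is what rules out the inconsistency the description worries about: every batch passed to `write_rows` is routed with the same column list, and the setter only succeeds after `close()`.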