jorisvandenbossche commented on a change in pull request #11455:
URL: https://github.com/apache/arrow/pull/11455#discussion_r733394531
##########
File path: python/pyarrow/parquet.py
##########
@@ -687,7 +687,50 @@ def __exit__(self, *args, **kwargs):
# return false since we want to propagate exceptions
return False
+ def write(self, table_or_batch, row_group_size=None):
+ """
+ Write RecordBatch or Table to the Parquet file.
+
+ Parameters
+ ----------
+ table_or_batch : {RecordBatch, Table}
+ row_group_size : int, default None
+ Maximum size of each written row group. If None, the
+ row group size will be the same size as the input
+ table or batch.
Review comment:
There is actually a maximum size for a row group (if your table or
record batch is really large):
https://github.com/apache/arrow/blob/542e81b6dea62f90817b117b1cb1b2de953f293e/cpp/src/parquet/properties.h#L97
That default cap is 64 * 1024 * 1024 rows, i.e. ~67 million. Not sure if this is worth mentioning.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]