mapleFU commented on code in PR #36377:
URL: https://github.com/apache/arrow/pull/36377#discussion_r1251854179
##########
cpp/src/parquet/arrow/writer.h:
##########
@@ -91,10 +91,21 @@ class PARQUET_EXPORT FileWriter {
/// \brief Write a Table to Parquet.
///
+ /// If `use_buffering` is false, then any pending row group is closed
+ /// at the beginning and at the end of this call.
+ /// If `use_buffering` is true, this function reuses an existing
+ /// buffered row group until the chunk size is met, and leaves
+ /// the last row group open for further writes.
+ /// It is recommended to set `use_buffering` to true to minimize
+ /// the number of row groups, especially when calling `WriteTable`
+ /// with small tables.
+ ///
/// \param table Arrow table to write.
/// \param chunk_size maximum number of rows to write per row group.
- virtual ::arrow::Status WriteTable(
- const ::arrow::Table& table, int64_t chunk_size = DEFAULT_MAX_ROW_GROUP_LENGTH) = 0;
+ /// \param use_buffering Whether to potentially buffer data.
+ virtual ::arrow::Status WriteTable(const ::arrow::Table& table,
Review Comment:
Yes, we can. However, this is mentioned here:
https://github.com/apache/arrow/pull/36286#issuecomment-1611332862
And I think that since `RecordBatch` reuses `Array` while `Table` may use
`ChunkedArray`, supporting a buffered `WriteTable` is fine here.
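To make the buffering semantics concrete, here is a minimal sketch of how a
caller might use the proposed parameter. It assumes the three-argument
signature from this diff (`WriteTable(table, chunk_size, use_buffering)`) and
the `Result`-returning `parquet::arrow::FileWriter::Open()` overload from
recent Arrow releases; the helper name, file path, and chunk size are
illustrative.

```cpp
#include <memory>
#include <vector>

#include "arrow/api.h"
#include "arrow/io/file.h"
#include "parquet/arrow/writer.h"

// Hypothetical helper: append many small tables (sharing one schema) to a
// single Parquet file without producing one tiny row group per call.
arrow::Status AppendSmallTables(
    const std::vector<std::shared_ptr<arrow::Table>>& tables,
    const std::shared_ptr<arrow::Schema>& schema) {
  ARROW_ASSIGN_OR_RAISE(
      auto sink, arrow::io::FileOutputStream::Open("/tmp/buffered.parquet"));
  ARROW_ASSIGN_OR_RAISE(
      auto writer, parquet::arrow::FileWriter::Open(
                       *schema, arrow::default_memory_pool(), sink));
  for (const auto& table : tables) {
    // With use_buffering=true (as proposed in this PR), rows accumulate in a
    // buffered row group until chunk_size is reached, and the last row group
    // stays open so the next call keeps filling it instead of each call
    // closing its own small row group.
    ARROW_RETURN_NOT_OK(writer->WriteTable(*table, /*chunk_size=*/1 << 20,
                                           /*use_buffering=*/true));
  }
  // Close() flushes and closes any still-open buffered row group.
  return writer->Close();
}
```

This matches the doc comment in the diff above: with buffering enabled,
repeated calls with small tables no longer each close a row group, which keeps
the row-group count (and the resulting metadata overhead) down.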