tustvold opened a new pull request #1214:
URL: https://github.com/apache/arrow-rs/pull/1214
# Which issue does this PR close?
Closes #1211.
# Rationale for this change
See ticket
# What changes are included in this PR?
Changes `ArrowWriter` to produce row groups with exactly `max_row_group_size` rows,
except for the final row group in the file, which may be smaller.
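
To illustrate the intended behaviour, here is a minimal sketch of the buffering logic. It is not the code in this PR: `RowGroupBuffer` and its methods are hypothetical names, and batches are modelled only by their row counts rather than as real `RecordBatch` values.

```rust
/// Hypothetical stand-in for a writer that buffers rows into fixed-size row groups.
/// Batches are modelled only by their row counts; this is not the arrow-rs API.
pub struct RowGroupBuffer {
    max_row_group_size: usize,
    buffered_rows: usize,
    /// Row counts of the row groups emitted so far.
    pub row_groups: Vec<usize>,
}

impl RowGroupBuffer {
    pub fn new(max_row_group_size: usize) -> Self {
        assert!(max_row_group_size > 0, "row group size must be non-zero");
        Self {
            max_row_group_size,
            buffered_rows: 0,
            row_groups: Vec::new(),
        }
    }

    /// Buffer an incoming batch, emitting a full row group each time
    /// the buffer reaches `max_row_group_size` rows.
    pub fn write(&mut self, batch_rows: usize) {
        self.buffered_rows += batch_rows;
        while self.buffered_rows >= self.max_row_group_size {
            self.row_groups.push(self.max_row_group_size);
            self.buffered_rows -= self.max_row_group_size;
        }
    }

    /// Flush whatever remains as the final, possibly smaller, row group
    /// and return the row counts of all row groups written.
    pub fn close(mut self) -> Vec<usize> {
        if self.buffered_rows > 0 {
            self.row_groups.push(self.buffered_rows);
        }
        self.row_groups
    }
}
```

With a limit of 100 rows, writing batches of 30, 80, 150 and 10 rows and then closing would yield row groups of 100, 100 and 70 rows in this model: input batch boundaries no longer determine row group boundaries.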
# Are there any user-facing changes?
Yes, `ArrowWriter` will now buffer data before flushing, producing larger
batches in the process. This could be made opt-in, but I think this is
probably what a lot of people, myself included, thought the writer already did.
On a related note, I think the default max row group size is a tad high,
given that it is used as a row threshold and not a bytes threshold. I've created
#1213 to track this.