alamb commented on code in PR #9353:
URL: https://github.com/apache/arrow-rs/pull/9353#discussion_r2766324474
##########
parquet/src/file/properties.rs:
##########
@@ -534,6 +547,9 @@ impl WriterPropertiesBuilder {
///
/// Note: this is a best effort limit based on value of
/// [`set_write_batch_size`](Self::set_write_batch_size).
+ ///
Review Comment:
thank you for the nice comments
##########
parquet/src/column/writer/mod.rs:
##########
@@ -2502,6 +2503,27 @@ mod tests {
);
}
+ #[test]
+ fn test_column_writer_column_data_page_size_limit() {
Review Comment:
I would personally prefer an even higher level test instead of this one --
namely write a parquet file with two columns where the data page limit of one
column is set to something artificially low and then verify that the page
layouts are different
Perhaps in
https://github.com/apache/arrow-rs/blob/cfb9807646eaa3e577b406fcfe18312960a59f99/parquet/tests/arrow_writer_layout.rs#L33-L32
(though it might need to be extended for multiple columns)
##########
parquet/src/column/writer/mod.rs:
##########
@@ -2502,6 +2503,27 @@ mod tests {
);
}
+ #[test]
+ fn test_column_writer_column_data_page_size_limit() {
+ let props = Arc::new(
+ WriterProperties::builder()
+ .set_writer_version(WriterVersion::PARQUET_1_0)
+ .set_dictionary_enabled(false)
+ .set_data_page_size_limit(1000)
+ .set_column_data_page_size_limit(ColumnPath::from("col"), 10)
+ .set_write_batch_size(3)
+ .build(),
+ );
+ let data = &[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
+
+ let col_values =
+ write_and_collect_page_values(ColumnPath::from("col"),
Arc::clone(&props), data);
+ let other_values =
write_and_collect_page_values(ColumnPath::from("other"), props, data);
+
+ assert_eq!(col_values, vec![3, 3, 3, 1]);
+ assert_eq!(other_values, vec![10]);
Review Comment:
I tested without this code change and the tests fails with
```
---- column::writer::tests::test_column_writer_column_data_page_size_limit
stdout ----
thread
'column::writer::tests::test_column_writer_column_data_page_size_limit'
(23590832) panicked at parquet/src/column/writer/
mod.rs:2522:9:
assertion `left == right` failed
left: [10]
right: [3, 3, 3, 1]
note: run with `RUST_BACKTRACE=1` environment variable to display a
backtrace
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]