alamb commented on code in PR #9353:
URL: https://github.com/apache/arrow-rs/pull/9353#discussion_r2766324474


##########
parquet/src/file/properties.rs:
##########
@@ -534,6 +547,9 @@ impl WriterPropertiesBuilder {
     ///
     /// Note: this is a best effort limit based on value of
     /// [`set_write_batch_size`](Self::set_write_batch_size).
+    ///

Review Comment:
   thank you for the nice comments



##########
parquet/src/column/writer/mod.rs:
##########
@@ -2502,6 +2503,27 @@ mod tests {
         );
     }
 
+    #[test]
+    fn test_column_writer_column_data_page_size_limit() {

Review Comment:
   I would personally prefer an even higher level test instead of this one -- 
namely write a parquet file with two columns where the data page limit of one 
column is set to something artificially low and then verify that the page 
layouts are different
   
   Perhaps in 
https://github.com/apache/arrow-rs/blob/cfb9807646eaa3e577b406fcfe18312960a59f99/parquet/tests/arrow_writer_layout.rs#L33-L32
   
   (though it might need to be extended for multiple columns)



##########
parquet/src/column/writer/mod.rs:
##########
@@ -2502,6 +2503,27 @@ mod tests {
         );
     }
 
+    #[test]
+    fn test_column_writer_column_data_page_size_limit() {
+        let props = Arc::new(
+            WriterProperties::builder()
+                .set_writer_version(WriterVersion::PARQUET_1_0)
+                .set_dictionary_enabled(false)
+                .set_data_page_size_limit(1000)
+                .set_column_data_page_size_limit(ColumnPath::from("col"), 10)
+                .set_write_batch_size(3)
+                .build(),
+        );
+        let data = &[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
+
+        let col_values =
+            write_and_collect_page_values(ColumnPath::from("col"), 
Arc::clone(&props), data);
+        let other_values = 
write_and_collect_page_values(ColumnPath::from("other"), props, data);
+
+        assert_eq!(col_values, vec![3, 3, 3, 1]);
+        assert_eq!(other_values, vec![10]);

Review Comment:
   I tested without this code change and the tests fails with
   
   ```
     ---- column::writer::tests::test_column_writer_column_data_page_size_limit 
stdout ----
   
     thread 
'column::writer::tests::test_column_writer_column_data_page_size_limit' 
(23590832) panicked at parquet/src/column/writer/
     mod.rs:2522:9:
     assertion `left == right` failed
       left: [10]
      right: [3, 3, 3, 1]
     note: run with `RUST_BACKTRACE=1` environment variable to display a 
backtrace
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to