adriangb commented on code in PR #6319:
URL: https://github.com/apache/arrow-rs/pull/6319#discussion_r1735065816


##########
parquet/src/file/metadata/writer.rs:
##########
@@ -610,6 +649,29 @@ mod tests {
             decoded_metadata.num_row_groups()
         );
 
+        // check that the mins and maxes are what we expect for each page
+        // also indirectly checking that the pages were written out as we expected them to be laid out
+        // (if they're not, or something gets refactored in the future that breaks that assumption,
+        // this test may have to drop down to a lower level and create metadata directly instead of relying on
+        // writing an entire file)
+        let column_indexes = metadata.metadata.column_index().unwrap();
+        assert_eq!(column_indexes.len(), 6);
+        // make sure each row group has 2 pages by checking the first column
+        // page counts for each column for each row group, should all be the same and there should be
+        // 12 pages in total across 6 row groups / 1 column
+        let mut page_counts = vec![];
+        for row_group in column_indexes {
+            for column in row_group {
+                match column {
+                    Index::INT32(column_index) => {
+                        page_counts.push(column_index.indexes.len());
+                    }
+                    _ => panic!("unexpected column index type"),
+                }
+            }
+        }
+        assert_eq!(page_counts, vec![2; 6]);

Review Comment:
   Unless we refactor this test to create the metadata directly, I think this is
a good idea. Previously there was no assertion that we were creating more than
one page, or anything of the sort. Since the code under test evidently takes
different paths depending on how many pages there are, it's good to at least
assert that the data is laid out as we expect it to be.
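
   For illustration only, here is a minimal sketch of the kind of layout
assertion I mean, pulled out into a helper. The `Index` and `NativeIndex` types
below are hypothetical stand-ins that only model the shape used in this test,
not the real parquet types, and `page_counts` and `sample_layout` are names I
made up for the example.

```rust
// Hypothetical stand-ins for the column index types used in the test above.
// Only the shape needed for the page-count check is modeled here.
#[derive(Debug)]
pub struct NativeIndex {
    /// One entry per page: the (min, max) values recorded for that page.
    pub indexes: Vec<(i32, i32)>,
}

#[derive(Debug)]
pub enum Index {
    None,
    Int32(NativeIndex),
}

/// Collect the number of pages for every column chunk, row group by row group.
/// Returns an error instead of panicking if a column is not an Int32 index.
pub fn page_counts(column_indexes: &[Vec<Index>]) -> Result<Vec<usize>, String> {
    let mut counts = Vec::new();
    for (rg, row_group) in column_indexes.iter().enumerate() {
        for (col, column) in row_group.iter().enumerate() {
            match column {
                Index::Int32(idx) => counts.push(idx.indexes.len()),
                other => {
                    return Err(format!(
                        "unexpected column index {:?} at row group {}, column {}",
                        other, rg, col
                    ))
                }
            }
        }
    }
    Ok(counts)
}

/// Build an in-memory layout with `row_groups` row groups, each holding a
/// single Int32 column split into `pages` pages with non-overlapping ranges.
pub fn sample_layout(row_groups: usize, pages: usize) -> Vec<Vec<Index>> {
    (0..row_groups)
        .map(|rg| {
            let indexes = (0..pages)
                .map(|p| {
                    let lo = ((rg * pages + p) as i32) * 10;
                    (lo, lo + 9)
                })
                .collect();
            vec![Index::Int32(NativeIndex { indexes })]
        })
        .collect()
}
```

   With something like this, the test can state the expected layout in one
line, e.g. `assert_eq!(page_counts(&layout).unwrap(), vec![2; 6])`, and a
change in how pages are written shows up as a failed layout assertion rather
than as a confusing failure further down.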


