alamb commented on code in PR #7574:
URL: https://github.com/apache/arrow-rs/pull/7574#discussion_r2121421343


##########
parquet/tests/arrow_reader/mod.rs:
##########
@@ -1027,11 +1058,15 @@ async fn make_test_file_rg(scenario: Scenario, row_per_group: usize) -> NamedTem
         .tempfile()
         .expect("tempfile creation");
 
-    let props = WriterProperties::builder()
+    let mut builder = WriterProperties::builder()
         .set_max_row_group_size(row_per_group)
         .set_bloom_filter_enabled(true)
-        .set_statistics_enabled(EnabledStatistics::Page)
-        .build();
+        .set_statistics_enabled(EnabledStatistics::Page);
+    if matches!(scenario, Scenario::TruncatedUTF8) {

Review Comment:
   Instead of using `matches!` here, could you please add a method to
`Scenario`, like `if scenario.truncate_stats()`?
   
   That way
   1. There is a clearer place to add the documentation
   2. It is easier to see by looking at `Scenario` that it may truncate the 
stats as well
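
   A rough sketch of what such a method could look like. The variant list here is
   abbreviated (`Int32` is only a stand-in for the other variants), the helper
   `stats_truncate_length` is hypothetical, and the length 64 is taken from the
   test comment further down in this review, not from the PR's writer setup:

```rust
/// Simplified stand-in for the test `Scenario` enum; the real enum has
/// more variants. `Int32` is a placeholder, not necessarily a real variant.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Scenario {
    Int32,
    TruncatedUTF8,
}

#[allow(dead_code)]
impl Scenario {
    /// Returns true if the test file for this scenario should be written
    /// with truncated statistics.
    ///
    /// Having this on `Scenario` gives the behavior one documented home
    /// instead of a `matches!` at the call site.
    fn truncate_stats(&self) -> bool {
        matches!(self, Scenario::TruncatedUTF8)
    }
}

/// Hypothetical call-site helper: picks the truncation length to hand to
/// the writer configuration. The value 64 is an assumption based on the
/// `"a".repeat(64)` expectation in the statistics test.
#[allow(dead_code)]
fn stats_truncate_length(scenario: Scenario) -> Option<usize> {
    if scenario.truncate_stats() {
        Some(64)
    } else {
        None
    }
}
```

   With something like this, the branch in `make_test_file_rg` becomes
   `if scenario.truncate_stats() { ... }` and the doc comment on the method
   explains why.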



##########
parquet/tests/arrow_reader/statistics.rs:
##########
@@ -354,7 +376,45 @@ impl Test<'_> {
 //
 // Remaining cases
 //   f64::NAN
-// - Using truncated statistics  ("exact min value" and "exact max value" https://docs.rs/parquet/latest/parquet/file/statistics/enum.Statistics.html#method.max_is_exact)
+
+#[tokio::test]
+async fn test_max_and_min_value_truncated() {
+    let reader = TestReader {
+        scenario: Scenario::TruncatedUTF8,
+        row_per_group: 5,
+    }
+    .build()
+    .await;
+
+    Test {
+        reader: &reader,
+        // min is truncated to
+        // 1. `"a".repeat(64)`, original value is `"a".repeat(64) + "1"`
+        // 2. "", since there's a null in the second row group
+        // 3. "j"
+        expected_min: Arc::new(StringArray::from(vec![&("a".repeat(64)), "", "j"])),

Review Comment:
   I agree; I would expect NULL in the second row group (for unknown statistics).
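
   To illustrate the expectation only (this is not the PR's code): a minimal
   sketch that models the per-row-group minimum as `Option<String>`, so that an
   unknown statistic is `None` and not an empty string. `truncate_min` and
   `expected_min` are hypothetical names, and plain `Vec<Option<String>>` stands
   in for the `StringArray` used by the test:

```rust
/// Hypothetical helper: truncate `value` to at most `max_len` bytes,
/// backing up to the nearest UTF-8 character boundary. A prefix of a
/// string never sorts after the string itself, so the result is still a
/// valid lower bound for a minimum.
#[allow(dead_code)]
fn truncate_min(value: &str, max_len: usize) -> &str {
    if value.len() <= max_len {
        return value;
    }
    let mut end = max_len;
    // Index 0 is always a char boundary, so this loop terminates.
    while !value.is_char_boundary(end) {
        end -= 1;
    }
    &value[..end]
}

/// Expected minimums for the three row groups, with the second one
/// unknown (`None`) as suggested in the review comment above.
#[allow(dead_code)]
fn expected_min() -> Vec<Option<String>> {
    let original = format!("{}1", "a".repeat(64));
    vec![
        Some(truncate_min(&original, 64).to_string()),
        None,
        Some("j".to_string()),
    ]
}
```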



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.