pitrou commented on code in PR #46992: URL: https://github.com/apache/arrow/pull/46992#discussion_r2192902703
########## cpp/src/parquet/statistics_test.cc: ########## @@ -1598,31 +1648,108 @@ TEST(TestStatisticsSortOrderMinMax, Unsigned) { ASSERT_EQ(12, stats->num_values()); ASSERT_EQ(0x00, stats->EncodeMin()[0]); ASSERT_EQ(0x0b, stats->EncodeMax()[0]); + std::shared_ptr<EncodedStatistics> enc_stats = column_chunk->encoded_statistics(); + ASSERT_FALSE(enc_stats->is_max_value_exact.has_value()); + ASSERT_FALSE(enc_stats->is_min_value_exact.has_value()); +} + +// Test statistics for binary column with truncated max and min values +TEST(TestStatisticsTruncatedMinMax, Unsigned) { + std::string dir_string(test::get_data_dir()); + std::stringstream ss; + ss << dir_string << "/binary_truncated_min_max.parquet"; + auto path = ss.str(); + + // The file is generated by parquet-rs 55.1.0. It + // contains six columns of utf-8 and binary type. statistics_truncate_length + // is set to 2. Columns 0 and 1 will have truncation of min and max value, + // columns 2 and 3 will have truncation of min value only. + // Columns 4 and 5 will have no truncation where is_min_value_exact and + // is_max_value_exact are set to true. + // Column 0 utf-8: Min: Alice Johnson, Max: Kevin Bacon + // Column 1 binary: Min: Alice Johnson, Max: Kevin Bacon + // Column 2 utf-8: Min: Alice Johnson, Max: 🚀Kevin Bacon + // Column 3 binary: Min: Alice Johnson, Max: 0xFFFF Review Comment: Neither min is truncated here, but column 3's max is truncated (should it be?). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org