raulcd commented on code in PR #46992: URL: https://github.com/apache/arrow/pull/46992#discussion_r2194378949
##########
cpp/src/parquet/statistics_test.cc:
##########
@@ -1598,31 +1648,108 @@ TEST(TestStatisticsSortOrderMinMax, Unsigned) {
   ASSERT_EQ(12, stats->num_values());
   ASSERT_EQ(0x00, stats->EncodeMin()[0]);
   ASSERT_EQ(0x0b, stats->EncodeMax()[0]);
+  std::shared_ptr<EncodedStatistics> enc_stats = column_chunk->encoded_statistics();
+  ASSERT_FALSE(enc_stats->is_max_value_exact.has_value());
+  ASSERT_FALSE(enc_stats->is_min_value_exact.has_value());
+}
+
+// Test statistics for binary column with truncated max and min values
+TEST(TestStatisticsTruncatedMinMax, Unsigned) {
+  std::string dir_string(test::get_data_dir());
+  std::stringstream ss;
+  ss << dir_string << "/binary_truncated_min_max.parquet";
+  auto path = ss.str();
+
+  // The file is generated by parquet-rs 55.1.0. It
+  // contains six columns of utf-8 and binary type. statistics_truncate_length
+  // is set to 2. Columns 0 and 1 will have truncation of min and max value,
+  // columns 2 and 3 will have truncation of min value only.
+  // Columns 4 and 5 will have no truncation where is_min_value_exact and
+  // is_max_value_exact are set to true.
+  // Column 0 utf-8: Min: Alice Johnson, Max: Kevin Bacon
+  // Column 1 binary: Min: Alice Johnson, Max: Kevin Bacon
+  // Column 2 utf-8: Min: Alice Johnson, Max: 🚀Kevin Bacon
+  // Column 3 binary: Min: Alice Johnson, Max: 0xFFFF

Review Comment:
   Yes, this was confusing, sorry about that. I wanted to show the original values that the min/max were computed from, but this wasn't clear. I've opted to remove those per-column lines, keep the summary, and add a link to the documentation for the file.
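
For context on what "truncation of min and max value" means in the test comment above, here is a minimal sketch of byte-wise statistics truncation. It is an illustration under stated assumptions, not the parquet-rs or Arrow implementation: `TruncateMin` and `TruncateMax` are hypothetical helper names, the ordering is assumed to be unsigned byte-wise, and UTF-8 character boundaries are deliberately ignored (a real writer would likely need to respect them for utf-8 columns).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper, not an Arrow or parquet-rs API.
// A prefix never sorts after the full value under unsigned byte-wise
// ordering, so the truncated min is still a valid lower bound.
std::string TruncateMin(const std::string& value, std::size_t length) {
  assert(length > 0);
  return value.substr(0, length);
}

// Hypothetical helper, not an Arrow or parquet-rs API.
// A plain prefix would sort *before* the full value, so it is not a valid
// upper bound. Instead, take the prefix and increment its last byte that is
// not 0xFF, dropping any trailing 0xFF bytes. Returns std::nullopt when
// every byte of the prefix is 0xFF, i.e. no shorter upper bound exists
// (a writer could then keep the full, exact max).
std::optional<std::string> TruncateMax(const std::string& value,
                                       std::size_t length) {
  assert(length > 0);
  if (value.size() <= length) {
    return value;  // Already short enough: the value stays exact.
  }
  std::string prefix = value.substr(0, length);
  while (!prefix.empty()) {
    const auto last = static_cast<unsigned char>(prefix.back());
    if (last != 0xFF) {
      prefix.back() = static_cast<char>(last + 1);
      return prefix;
    }
    prefix.pop_back();
  }
  return std::nullopt;
}
```

With a truncation length of 2, `TruncateMin("Alice Johnson", 2)` yields `"Al"` and `TruncateMax("Kevin Bacon", 2)` yields `"Kf"`. Neither bound equals the original value, which is the situation the `is_min_value_exact` and `is_max_value_exact` flags are meant to report. An input whose first two bytes are both 0xFF and that is longer than two bytes yields `std::nullopt` from `TruncateMax`, which is one plausible reason a max could be left untruncated.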