raulcd commented on code in PR #46992:
URL: https://github.com/apache/arrow/pull/46992#discussion_r2194378949


##########
cpp/src/parquet/statistics_test.cc:
##########
@@ -1598,31 +1648,108 @@ TEST(TestStatisticsSortOrderMinMax, Unsigned) {
   ASSERT_EQ(12, stats->num_values());
   ASSERT_EQ(0x00, stats->EncodeMin()[0]);
   ASSERT_EQ(0x0b, stats->EncodeMax()[0]);
+  std::shared_ptr<EncodedStatistics> enc_stats =
+      column_chunk->encoded_statistics();
+  ASSERT_FALSE(enc_stats->is_max_value_exact.has_value());
+  ASSERT_FALSE(enc_stats->is_min_value_exact.has_value());
+}
+
+// Test statistics for binary column with truncated max and min values
+TEST(TestStatisticsTruncatedMinMax, Unsigned) {
+  std::string dir_string(test::get_data_dir());
+  std::stringstream ss;
+  ss << dir_string << "/binary_truncated_min_max.parquet";
+  auto path = ss.str();
+
+  // The file is generated by parquet-rs 55.1.0. It
+  // contains six columns of utf-8 and binary type. statistics_truncate_length
+  // is set to 2. Columns 0 and 1 will have truncation of min and max value,
+  // columns 2 and 3 will have truncation of min value only.
+  // Columns 4 and 5 will have no truncation where is_min_value_exact and
+  // is_max_value_exact are set to true.
+  // Column 0 utf-8:  Min: Alice Johnson, Max: Kevin Bacon
+  // Column 1 binary: Min: Alice Johnson, Max: Kevin Bacon
+  // Column 2 utf-8:  Min: Alice Johnson, Max: 🚀Kevin Bacon
+  // Column 3 binary: Min: Alice Johnson, Max: 0xFFFF

Review Comment:
   Yes, this was confusing. Sorry about that. I wanted to show the original
values that the min/max were computed from, but this wasn't clear. I've opted
to remove that, keep the summary, and add a link to the documentation for the
file.
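
For readers unfamiliar with truncated statistics, here is a minimal sketch of how a truncated min/max can still be a valid bound, which is why `is_min_value_exact` / `is_max_value_exact` end up `false`. `TruncateMin` and `TruncateMax` are hypothetical helpers written only for illustration; they are not the parquet-rs or Arrow C++ implementation. The sketch assumes plain unsigned byte-wise ordering and ignores UTF-8 character boundaries.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not a parquet-rs or Arrow API).
// A byte prefix of a value never sorts after the value itself under unsigned
// byte-wise comparison, so the prefix is a valid, possibly inexact, lower bound.
std::string TruncateMin(const std::string& value, std::size_t len) {
  return value.size() <= len ? value : value.substr(0, len);
}

// Hypothetical helper (not a parquet-rs or Arrow API).
// A bare prefix would sort before the value, so it is not an upper bound.
// Instead, take the prefix, drop trailing 0xFF bytes (they cannot be
// incremented), and increment the last remaining byte. Returns std::nullopt
// when every prefix byte is 0xFF, i.e. no shorter upper bound exists.
std::optional<std::string> TruncateMax(const std::string& value,
                                       std::size_t len) {
  if (value.size() <= len) return value;  // already short enough: stays exact
  std::string prefix = value.substr(0, len);
  while (!prefix.empty() &&
         static_cast<unsigned char>(prefix.back()) == 0xFF) {
    prefix.pop_back();
  }
  if (prefix.empty()) return std::nullopt;
  prefix.back() =
      static_cast<char>(static_cast<unsigned char>(prefix.back()) + 1);
  return prefix;
}
```

With a truncate length of 2, `TruncateMin("Alice Johnson", 2)` gives `"Al"` and `TruncateMax("Kevin Bacon", 2)` gives `"Kf"`. Both are bounds rather than values actually present in the column. A two-byte max such as `0xFFFF` already fits and is returned unchanged.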


