bkietz commented on code in PR #43726:
URL: https://github.com/apache/arrow/pull/43726#discussion_r1744532207


##########
cpp/src/arrow/dataset/file_parquet.cc:
##########
@@ -366,9 +366,16 @@ std::optional<compute::Expression> ParquetFileFragment::EvaluateStatisticsAsExpr
     const parquet::Statistics& statistics) {
   auto field_expr = compute::field_ref(field_ref);
 
+  bool may_has_null = !statistics.HasNullCount() || statistics.null_count() > 0;
+  bool must_has_null = statistics.HasNullCount() && statistics.null_count() > 0;
   // Optimize for corner case where all values are nulls
-  if (statistics.num_values() == 0 && statistics.null_count() > 0) {
-    return is_null(std::move(field_expr));
+  if (statistics.num_values() == 0) {
+    if (must_has_null) {
+      return is_null(std::move(field_expr));
+    }
+    // If there are no values and no nulls, it might be empty or contain
+    // only nulls.
+    return std::nullopt;

Review Comment:
   I think it's simpler and more consistent to return
`is_null(std::move(field_expr))` whenever we can. In general, it makes the most
sense to construct the most explicit and precise guarantees that are easily
available, and therefore to avoid this special case, which would result in
*less* specific guarantees.
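A minimal sketch contrasting the two behaviors, under one reading of the suggestion: return `is_null` whenever `num_values() == 0`, without also requiring a known positive null count. It assumes `num_values` counts only non-null values, so zero non-null values means every row (if there are any) is null, and `is_null` holds vacuously for an empty row group. `StatsSketch`, `Guarantee`, `FromDiff`, and `Suggested` are hypothetical names for illustration only; the real code works on `parquet::Statistics` and returns `compute::Expression` guarantees.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the statistics fields used in the diff
// (not parquet::Statistics itself).
struct StatsSketch {
  bool has_null_count;
  int64_t null_count;
  int64_t num_values;  // assumed: number of non-null values
};

// Hypothetical guarantee kinds; the real code builds compute::Expression objects.
enum class Guarantee { kIsNull, kNeedsMinMax };

// Behavior in the diff: claim is_null only when nulls are known to exist,
// otherwise give up (std::nullopt) for the zero-values case.
std::optional<Guarantee> FromDiff(const StatsSketch& s) {
  bool must_has_null = s.has_null_count && s.null_count > 0;
  if (s.num_values == 0) {
    if (must_has_null) return Guarantee::kIsNull;
    return std::nullopt;
  }
  return Guarantee::kNeedsMinMax;  // fall through to min/max handling
}

// Suggested behavior: with zero non-null values, every row (if any) is null,
// so is_null is a valid guarantee whether or not the null count is known.
std::optional<Guarantee> Suggested(const StatsSketch& s) {
  if (s.num_values == 0) return Guarantee::kIsNull;
  return Guarantee::kNeedsMinMax;
}
```

With an unknown null count and no non-null values, `FromDiff` yields no guarantee while `Suggested` still yields `is_null`, which is the "more specific guarantee" the comment argues for.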



-- 
This is an automated message from the Apache Git Service.