bkietz commented on code in PR #43726: URL: https://github.com/apache/arrow/pull/43726#discussion_r1744532207
##########
cpp/src/arrow/dataset/file_parquet.cc:
##########
@@ -366,9 +366,16 @@ std::optional<compute::Expression> ParquetFileFragment::EvaluateStatisticsAsExpr
     const parquet::Statistics& statistics) {
   auto field_expr = compute::field_ref(field_ref);
+  bool may_has_null = !statistics.HasNullCount() || statistics.null_count() > 0;
+  bool must_has_null = statistics.HasNullCount() && statistics.null_count() > 0;
   // Optimize for corner case where all values are nulls
-  if (statistics.num_values() == 0 && statistics.null_count() > 0) {
-    return is_null(std::move(field_expr));
+  if (statistics.num_values() == 0) {
+    if (must_has_null) {
+      return is_null(std::move(field_expr));
+    }
+    // If there are no values and no nulls, it might be empty or contains
+    // only null.
+    return std::nullopt;

Review Comment:
   I think it's simpler and more consistent to return `is_null(std::move(field_expr))` if we can; in general I think it usually makes the most sense to construct the most explicit/precise guarantees which are easily available (and therefore to avoid making this special case which will result in *less* specific guarantees).
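
To make the trade-off concrete, here is a minimal standalone sketch comparing the hunk's behaviour with one possible reading of the suggestion above. It does not use the Arrow or Parquet APIs: `ColumnStats`, `GuaranteeAsInPatch`, `GuaranteeAsSuggested`, and the string placeholders standing in for `compute::Expression` are all hypothetical names introduced for illustration. It also assumes that `num_values` counts only non-null values, and that "if we can" means "whenever `num_values() == 0`"; the reviewer may have intended something narrower.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the parts of parquet::Statistics used in the hunk.
// Assumption: num_values counts non-null values only.
struct ColumnStats {
  bool has_null_count;
  int64_t null_count;
  int64_t num_values;
};

// Mirrors the hunk under review: with zero values, emit a guarantee only
// when the null count is present and positive; otherwise give up.
std::optional<std::string> GuaranteeAsInPatch(const ColumnStats& s) {
  const bool must_have_null = s.has_null_count && s.null_count > 0;
  if (s.num_values == 0) {
    if (must_have_null) return "is_null(field)";
    return std::nullopt;  // no guarantee at all
  }
  return "min_max_bounds(field)";  // placeholder for the min/max path
}

// One reading of the review suggestion (an assumption, not the merged code):
// with zero non-null values, "every row is null" holds for an all-null column
// and holds vacuously for an empty one, so is_null is always safe to emit.
std::optional<std::string> GuaranteeAsSuggested(const ColumnStats& s) {
  if (s.num_values == 0) return "is_null(field)";
  return "min_max_bounds(field)";  // placeholder for the min/max path
}

// Small usage example: the two variants differ only when the null count
// is missing (or zero) and there are no non-null values.
void CheckSketch() {
  const ColumnStats missing_null_count{false, 0, 0};
  const ColumnStats all_null{true, 3, 0};

  assert(!GuaranteeAsInPatch(missing_null_count).has_value());
  assert(*GuaranteeAsSuggested(missing_null_count) == "is_null(field)");

  assert(*GuaranteeAsInPatch(all_null) == "is_null(field)");
  assert(*GuaranteeAsSuggested(all_null) == "is_null(field)");
}
```

Under these assumptions the suggested variant never returns a weaker guarantee than the patch, and it drops one branch.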