kosiew opened a new pull request, #20961:
URL: https://github.com/apache/datafusion/pull/20961

   
   ## Which issue does this PR close?
   
   * Part of #20002
   
   ## Rationale for this change
   
   `PushDownFilter` can spend a disproportionate amount of planning time 
inferring predicates across joins. One expensive path is 
`is_restrict_null_predicate`, which falls back to compiling and evaluating the 
predicate against a null-filled schema to decide whether a predicate is 
null-rejecting.
   
   For predicates that reference columns outside the join-key set, that 
evaluation cannot succeed, because the synthetic null schema is built from the 
join columns only. In practice, callers already treat evaluation failures as 
non-restricting, but we still pay the full cost of the physical-expression 
compilation and evaluation path first.
   
   This change adds a cheap guard to detect predicates that reference columns 
outside the allowed join columns and returns `false` early. That preserves the 
existing behavior while avoiding unnecessary work in a hot optimizer path.
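   
   The guard can be illustrated with a minimal, self-contained sketch. It does 
not use DataFusion types: `ToyExpr`, `Value`, `only_references`, 
`eval_all_null`, and `is_restrict_null_predicate_sketch` are hypothetical 
stand-ins, and the toy evaluator only approximates the real null-evaluation 
fallback.

```rust
use std::collections::HashSet;

/// Toy expression tree. A hypothetical stand-in for DataFusion's `Expr`.
#[derive(Debug)]
enum ToyExpr {
    Column(String),
    Literal(i64),
    Gt(Box<ToyExpr>, Box<ToyExpr>),
    IsNull(Box<ToyExpr>),
}

/// Result of evaluating a toy expression; `Null` models SQL NULL.
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Bool(bool),
    Null,
}

/// Cheap guard: true when every column referenced by `expr` is in `allowed`.
fn only_references(expr: &ToyExpr, allowed: &HashSet<&str>) -> bool {
    match expr {
        ToyExpr::Column(name) => allowed.contains(name.as_str()),
        ToyExpr::Literal(_) => true,
        ToyExpr::Gt(l, r) => only_references(l, allowed) && only_references(r, allowed),
        ToyExpr::IsNull(inner) => only_references(inner, allowed),
    }
}

/// Stand-in for the expensive fallback: evaluate with every column set to NULL.
fn eval_all_null(expr: &ToyExpr) -> Result<Value, String> {
    match expr {
        ToyExpr::Column(_) => Ok(Value::Null),
        ToyExpr::Literal(v) => Ok(Value::Int(*v)),
        ToyExpr::Gt(l, r) => match (eval_all_null(l)?, eval_all_null(r)?) {
            (Value::Int(a), Value::Int(b)) => Ok(Value::Bool(a > b)),
            // A comparison with NULL on either side yields NULL.
            (Value::Null, _) | (_, Value::Null) => Ok(Value::Null),
            _ => Err("type mismatch".to_string()),
        },
        ToyExpr::IsNull(inner) => Ok(Value::Bool(eval_all_null(inner)? == Value::Null)),
    }
}

/// Sketch of the fast path: bail out before evaluation when the predicate
/// references a column outside the join-column set.
fn is_restrict_null_predicate_sketch(
    expr: &ToyExpr,
    join_cols: &[&str],
) -> Result<bool, String> {
    let join_cols: HashSet<&str> = join_cols.iter().copied().collect();
    if !only_references(expr, &join_cols) {
        return Ok(false); // early return, no evaluation attempted
    }
    // Null-rejecting when the all-NULL evaluation is NULL or false.
    Ok(matches!(eval_all_null(expr)?, Value::Null | Value::Bool(false)))
}
```

   In this sketch, `a > b` with join columns `["a"]` returns `Ok(false)` 
without reaching `eval_all_null`, while `a > 1` still goes through the 
evaluation path.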
   
   ## What changes are included in this PR?
   
   This PR makes two focused changes:
   
   1. In `is_restrict_null_predicate`, collect the join columns into a 
`HashSet` and add a fast-path check that verifies whether the predicate only 
references those columns.
   2. If the predicate references any non-join column, return `Ok(false)` 
immediately instead of attempting null-evaluation.
   
   Additionally:
   
   * The collected join-column set is reused for the fallback 
`evaluate_expr_with_null_column` path.
   * `InferredPredicates::insert_inferred_predicate` is simplified to use 
`.unwrap_or(false)` when consuming `is_restrict_null_predicate`, which matches 
the prior effective behavior of treating errors as non-restricting.
   * A regression test is added for a predicate like `a > b`, where `b` is 
outside the join-key set, to verify the fast path returns `false`.
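   
   The `.unwrap_or(false)` simplification can be sketched in isolation. The 
prior code is not reproduced here; the sketch assumes it matched on the result 
and mapped errors to `false`. `consume_verbose` and `consume_compact` are 
hypothetical names used only for illustration.

```rust
/// Assumed shape of the earlier handling: errors count as non-restricting.
fn consume_verbose(result: Result<bool, String>) -> bool {
    match result {
        Ok(is_restricting) => is_restricting,
        Err(_) => false,
    }
}

/// Equivalent compact form: `unwrap_or` supplies `false` on `Err`.
fn consume_compact(result: Result<bool, String>) -> bool {
    result.unwrap_or(false)
}
```

   Both functions return the same value for `Ok(true)`, `Ok(false)`, and any 
`Err`, which is why the change does not alter behavior under that assumption.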
   
   ## Are these changes tested?
   
   Yes.
   
   A test case was added to cover the scenario where a predicate references a 
column outside the join key set:
   
   * For `a > b`, the test asserts that `is_restrict_null_predicate` returns 
`false`.
   
   This exercises the new early-return path and protects against regressions in 
predicate analysis behavior.
   
   ## Are there any user-facing changes?
   
   No.
   
   This change is an internal optimizer performance improvement and does not 
change public APIs or intended query results.
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

