cloud-fan opened a new pull request, #55506: URL: https://github.com/apache/spark/pull/55506
### What changes were proposed in this pull request? In `FilterExec.doConsume`'s CSE branch, emit each `IsNotNull(ref)` from `notNullPreds` just before the first `otherPred` that references `ref`, and defer any remaining `notNullPreds` to after all `otherPreds`. This mirrors the non-CSE path's ordering. Previously the CSE branch emitted the full block of `notNullPreds` up front, before any `otherPred`. ### Why are the changes needed? `InferFiltersFromConstraints` can add an `IsNotNull(expr)` whose inner expression can throw on valid non-null inputs -- e.g. `IsNotNull(cast(s as int))` derived from a downstream join that expects the cast to be non-null. That `IsNotNull` lands in `notNullPreds`. The CSE branch was emitting it up front, so on any row where an earlier-ordered guard `otherPred` (e.g. `kind = 'numeric'`) would have short-circuited, the upfront `IsNotNull` still evaluated `cast(s as int)` and threw `CAST_INVALID_INPUT`. The non-CSE branch interleaves by-reference and defers leftovers, so the same query succeeds there. Repro (added as a new test): a CTE chain where an inner guard filter bounds the input to a type that the outer projection then casts; `CombineFilters` + `PushPredicateThroughProject` + `InferFiltersFromConstraints` fold everything into a single ``` Filter(isnotnull(kind) AND kind = 'numeric' AND isnotnull(cast(s as int))) ``` With CSE enabled the cast throws on non-numeric rows; with CSE disabled it succeeds on the same physical plan. Same class of bug as (a) -- short-circuit preservation -- but across the `notNullPreds` / `otherPreds` boundary rather than between adjacent `otherPreds`. ### Does this PR introduce _any_ user-facing change? No user-facing API change. Fixes a regression where `FilterExec` CSE codegen could throw on rows an earlier-ordered guard predicate would have rejected. ### How was this patch tested? Added \`SPARK-56032: FilterExec CSE emits all notNullPreds before guard otherPred\` to \`WholeStageCodegenSuite\`. Without the fix it throws \`SparkNumberFormatException\`; with the fix it matches the non-CSE result. All existing \`WholeStageCodegenSuite\` and \`DataFrameSuite\` tests continue to pass. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Opus 4.7 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
