AlonSpivack commented on PR #20426:
URL: https://github.com/apache/datafusion/pull/20426#issuecomment-3928006537
Hey @neilconway ,
thanks for moving so quickly on this. Splitting the coercion rules between
comparison_coercion and type_union_coercion is definitely the right
architectural foundation, and it aligns with what @alamb suggested previously
about context-aware coercion.
Regarding the bad query plans you mentioned (like CAST(text_col AS Int64) <
5) and the ClickBench issues:
I think we should tackle this in two separate PRs to respect the project's
architecture:
PR 1 (This PR - Type Level): Your current approach (coercing string→numeric
for comparisons) is correct at the type-abstraction level. Generating a CAST
here is the right logical step to ensure mathematical correctness. As @alamb
noted in the past (e.g., on PR #15482), type coercion rules should be based
strictly on (DataType, DataType) pairs, not on checking for Expr::Literal. So
mixing literal evaluation into this specific phase would be architecturally
incorrect.
PR 2 (Follow-up - Optimizer Level): Once this PR lands, I'd like to submit a
follow-up PR targeting the simplifier (specifically extending
cast_literal_to_type_with_op in unwrap_cast.rs). Since the simplifier is
expected to be expression-aware, we can safely intercept the CAST there:
Early Literal Folding: Fold Literal(Utf8("10")) → Literal(Int64(10)) at
planning time when compared against a numeric column. This gives zero runtime
cost and restores performance/predicate pushdown.
Fail-Fast for Invalid Literals: Throw an instant plan_err! for unparseable
literals like int_col < 'hello' instead of deferring to a runtime CAST failure.
Expanded Coverage: Extend this logic beyond =/!= to handle <, >, <=, >=, and
include Float types.
This two-phase approach gets the critical bug (#15161) fixed quickly, keeps
the type coercion pure, and handles the performance/AST-awareness in the
correct optimizer pass.
Happy to help with the follow-up PR. Thanks again for driving this fix!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]