AlonSpivack commented on PR #20426:
URL: https://github.com/apache/datafusion/pull/20426#issuecomment-3928006537

   Hey @neilconway ,
   thanks for moving so quickly on this. Splitting the coercion rules between 
comparison_coercion and type_union_coercion is definitely the right 
architectural foundation, and it aligns with what @alamb suggested previously 
about context-aware coercion.
   
   Regarding the bad query plans you mentioned (like CAST(text_col AS Int64) < 
5) and the ClickBench issues:
   
   I think we should tackle this in two separate PRs to respect the project's 
architecture:
   
   PR 1 (This PR - Type Level): Your current approach (coercing string→numeric 
for comparisons) is correct at the type-abstraction level. Generating a CAST 
here is the right logical step to ensure mathematical correctness. As @alamb 
noted in the past (e.g., on PR #15482), type coercion rules should be based 
strictly on (DataType, DataType) pairs, not on checking for Expr::Literal. So 
mixing literal evaluation into this specific phase would be architecturally 
incorrect.
   
   PR 2 (Follow-up - Optimizer Level): Once this PR lands, I'd like to submit a 
follow-up PR targeting the simplifier (specifically extending 
cast_literal_to_type_with_op in unwrap_cast.rs). Since the simplifier is 
expected to be expression-aware, we can safely intercept the CAST there:
   
   Early Literal Folding: Fold Literal(Utf8("10")) → Literal(Int64(10)) at 
planning time when compared against a numeric column. This gives zero runtime 
cost and restores performance/predicate pushdown.
   
   Fail-Fast for Invalid Literals: Throw an instant plan_err! for unparseable 
literals like int_col < 'hello' instead of deferring to a runtime CAST failure.
   
   Expanded Coverage: Extend this logic beyond =/!= to handle <, >, <=, >=, and 
include Float types.
   
   This two-phase approach gets the critical bug (#15161) fixed quickly, keeps 
the type coercion pure, and handles the performance/AST-awareness in the 
correct optimizer pass.
   
   Happy to help with the follow-up PR. Thanks again for driving this fix!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to