Light-City commented on issue #37765:
URL: https://github.com/apache/arrow/issues/37765#issuecomment-1734758320
> This is fundamentally caused by Acero strictly evaluating argument expressions before calling a function on those arguments. Refactoring to support more intrusive/lazy evaluation semantics would be a significant change; certainly not one which should be handled as a special case in `ExecuteScalarExpression`.
>
> I'd recommend looking at the expression simplification passes (`SimplifyWithGuarantee`). There's machinery there to pattern-match and modify expressions. Currently it is only used to produce more efficient expressions using partition information and other guarantees, but it could also be used to rewrite expressions for safe evaluation:
>
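> ```cpp
> // Pseudocode: MakeSafe is a proposed rewrite pass, not an existing Arrow API.
> Expression unsafe = case_when({greater(field_ref("j"), literal(0))}, {
>     call("divide", {field_ref("i"), field_ref("j")}),
>     field_ref("i"),
> });
> // ...
> ARROW_ASSIGN_OR_RAISE(Expression safe, MakeSafe(unsafe));
> // "max_element_wise" is the element-wise maximum; "max" is an aggregate function.
> assert(safe == call("divide", {
>     field_ref("i"),
>     call("max_element_wise", {field_ref("j"), literal(1)}),
> }));
> ```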
>
> Another way (much less intensive) to approach this problem would be writing a new option for the divide compute functions which produces null or zero when dividing by zero instead of raising an error. This could then be used explicitly in situations where division by zero is otherwise inevitable.

Yes, the underlying operation needs to be changed. Adding options is a relatively big change.