pepijnve commented on PR #18152: URL: https://github.com/apache/datafusion/pull/18152#issuecomment-3449904679
> Your solution **assume** that case expression evaluation are cheaper than copy record batch, right?.
>
> ... do you **evaluate** a > 1 for the remaining 90% or 100%?

@rluvaton I was a bit stumped by this feedback at first. Rereading it this morning and taking your emphasis into account, I wonder whether my use of `evaluate` rather than `evaluate_selection` is leading to incorrect conclusions.

It's correct that in this PR I chose to use `evaluate`, but that doesn't mean the expressions are not evaluated selectively. Instead, the filtering that `evaluate_selection` would otherwise do is pulled into the case evaluation loop. The 'remaining' record batch that's passed to `evaluate` shrinks as we go through the loop. In your example, if we start with 100 input rows and 10% match the first `when` predicate, the 'then' expression is only evaluated for those 10 rows, and `remaining` will hold the other 90 rows in the next loop iteration.

The filtering is pulled into the loop because I want to reuse the computed and optimised `FilterPredicate` to also filter the row number array. This is required in order to map the partial/selective results back to their original rows.

The code in `main` achieves this correlation with a scatter operation based on the original selection vector, which maps the partial result back to an array with the same length as the original input. In the same example, 100 rows get filtered down to 10 rows, those 10 rows are evaluated to an array of 10 values, and that array is scattered back to an array of 100 values with nulls inserted where necessary.
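The two ways of correlating selective results with their original rows can be contrasted in a small, self-contained Rust sketch. It rests on heavy simplifying assumptions: a column is a plain `Vec<i64>`, each WHEN/THEN pair is a pair of `fn` pointers, and none of DataFusion's or Arrow's real types (`RecordBatch`, `FilterPredicate`, `evaluate`, `evaluate_selection`) appear. `Branch`, `filter`, `scatter`, `eval_with_scatter` and `eval_with_row_numbers` are hypothetical names for illustration, not DataFusion APIs, and neither function is the actual code in `main` or in this PR.

```rust
// Hypothetical, simplified model of two CASE evaluation strategies.
// A "column" is a plain Vec<i64>; DataFusion's real code works on Arrow
// arrays and RecordBatches, none of which are modelled here.

/// One WHEN/THEN branch (hypothetical type, not a DataFusion API).
struct Branch {
    when: fn(i64) -> bool,
    then: fn(i64) -> i64,
}

/// Keep the values whose mask entry is true (stand-in for a filter kernel).
fn filter<T: Copy>(values: &[T], mask: &[bool]) -> Vec<T> {
    values
        .iter()
        .zip(mask)
        .filter_map(|(v, &m)| if m { Some(*v) } else { None })
        .collect()
}

/// Scatter `partial` back to `mask.len()` slots: the i-th true slot gets
/// partial[i], every false slot becomes None (null).
fn scatter(mask: &[bool], partial: &[i64]) -> Vec<Option<i64>> {
    let mut values = partial.iter();
    mask.iter()
        .map(|&m| if m { values.next().copied() } else { None })
        .collect()
}

/// Scatter-based strategy, loosely modelled on the description of `main`:
/// each branch selects from the full input and scatters its partial result
/// back to an array as long as the input.
fn eval_with_scatter(input: &[i64], branches: &[Branch], default: i64) -> Vec<i64> {
    let n = input.len();
    let mut out: Vec<Option<i64>> = vec![None; n];
    for b in branches {
        // Selected = not claimed by an earlier branch AND the WHEN holds.
        // `&&` short-circuits, so WHEN only runs for unclaimed rows.
        let sel: Vec<bool> = (0..n)
            .map(|i| out[i].is_none() && (b.when)(input[i]))
            .collect();
        // THEN runs only on the selected rows...
        let partial: Vec<i64> = filter(input, &sel).into_iter().map(b.then).collect();
        // ...and the result is scattered back to length n, nulls elsewhere.
        for (slot, v) in out.iter_mut().zip(scatter(&sel, &partial)) {
            if v.is_some() {
                *slot = v;
            }
        }
    }
    out.into_iter().map(|v| v.unwrap_or(default)).collect()
}

/// Row-number strategy, loosely modelled on this PR: `remaining` shrinks each
/// iteration, and the same mask filters both the values and a row-number
/// array, so results can be written straight to their original positions.
fn eval_with_row_numbers(input: &[i64], branches: &[Branch], default: i64) -> Vec<i64> {
    let mut out = vec![default; input.len()];
    let mut remaining: Vec<i64> = input.to_vec();
    let mut row_numbers: Vec<usize> = (0..input.len()).collect();
    for b in branches {
        if remaining.is_empty() {
            break;
        }
        // WHEN is evaluated only for the rows that are still remaining.
        let matched: Vec<bool> = remaining.iter().map(|&v| (b.when)(v)).collect();
        // One mask filters both the values and their original row numbers.
        let then_input = filter(&remaining, &matched);
        let then_rows = filter(&row_numbers, &matched);
        // THEN is evaluated only for the matching rows.
        for (row, v) in then_rows.into_iter().zip(then_input) {
            out[row] = (b.then)(v);
        }
        // Rows that did not match become the next, smaller `remaining`.
        let not_matched: Vec<bool> = matched.iter().map(|m| !m).collect();
        remaining = filter(&remaining, &not_matched);
        row_numbers = filter(&row_numbers, &not_matched);
    }
    out
}

fn main() {
    // Illustrative CASE: WHEN x >= 90 THEN x * 10 WHEN x > 1 THEN x + 1 ELSE 0.
    // With inputs 0..100, exactly 10% of rows match the first WHEN.
    let branches = [
        Branch { when: |x| x >= 90, then: |x| x * 10 },
        Branch { when: |x| x > 1, then: |x| x + 1 },
    ];
    let input: Vec<i64> = (0..100).collect();
    let a = eval_with_scatter(&input, &branches, 0);
    let b = eval_with_row_numbers(&input, &branches, 0);
    assert_eq!(a, b);
    println!("{:?}", &b[88..92]); // prints [89, 90, 900, 910]
}
```

In both functions the THEN expression only ever sees the 10 matching rows; what differs is how those 10 results find their way back. The first builds full-length masks and scatters into a length-100 array with nulls, while the second carries a shrinking row-number array alongside `remaining`, filtered by the same mask, so no full-length scatter is needed.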
