pepijnve commented on PR #18152: URL: https://github.com/apache/datafusion/pull/18152#issuecomment-3448639483
> Your solution **assume** that case expression evaluation are cheaper than copy record batch, right?

I don't understand what you mean; could you clarify where you see that assumption? The current code on `main` already copies record batches on every `evaluate_selection` call: `evaluate_selection(rb, selection)` is essentially `scatter(evaluate(filter_record_batch(rb, selection)), selection)`.

What I'm trying to do here is actually to reduce the amount of data that's processed. The implementation on `main` always starts from the full input record batch, while the implementation here shrinks the record batch as it moves through the case branches. #18275 takes this one step further by projecting away (and as a consequence not filtering) unused columns.

Additionally, on the result-processing side, the current implementation zips arrays of length `record_batch.num_rows()` for each branch. The merge operation reduces that to a single pass, which is even avoided entirely when possible.

> in case a IS NULL filtered 10% for example, do you evaluate a > 1 for the remaining 90% or 100%?

90%. 100% would not work in general. There were already SLTs covering lazy evaluation of the 'then' expressions; I've added a couple extra for the 'when' expressions/predicates as well.

See the second diagram in https://github.com/apache/datafusion/pull/18152#issuecomment-3447841401 for a worked example of the exact evaluation strategy.
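To make the filter/scatter decomposition concrete, here is a simplified sketch over plain vectors instead of Arrow arrays. The names `filter`, `scatter`, and `evaluate_selection` mirror the roles described above but are illustrative stand-ins, not DataFusion's actual API:

```rust
// Hypothetical model: evaluate an expression only on selected rows,
// then scatter the dense result back to the full batch length.

/// Keep only the values whose selection bit is true (dense output).
fn filter(values: &[i64], selection: &[bool]) -> Vec<i64> {
    values
        .iter()
        .zip(selection)
        .filter(|(_, &s)| s)
        .map(|(&v, _)| v)
        .collect()
}

/// Spread a dense result back out to full length, with None for
/// unselected positions.
fn scatter(dense: &[i64], selection: &[bool]) -> Vec<Option<i64>> {
    let mut it = dense.iter();
    selection
        .iter()
        .map(|&s| if s { it.next().copied() } else { None })
        .collect()
}

/// evaluate_selection(rb, selection) ~= scatter(evaluate(filter(rb, selection)), selection)
fn evaluate_selection(
    values: &[i64],
    selection: &[bool],
    expr: impl Fn(i64) -> i64,
) -> Vec<Option<i64>> {
    let dense: Vec<i64> = filter(values, selection).into_iter().map(&expr).collect();
    scatter(&dense, selection)
}

fn main() {
    let values = [1, 2, 3, 4];
    let sel = [true, false, true, false];
    let out = evaluate_selection(&values, &sel, |v| v * 10);
    println!("{:?}", out); // [Some(10), None, Some(30), None]
}
```

Note that both `filter` and `scatter` allocate a new buffer, which is the copying referred to above: it happens on every `evaluate_selection` call regardless of which approach is used.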

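The "90%, not 100%" answer can be sketched as follows: each branch's 'when' predicate only runs on rows no earlier branch matched, and the 'then' expression only runs on rows that branch did match. This is a toy model over plain slices (function names and the scalar row representation are illustrative, not DataFusion's implementation):

```rust
// Illustrative sketch of lazy CASE branch evaluation: the set of
// remaining rows shrinks after every branch, so later predicates
// never see rows an earlier branch already claimed.

fn case_eval(
    values: &[i64],
    branches: &[(fn(i64) -> bool, fn(i64) -> i64)],
) -> Vec<Option<i64>> {
    let mut result: Vec<Option<i64>> = vec![None; values.len()];
    // Indices of rows not yet matched by any branch.
    let mut remaining: Vec<usize> = (0..values.len()).collect();
    for (when, then) in branches {
        let mut still_unmatched = Vec::new();
        for &i in &remaining {
            if when(values[i]) {
                // 'then' is evaluated only for rows this branch matched.
                result[i] = Some(then(values[i]));
            } else {
                still_unmatched.push(i);
            }
        }
        remaining = still_unmatched;
        if remaining.is_empty() {
            break; // nothing left: later branches are skipped entirely
        }
    }
    result
}

fn main() {
    // Roughly: CASE WHEN v < 0 THEN 0 WHEN v > 1 THEN v * 2 END
    // Row 0 matches branch 1, so branch 2's predicate never sees it.
    let values = [-1, 0, 2, 3];
    let out = case_eval(&values, &[(|v| v < 0, |_| 0), (|v| v > 1, |v| v * 2)]);
    println!("{:?}", out); // [Some(0), None, Some(4), Some(6)]
}
```

If the first predicate filters out 10% of the rows, the second predicate is evaluated against the remaining 90%, which is both the lazy-evaluation behaviour the SLTs check and the source of the data-size reduction described above.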