telleroutlook opened a new pull request, #22580: URL: https://github.com/apache/datafusion/pull/22580
Closes #15631 ## What changes Added `any_bit_set()` and `any_bit_unset()` helpers in `check_short_circuit()` that scan the BooleanBuffer word-by-word and exit on the first word that determines the answer, instead of always doing a full `count_set_bits()` popcount scan. For AND: check `any_bit_set` first (no true values → ReturnLeft), then `any_bit_unset` (all true → ReturnRight). Only fall through to `count_set_bits()` when the exact count is needed for the pre-selection threshold. For OR: check `any_bit_unset` first (no false values → ReturnLeft), then `any_bit_set` (all false → ReturnRight). ## Why `count_set_bits()` scans the entire buffer regardless of the data. For cases where the first few words already tell us the answer (all zeros, all ones, or even just "has at least one set bit"), we can skip the full scan. ## Benchmark results 8192 rows, existing `binary_op` bench: | Scenario | Baseline | Optimized | Change | |---|---|---|---| | and/one_true_first | 544 µs | 450 µs | -17% | | and/one_true_last | 469 µs | 401 µs | -15% | | and/one_true_middle | 428 µs | 389 µs | -9% | | and/one_true_middle_left | 413 µs | 371 µs | -10% | | and/one_true_middle_right | 408 µs | 391 µs | -4% | | and/all_false | 161 ns | 163 ns | ~same | | and/all_true_in_and | 1.25 ms | 1.25 ms | ~same | | or/all_true | 139 ns | 171 ns | ~same | The biggest wins are in the mixed cases where early exit avoids a full buffer scan before falling through to the pre-selection path. The "all same" cases were already fast and see no significant change. ## Test plan - [x] All 83 existing binary expression tests pass - [x] Ran `cargo bench --bench binary_op -- short_circuit` - [ ] CI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
