neilconway opened a new pull request, #22794:
URL: https://github.com/apache/datafusion/pull/22794
## Which issue does this PR close?
- Closes #22793
## Rationale for this change
`get_semi_indices` computes the duplicate-free intersection between a
`Range` and a `PrimitiveArray` of integer indices (containing probe matches).
The previous algorithm constructs a bitmap from the `Range` and then probes the
bitmap for every element in the range. This does work linear in the size of the
range and also allocates an intermediate data structure.
We can do better by leveraging the fact that the input index array is
sorted: iterate over the inputs, check membership in the `Range`, and do
duplicate elimination by just comparing with the previous member of the array.
This does work linear in the number of matches and avoids the intermediate data
structure.
We can optimize `get_anti_indices` in a similar manner, except we just need
to emit the in-range gaps between array elements instead of the array elements
themselves.
This improves the performance of `RightSemi` and `RightAnti` joins, as well
as outer joins (since those also call `get_anti_indices`).
## Benchmarks
Criterion: hash_join_semi_anti
```
RightSemi
right_semi_d100_h100 8.559 ms -> 6.777 ms 20.8% faster
right_semi_d100_h10 1.726 ms -> 1.108 ms 35.8% faster
right_semi_d50_h100 8.567 ms -> 6.839 ms 20.2% faster
right_semi_d50_h10 1.734 ms -> 1.113 ms 35.8% faster
right_semi_d10_h100 10.860 ms -> 9.109 ms 16.1% faster
right_semi_d10_h10 8.475 ms -> 7.824 ms 7.7% faster
right_semi_fanout100_h1 4.834 ms -> 3.214 ms 33.5% faster
RightAnti
right_anti_d100_h100 3.464 ms -> 2.193 ms 36.7% faster
right_anti_d100_h10 5.764 ms -> 4.501 ms 21.9% faster
right_anti_d50_h100 3.475 ms -> 2.214 ms 36.3% faster
right_anti_d50_h10 5.769 ms -> 4.496 ms 22.1% faster
right_anti_d10_h100 5.757 ms -> 4.423 ms 23.2% faster
right_anti_d10_h10 12.481 ms -> 11.251 ms 9.9% faster
```
dfbench hj, SF10, partitions=1, batch_size=8192
```
Q16 RightSemi 9.486 ms -> 7.283 ms 23.2% faster
Q17 RightSemi 675.164 ms -> 580.741 ms 14.0% faster
Q18 RightSemi 687.759 ms -> 630.416 ms 8.3% faster
Q22 RightSemi fanout 868.412 ms -> 755.968 ms 12.9% faster
```
## What changes are included in this PR?
* Optimize `get_semi_indices` and `get_anti_indices` as described above
* Unit tests
* Add benchmark cases to cover high-fanout workloads
## Are these changes tested?
Yes; new tests added.
## Are there any user-facing changes?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]