neilconway opened a new pull request, #22794:
URL: https://github.com/apache/datafusion/pull/22794

   ## Which issue does this PR close?
   
   - Closes #22793
   
   ## Rationale for this change
   
   `get_semi_indices` computes the duplicate-free intersection between a 
`Range` and a `PrimitiveArray` of integer indices (containing probe matches). 
The previous algorithm constructs a bitmap from the `Range` and then probes the 
bitmap for every element in the range. This does work linear in the size of the 
range and also allocates an intermediate data structure.
   
   We can do better by leveraging the fact that the input index array is 
sorted: iterate over the inputs, check membership in the `Range`, and do 
duplicate elimination by just comparing with the previous member of the array. 
This does work linear in the number of matches and avoids the intermediate data 
structure.
   
   We can optimize `get_anti_indices` in a similar manner, except we just need 
to emit the in-range gaps between array elements instead of the array elements 
themselves.
   
   This improves the performance of `RightSemi` and `RightAnti` joins, as well 
as outer joins (since those also call `get_anti_indices`).
   
   ## Benchmarks
   
   Criterion: hash_join_semi_anti
   
   ```
     RightSemi
       right_semi_d100_h100       8.559 ms -> 6.777 ms   20.8% faster
       right_semi_d100_h10        1.726 ms -> 1.108 ms   35.8% faster
       right_semi_d50_h100        8.567 ms -> 6.839 ms   20.2% faster
       right_semi_d50_h10         1.734 ms -> 1.113 ms   35.8% faster
       right_semi_d10_h100       10.860 ms -> 9.109 ms   16.1% faster
       right_semi_d10_h10         8.475 ms -> 7.824 ms    7.7% faster
       right_semi_fanout100_h1    4.834 ms -> 3.214 ms   33.5% faster
   
     RightAnti
       right_anti_d100_h100       3.464 ms -> 2.193 ms   36.7% faster
       right_anti_d100_h10        5.764 ms -> 4.501 ms   21.9% faster
       right_anti_d50_h100        3.475 ms -> 2.214 ms   36.3% faster
       right_anti_d50_h10         5.769 ms -> 4.496 ms   22.1% faster
       right_anti_d10_h100        5.757 ms -> 4.423 ms   23.2% faster
       right_anti_d10_h10        12.481 ms -> 11.251 ms   9.9% faster
   ```
   
   dfbench hj, SF10, partitions=1, batch_size=8192
   
   ```
       Q16 RightSemi              9.486 ms -> 7.283 ms    23.2% faster
       Q17 RightSemi            675.164 ms -> 580.741 ms  14.0% faster
       Q18 RightSemi            687.759 ms -> 630.416 ms   8.3% faster
       Q22 RightSemi fanout     868.412 ms -> 755.968 ms  12.9% faster
   ```
   
   ## What changes are included in this PR?
   
   * Optimize `get_semi_indices` and `get_anti_indices` as described above
   * Unit tests
   * Add benchmark cases to cover high-fanout workloads
   
   ## Are these changes tested?
   
   Yes; new tests added.
   
   ## Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to