Dandandan opened a new pull request #9213:
URL: https://github.com/apache/arrow/pull/9213


   This is a WIP PR implementing an approach to hashing similar to the one 
used in the hash join.
   For TPC-H query 1, which is hash-aggregate heavy, this gives a speedup of ~30%.
   
   TODO:
   - [ ] Implement collision checking
   - [ ] Add test for collisions
   - [ ] Move some code to hash utils
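
For context, here is a minimal sketch of the general idea behind column-at-a-time hashing with a collision check. It is not the code in this PR. Each group-by column is folded into a shared per-row hash buffer in its own pass, then rows are assigned to groups by hash, and the actual key values are compared before a row joins an existing group. The names `hash_column` and `group_rows`, the mixing constant 31, and the use of `DefaultHasher` are assumptions made for illustration only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash a single value with the standard library's default hasher.
fn hash_one<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Fold one column into the per-row hash buffer.
/// Called once per group-by column, so each pass is a tight loop over
/// a single column instead of hashing one whole row at a time.
fn hash_column<T: Hash>(column: &[T], hashes: &mut [u64]) {
    for (h, v) in hashes.iter_mut().zip(column) {
        // Hypothetical mixing step; a real implementation would pick its own.
        *h = h.wrapping_mul(31).wrapping_add(hash_one(v));
    }
}

/// Assign rows to groups using precomputed hashes.
/// `keys_equal(a, b)` compares the real key values of rows `a` and `b`;
/// it is only consulted for rows whose hashes match, which is what
/// keeps two distinct keys with the same hash in separate groups.
/// Returns one `Vec` of row indices per group, in first-seen order.
fn group_rows(hashes: &[u64], keys_equal: impl Fn(usize, usize) -> bool) -> Vec<Vec<usize>> {
    let mut groups: Vec<Vec<usize>> = Vec::new();
    // hash -> indices into `groups` of all groups sharing that hash
    let mut by_hash: HashMap<u64, Vec<usize>> = HashMap::new();

    for (row, &hash) in hashes.iter().enumerate() {
        let candidates = by_hash.entry(hash).or_default();
        // Collision check: compare against the first row of each candidate group.
        let found = candidates
            .iter()
            .copied()
            .find(|&g| keys_equal(groups[g][0], row));
        match found {
            Some(g) => groups[g].push(row),
            None => {
                candidates.push(groups.len());
                groups.push(vec![row]);
            }
        }
    }
    groups
}
```

To use it, allocate `vec![0u64; num_rows]`, call `hash_column` once per key column, then pass the buffer to `group_rows` with a closure that compares the key columns at two row indices. Because of the key comparison, the grouping stays correct even if every row were given the same hash; only the lookup cost would degrade.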
   
   Benchmark results:
   PR:
   ```
   Query 1 iteration 0 took 457.0 ms
   Query 1 iteration 1 took 459.7 ms
   Query 1 iteration 2 took 459.3 ms
   Query 1 iteration 3 took 461.1 ms
   Query 1 iteration 4 took 456.8 ms
   Query 1 iteration 5 took 460.6 ms
   Query 1 iteration 6 took 462.0 ms
   Query 1 iteration 7 took 462.3 ms
   Query 1 iteration 8 took 461.0 ms
   Query 1 iteration 9 took 466.4 ms
   Query 1 avg time: 460.63 ms
   ```
   
   Vectorized hashing:
   ```
   Query 1 iteration 0 took 650.0 ms
   Query 1 iteration 1 took 648.5 ms
   Query 1 iteration 2 took 646.8 ms
   Query 1 iteration 3 took 646.2 ms
   Query 1 iteration 4 took 645.7 ms
   Query 1 iteration 5 took 643.0 ms
   Query 1 iteration 6 took 649.5 ms
   Query 1 iteration 7 took 649.5 ms
   Query 1 iteration 8 took 643.4 ms
   Query 1 iteration 9 took 643.6 ms
   Query 1 avg time: 646.63 ms
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

