Dandandan opened a new pull request #9213: https://github.com/apache/arrow/pull/9213
This is a WIP PR that implements, for the hash aggregate, an approach to hashing similar to the one used in the hash join. On TPC-H query 1, which is dominated by hash aggregation, this speeds things up by ~30%.

TODO:
- [ ] Implement collision checking
- [ ] Add test for collisions
- [ ] Move some code to hash utils

Benchmark results

PR:
```
Query 1 iteration 0 took 457.0 ms
Query 1 iteration 1 took 459.7 ms
Query 1 iteration 2 took 459.3 ms
Query 1 iteration 3 took 461.1 ms
Query 1 iteration 4 took 456.8 ms
Query 1 iteration 5 took 460.6 ms
Query 1 iteration 6 took 462.0 ms
Query 1 iteration 7 took 462.3 ms
Query 1 iteration 8 took 461.0 ms
Query 1 iteration 9 took 466.4 ms
Query 1 avg time: 460.63 ms
```

Vectorized hashing:
```
Query 1 iteration 0 took 650.0 ms
Query 1 iteration 1 took 648.5 ms
Query 1 iteration 2 took 646.8 ms
Query 1 iteration 3 took 646.2 ms
Query 1 iteration 4 took 645.7 ms
Query 1 iteration 5 took 643.0 ms
Query 1 iteration 6 took 649.5 ms
Query 1 iteration 7 took 649.5 ms
Query 1 iteration 8 took 643.4 ms
Query 1 iteration 9 took 643.6 ms
Query 1 avg time: 646.63 ms
```
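The description does not include code, so here is a rough, std-only sketch of the general technique it refers to. The idea is to hash every row of a batch column by column into one `u64` per row, then look up groups by that hash. The sketch also compares the actual key values when hashes match, which is what the first TODO item (collision checking) is about. This is not DataFusion's implementation. `create_hashes`, `combine_hashes`, `hash_value`, `GroupTable` and `assign_groups` are hypothetical names. Key columns are modeled as plain string slices rather than Arrow arrays, and `DefaultHasher` stands in for whatever hasher the PR actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical helper: mixes the hash of one more column into a row's running hash.
fn combine_hashes(acc: u64, h: u64) -> u64 {
    acc ^ h
        .wrapping_add(0x9e37_79b9_7f4a_7c15)
        .wrapping_add(acc << 6)
        .wrapping_add(acc >> 2)
}

// Hypothetical helper: hash a single key value (assumption: std DefaultHasher).
fn hash_value(v: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    v.hash(&mut hasher);
    hasher.finish()
}

// Hash all rows of a batch one key column at a time ("vectorized"), instead of
// hashing each row's full key tuple separately. Assumes all columns have equal length.
fn create_hashes(key_columns: &[&[&str]], hashes: &mut [u64]) {
    for (i, column) in key_columns.iter().enumerate() {
        for (row, value) in column.iter().enumerate() {
            let h = hash_value(value);
            hashes[row] = if i == 0 { h } else { combine_hashes(hashes[row], h) };
        }
    }
}

#[derive(Default)]
struct GroupTable {
    // hash -> indices of groups with that hash (more than one only on a collision)
    map: HashMap<u64, Vec<usize>>,
    // materialized key values per group, used to check for collisions
    group_keys: Vec<Vec<String>>,
}

impl GroupTable {
    // Returns a group index per input row; new key tuples get the next free index.
    fn assign_groups(&mut self, key_columns: &[&[&str]]) -> Vec<usize> {
        let num_rows = key_columns.first().map_or(0, |c| c.len());
        let mut hashes = vec![0u64; num_rows];
        create_hashes(key_columns, &mut hashes);

        let map = &mut self.map;
        let group_keys = &mut self.group_keys;
        let mut group_ids = Vec::with_capacity(num_rows);
        for (row, &hash) in hashes.iter().enumerate() {
            let candidates = map.entry(hash).or_default();
            // Collision check: an equal hash is not enough, compare the real keys.
            let existing = candidates.iter().copied().find(|&g| {
                group_keys[g]
                    .iter()
                    .zip(key_columns.iter())
                    .all(|(k, col)| k.as_str() == col[row])
            });
            let group = match existing {
                Some(g) => g,
                None => {
                    let g = group_keys.len();
                    group_keys.push(key_columns.iter().map(|col| col[row].to_string()).collect());
                    candidates.push(g);
                    g
                }
            };
            group_ids.push(group);
        }
        group_ids
    }
}

fn main() {
    // Two key columns, loosely modeled on TPC-H Q1's GROUP BY l_returnflag, l_linestatus.
    let returnflag: &[&str] = &["A", "N", "A", "R", "N"];
    let linestatus: &[&str] = &["F", "O", "F", "F", "F"];
    let mut table = GroupTable::default();
    let ids = table.assign_groups(&[returnflag, linestatus]);
    println!("{:?}", ids); // prints [0, 1, 0, 2, 3]
}
```

The map is keyed only by the `u64` hash, so two different key tuples that happen to hash to the same value end up in the same bucket. The `all(...)` comparison against the stored keys is what keeps them in separate groups. The `GroupTable` also persists across calls, so later batches reuse the group indices assigned by earlier ones.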
