GitHub user zecevicp commented on the issue:
https://github.com/apache/spark/pull/21109
I managed to fix the code path that is executed when whole-stage codegen
is turned off. Both code paths now give the same results and both have the
optimization implemented. I also changed the tests in the `InnerJoinSuite`
class so that they run with whole-stage codegen both turned off and on
(which was not the case before).
I updated the benchmark results in `JoinBenchmark`. The results are now as
follows.
Without inner range optimization:
```
sort merge join:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------
sort merge join wholestage off        25226 / 25244          0.0       61585.9       1.0X
sort merge join wholestage on           8581 / 8983          0.0       20948.6       2.9X
```
With inner range optimization:
```
sort merge join:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------
sort merge join wholestage off          1194 / 1212          0.3        2915.2       1.0X
sort merge join wholestage on             814 / 867          0.5        1988.4       1.5X
```
So there is roughly a 10x improvement with whole-stage codegen ON (best time
from 8581 ms to 814 ms) and roughly a 21x improvement with it OFF (best time
from 25226 ms to 1194 ms).
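As a quick sanity check, the speedup factors can be recomputed from the "Best" times in the two tables. The following is a small illustrative Python sketch. The `best_ms` dictionary and the `speedup` helper are hypothetical names used only here. The numbers are copied from the benchmark output above.

```python
# Best times in ms, copied from the JoinBenchmark tables above.
best_ms = {
    "wholestage off": {"baseline": 25226, "optimized": 1194},
    "wholestage on": {"baseline": 8581, "optimized": 814},
}


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Ratio of the baseline best time to the optimized best time (higher is better)."""
    return baseline_ms / optimized_ms


for mode, t in best_ms.items():
    # Prints "wholestage off: 21.1x" and then "wholestage on: 10.5x".
    print(f"{mode}: {speedup(t['baseline'], t['optimized']):.1f}x")
```

The results, about 21.1x and 10.5x, match the rounded figures quoted in the comment.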
I believe this is now ready to be merged, which would greatly help us in
our projects.