[GitHub] spark issue #21109: [SPARK-24020][SQL] Sort-merge join inner range optimizat...

zecevicp Thu, 10 May 2018 13:09:43 -0700

Github user zecevicp commented on the issue:

    https://github.com/apache/spark/pull/21109
  
    I managed to fix the code path that is executed when wholestage codegen is turned off. Both code paths now give the same results and both implement the optimization. I also changed the tests in the `InnerJoinSuite` class so that they run with wholestage codegen both off and on (which wasn't the case so far).
    I updated the benchmark results in `JoinBenchmark`. The results are now as follows.
    Without inner range optimization:
    ```
    sort merge join:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ---------------------------------------------------------------------------------------------
    sort merge join wholestage off            25226 / 25244          0.0       61585.9       1.0X
    sort merge join wholestage on              8581 / 8983          0.0       20948.6       2.9X
    ```
    With inner range optimization:
    ```
    sort merge join:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ---------------------------------------------------------------------------------------------
    sort merge join wholestage off              1194 / 1212          0.3        2915.2       1.0X
    sort merge join wholestage on                814 /  867          0.5        1988.4       1.5X
    ```
    So, comparing best times, there is roughly a 10x improvement in the wholestage ON case (8581 ms to 814 ms) and a 21x improvement in the wholestage OFF case (25226 ms to 1194 ms).
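
    As a quick check of those factors, here is a minimal sketch that recomputes them from the "Best" column of the two tables. The names `best_ms`, `speedup` and `speedups` are hypothetical helpers for this illustration only and are not part of `JoinBenchmark`.

```python
# Best-case times in ms, copied from the "Best" column of the two
# benchmark tables ("without" / "with" inner range optimization).
# The dictionary layout and names are illustrative assumptions.
best_ms = {
    "wholestage off": {"without": 25226, "with": 1194},
    "wholestage on": {"without": 8581, "with": 814},
}


def speedup(without_ms, with_ms):
    """Return how many times faster the optimized run is."""
    return without_ms / with_ms


# Speedup factor per codegen mode.
speedups = {
    case: speedup(times["without"], times["with"])
    for case, times in best_ms.items()
}

for case, factor in speedups.items():
    print(f"{case}: {factor:.1f}x")
```

    This uses best times only; the average times in the same tables would give slightly different factors.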
    
    I believe this is now ready to be merged, which would greatly help us in our projects.


