Github user zecevicp commented on the issue:

    https://github.com/apache/spark/pull/21109
  
    I added benchmark code in `JoinBenchmark`. The tests show an 8x improvement 
over the non-optimized code. Note, however, that the results depend on the 
exact range conditions and on the calculations performed on each matched row. 
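
    To illustrate why the speedup depends on the range conditions, here is a 
minimal plain-Python sketch. It is not the code from this pull request and not 
Spark's join implementation. The function names and the `width` parameter are 
hypothetical. It only compares how many right-side rows get examined when every 
pair is checked versus when the right side is sorted and only the rows inside 
each left row's range window are visited.

```python
from bisect import bisect_left, bisect_right


def naive_range_join(left, right, width):
    """Check every (left, right) pair against the range condition.

    Returns the matched pairs and the number of right-side rows examined.
    """
    matches, examined = [], 0
    for l in left:
        for r in right:
            examined += 1
            if l - width <= r <= l + width:
                matches.append((l, r))
    return matches, examined


def windowed_range_join(left, right, width):
    """Sort the right side once, then visit only rows inside each window.

    Hypothetical illustration: the window [l - width, l + width] is located
    by binary search, so rows outside the range are never examined.
    """
    right_sorted = sorted(right)
    matches, examined = [], 0
    for l in left:
        lo = bisect_left(right_sorted, l - width)
        hi = bisect_right(right_sorted, l + width)
        for r in right_sorted[lo:hi]:
            examined += 1
            matches.append((l, r))
    return matches, examined


if __name__ == "__main__":
    left = list(range(0, 100, 10))
    right = list(range(100))
    for width in (2, 40):
        _, n_naive = naive_range_join(left, right, width)
        _, n_window = windowed_range_join(left, right, width)
        print(width, n_naive, n_window)
```

    In this toy setup a narrow window examines far fewer rows than the naive 
scan, while a wide window examines nearly as many, which is why a benchmark 
number only holds for the range conditions it was measured with.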
    In our case, we were not able to cross-match two rather large datasets (1.2 
billion rows x 800 million rows) without this optimization. With the 
optimization, the cross-match finishes in less than 2 minutes.

