GitHub user zecevicp commented on the issue:

https://github.com/apache/spark/pull/21109

I added benchmark code to `JoinBenchmark`. The tests show an 8x improvement over the non-optimized code. Note, however, that the results depend on the exact range conditions and on the calculations performed on each matched row. In our case, we were not able to cross-match two rather large datasets (1.2 billion rows x 800 million rows) without this optimization. With the optimization, the cross-match finishes in less than 2 minutes.
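
To illustrate why a join on a range condition is sensitive to how candidate rows are enumerated, here is a minimal sketch in plain Python. It is not Spark code and not the implementation in this PR; the function names, the symmetric `eps` window, and the sample data are all assumptions made for the example. It contrasts comparing every pair of rows with sorting one side and looking up only the window of rows that can satisfy the condition.

```python
from bisect import bisect_left, bisect_right


def naive_range_join(left, right, eps):
    """Compare every pair of rows: O(len(left) * len(right)) comparisons."""
    return [(a, b) for a in left for b in right if a - eps <= b <= a + eps]


def sorted_range_join(left, right, eps):
    """Sort the right side once, then binary-search the matching window per left row.

    Hypothetical illustration only: the range condition is assumed to be
    a - eps <= b <= a + eps on a single numeric key.
    """
    right_sorted = sorted(right)
    out = []
    for a in left:
        lo = bisect_left(right_sorted, a - eps)   # first b >= a - eps
        hi = bisect_right(right_sorted, a + eps)  # one past the last b <= a + eps
        out.extend((a, b) for b in right_sorted[lo:hi])
    return out


if __name__ == "__main__":
    left = [10, 50, 90]
    right = [5, 14, 52, 200]
    print(naive_range_join(left, right, 5))
    print(sorted_range_join(left, right, 5))
```

Both functions return the same pairs; the second only touches rows inside each window, so how much it saves depends on how narrow the range condition is, which fits the caveat above that results vary with the exact range conditions.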