Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/19860
  
    @kiszk @viirya I ran the following performance test:
    
    ```
    val a = (1 to 100000).map(x => 1).toDS
    val filtered = a.where($"value".isin((1 to 100000): _*))
    // `time` is a simple helper that returns the execution time of the given block
    (1 to 20).map(x => time(filtered.count)).sum / 20
    ```
    
    Before the PR, the average execution time over the 20 trials was 3.428 s,
    while after the PR it was 3.121 s (on OS X, 2.8 GHz Intel Core i7). That is
    an improvement of roughly 9% in overall performance for this case.
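
    The `time` helper used in the snippet is not shown. A minimal sketch of what it could look like, assuming it returns elapsed wall-clock seconds as a `Double`, is below. Both `time` and `averageTime` are illustrative names here, not part of Spark:

    ```scala
    // Hypothetical helper (not a Spark API): evaluates `block` once and
    // returns the elapsed wall-clock time in seconds.
    def time(block: => Any): Double = {
      val start = System.nanoTime()
      block // the result is discarded; only the duration matters
      (System.nanoTime() - start) / 1e9
    }

    // Hypothetical convenience wrapper: average duration over `n` runs.
    def averageTime(n: Int)(block: => Any): Double =
      (1 to n).map(_ => time(block)).sum / n
    ```

    With this, the measurement would read `averageTime(20)(filtered.count)`. Note that the sketch does no JVM warm-up, so the first iterations may skew the average.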

