GitHub user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19860

@kiszk @viirya I ran the following performance test:

```
val a = (1 to 100000).map(x => 1).toDS
val filtered = a.where($"value".isin((1 to 100000): _*))
// `time` is a simple helper that measures how long the given call takes
(1 to 20).map(x => time(filtered.count)).sum / 20
```

Before the PR, the average execution time over the 20 trials is 3.428 s; after the PR it is 3.121 s (on OS X, 2.8 GHz Intel Core i7). That is an improvement of about 9% in overall performance for this case.
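The `time` helper used in the test is not shown in the comment. A minimal sketch of what it might look like, assuming it returns the elapsed wall-clock time in seconds as a `Double` (the name `averageSeconds` is hypothetical and only wraps the 20-trial averaging):

```scala
// Hypothetical helper, not taken from the PR: evaluates `body` once and
// returns the elapsed wall-clock time in seconds.
def time[T](body: => T): Double = {
  val start = System.nanoTime()
  body // the result is discarded; only the duration is of interest
  (System.nanoTime() - start) / 1e9
}

// Averages `time` over a number of trials, mirroring the 20-run loop in the test.
def averageSeconds[T](trials: Int)(body: => T): Double =
  (1 to trials).map(_ => time(body)).sum / trials
```

With such a helper, the measurement would read `averageSeconds(20)(filtered.count)`. The first few runs typically include JIT warm-up, so the average may overstate the steady-state cost.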