Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19317 It is better adding more perf test for `OpenHashSet` replacement to avoid perf regression. And I found `reduceByKeyLocally` also use `JHashSet`, I am not sure whether there is some special reason. ping @cloud-fan Can you help confirm this ? I cannot find the original author for that.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org