GitHub user vitamon opened a pull request: https://github.com/apache/spark/pull/19409
fix openHashSet to actually use quadratic probing instead of linear

The comments in the code state that OpenHashSet uses quadratic probing, but in fact it uses linear probing, which "results in primary clustering, and as the cluster grows larger, the search for those items hashing within the cluster becomes less efficient." See https://en.wikipedia.org/wiki/Quadratic_probing

OpenHashSetSuite passes with both probing methods.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vitamon/spark openhashset

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19409.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19409

----
commit 6fb0a407edd3ae8c8d4b9154076768ed03028a09
Author: Vitalii Tamazian <vtamaz...@google.com>
Date:   2017-10-02T14:15:26Z

    fix openHashSet to actually use quadratic probing instead of linear

----
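The difference between the two probing strategies can be illustrated with a small open-addressing set. This is a hypothetical Python sketch, not Spark's Scala OpenHashSet: the names `linear_probe`, `quadratic_probe`, `ProbingHashSet` and `demo` are made up for illustration, and it assumes a power-of-two capacity indexed with a bit mask. The quadratic variant uses triangular-number offsets (adding 1, 2, 3, ... to the previous slot), one common form of quadratic probing that visits every slot when the capacity is a power of two.

```python
"""Hypothetical sketch: linear vs. quadratic probing in an open-addressing set.

Not Spark's OpenHashSet; capacity is assumed to be a power of two.
"""


def linear_probe(start, mask):
    """Yield slots start, start+1, start+2, ... wrapping via the bit mask."""
    pos = start & mask
    for _ in range(mask + 1):
        yield pos
        pos = (pos + 1) & mask


def quadratic_probe(start, mask):
    """Yield slots start + i*(i+1)/2 for i = 0, 1, 2, ... (triangular offsets).

    Adding an increasing step 1, 2, 3, ... to the previous position gives
    offsets 0, 1, 3, 6, 10, ..., which cover every slot of a 2^k table.
    """
    pos = start & mask
    for step in range(1, mask + 2):
        yield pos
        pos = (pos + step) & mask


class ProbingHashSet:
    """Minimal open-addressing set with a fixed power-of-two capacity."""

    def __init__(self, capacity, probe):
        assert capacity > 0 and capacity & (capacity - 1) == 0, "capacity must be a power of two"
        self.mask = capacity - 1
        self.slots = [None] * capacity
        self.probe = probe

    def add(self, key):
        """Insert key and return how many slots were examined."""
        for n, pos in enumerate(self.probe(hash(key), self.mask), start=1):
            if self.slots[pos] is None or self.slots[pos] == key:
                self.slots[pos] = key
                return n
        raise RuntimeError("set is full")

    def contains(self, key):
        """Follow the same probe sequence; an empty slot means 'absent'."""
        for pos in self.probe(hash(key), self.mask):
            if self.slots[pos] is None:
                return False
            if self.slots[pos] == key:
                return True
        return False


def demo(probe, capacity=16):
    """Occupy home slots 0..7, then add two more keys whose home slot is 0."""
    s = ProbingHashSet(capacity, probe)
    for k in range(8):
        s.add(k)  # small non-negative ints hash to themselves in CPython
    return s.add(capacity), s.add(2 * capacity)


if __name__ == "__main__":
    print("linear   ", demo(linear_probe))     # (9, 10)
    print("quadratic", demo(quadratic_probe))  # (5, 6)
```

With slots 0 to 7 already occupied, a new key whose home slot is 0 has to walk the whole cluster under linear probing (9 and then 10 slots examined), while the quadratic sequence jumps out of it (5 and then 6). Quadratic probing still sends keys that share a home slot down the same sequence (secondary clustering); it avoids only the primary clustering quoted in the description.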