Yun Ni created SPARK-18334:
------------------------------

             Summary: MinHash should use binary hash distance
                 Key: SPARK-18334
                 URL: https://issues.apache.org/jira/browse/SPARK-18334
             Project: Spark
          Issue Type: Bug
            Reporter: Yun Ni
            Priority: Trivial

MinHash currently uses the same `hashDistance` function as RandomProjection. This does not make sense for MinHash, because the Jaccard distance between two sets is unrelated to the absolute difference between their hash bucket indices: two MinHash values either collide or they do not, and how far apart the bucket indices are carries no information. This bug could reduce the accuracy of multi-probe nearest-neighbor search with MinHash.
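
To illustrate the difference, here is a minimal Python sketch, not the Spark implementation. All names in it (`make_hash_functions`, `minhash_signature`, `absolute_hash_distance`, `binary_hash_distance`, `jaccard_distance`) and the modulus are assumptions made for this example. It also assumes that "binary hash distance" means comparing hash values for equality only, averaged over the hash functions.

```python
import random

# Modulus for the toy hash family. 2**31 - 1 is a Mersenne prime; the choice
# is an assumption for this sketch, not a value taken from Spark.
PRIME = 2**31 - 1


def make_hash_functions(n, seed=0):
    """Return n (a, b) pairs defining h(x) = (a * (x + 1) + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(items, hash_functions):
    """MinHash signature of a non-empty set of non-negative integers."""
    return [min((a * (x + 1) + b) % PRIME for x in items)
            for a, b in hash_functions]


def absolute_hash_distance(sig_x, sig_y):
    """Smallest absolute gap between corresponding hash values.

    This kind of distance suits projection-style hashes, where nearby
    buckets mean nearby points. For MinHash the gap is meaningless.
    """
    return min(abs(hx - hy) for hx, hy in zip(sig_x, sig_y))


def binary_hash_distance(sig_x, sig_y):
    """Fraction of positions where the two signatures differ (0 or 1 each)."""
    mismatches = sum(1 for hx, hy in zip(sig_x, sig_y) if hx != hy)
    return mismatches / len(sig_x)


def jaccard_distance(a, b):
    """Exact Jaccard distance between two sets (at least one non-empty)."""
    return 1.0 - len(a & b) / len(a | b)


if __name__ == "__main__":
    hashes = make_hash_functions(256)
    x, y = set(range(0, 60)), set(range(30, 90))
    sig_x = minhash_signature(x, hashes)
    sig_y = minhash_signature(y, hashes)
    print("Jaccard distance:      ", jaccard_distance(x, y))
    print("binary hash distance:  ", binary_hash_distance(sig_x, sig_y))
    print("absolute hash distance:", absolute_hash_distance(sig_x, sig_y))
```

Because a single MinHash value collides with probability equal to the Jaccard similarity, the mismatch fraction in `binary_hash_distance` estimates the Jaccard distance, and the estimate tightens as more hash functions are used. The absolute gap has no such interpretation: signatures `[10, 500]` and `[11, 900]` share no hash value, yet their minimum absolute gap is only 1.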