Yun Ni created SPARK-18334:
------------------------------

             Summary: MinHash should use binary hash distance
                 Key: SPARK-18334
                 URL: https://issues.apache.org/jira/browse/SPARK-18334
             Project: Spark
          Issue Type: Bug
            Reporter: Yun Ni
            Priority: Trivial


MinHash currently uses the same `hashDistance` function as 
RandomProjection. This does not make sense for MinHash, because the Jaccard 
distance between two sets is unrelated to the absolute distance between their 
hash bucket indices.
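
To illustrate the difference, here is a minimal Python sketch, not the Spark code. The function names, the choice of taking the minimum over hash tables, and the signature values are all assumptions made for the example. It compares an absolute-difference hash distance with a binary one that only asks whether two items share a bucket.

```python
from typing import Sequence


def abs_hash_distance(x: Sequence[int], y: Sequence[int]) -> float:
    # Assumed RandomProjection-style distance: the smallest absolute gap
    # between bucket indices over all hash tables.
    return float(min(abs(a - b) for a, b in zip(x, y)))


def binary_hash_distance(x: Sequence[int], y: Sequence[int]) -> float:
    # Assumed binary distance: 0 if at least one hash table puts both
    # items in the same bucket, otherwise 1.
    return 0.0 if any(a == b for a, b in zip(x, y)) else 1.0


# Hypothetical MinHash signatures, one bucket index per hash table.
query = [3, 17, 42]
near_by_index = [4, 18, 43]    # no shared bucket, indices happen to be close
far_by_index = [500, 900, 7]   # no shared bucket, indices happen to be far
colliding = [3, 900, 7]        # shares the first bucket with the query

for name, sig in [
    ("near_by_index", near_by_index),
    ("far_by_index", far_by_index),
    ("colliding", colliding),
]:
    print(name, abs_hash_distance(query, sig), binary_hash_distance(query, sig))
```

In this sketch the absolute distance ranks `near_by_index` well ahead of `far_by_index`, although neither shares a min-hash value with the query, so MinHash gives no reason to prefer one over the other. The binary distance treats both the same and separates only the signature that actually collides.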

This bug could affect the accuracy of multi-probe nearest-neighbor (NN) search for MinHash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)