[ https://issues.apache.org/jira/browse/SPARK-19771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893097#comment-15893097 ]
Yun Ni edited comment on SPARK-19771 at 3/2/17 9:55 PM:
--------------------------------------------------------

[~merlin]
(1) The computation cost is NumHashFunctions because we go through each index only once. I don't know what N is in the memory overhead.
(2) The hash values are not necessarily {0, 1, -1}.
(3) If we really want a hash function of Vector, why not use Vector.hashCode?

> Support OR-AND amplification in Locality Sensitive Hashing (LSH)
> ----------------------------------------------------------------
>
>                 Key: SPARK-19771
>                 URL: https://issues.apache.org/jira/browse/SPARK-19771
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Yun Ni
>
> The current LSH implementation only supports AND-OR amplification. We need to
> discuss the following questions before we move to implementation:
> (1) Whether we should support OR-AND amplification
> (2) What API changes we need for OR-AND amplification
> (3) How we fix approxNearestNeighbor and approxSimilarityJoin internally
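The two schemes named in the issue description differ in how a single hash function's collision probability p is amplified. The sketch below is a minimal illustration of the standard textbook formulas, assuming r independent hash functions per group and b groups; the function names and the parameters p, r, b are hypothetical and are not part of Spark's API. With AND-OR, a pair becomes a candidate if all r functions agree in at least one of the b groups, giving 1 - (1 - p^r)^b. With OR-AND, a pair must agree on at least one of the r functions in every one of the b groups, giving (1 - (1 - p)^r)^b.

```python
def and_or_probability(p: float, r: int, b: int) -> float:
    """Candidate probability under AND-OR amplification (hypothetical helper).

    AND: all r functions in a group must collide (probability p**r).
    OR: the pair is a candidate if at least one of the b groups collides.
    """
    return 1.0 - (1.0 - p ** r) ** b


def or_and_probability(p: float, r: int, b: int) -> float:
    """Candidate probability under OR-AND amplification (hypothetical helper).

    OR: a group collides if at least one of its r functions collides.
    AND: the pair is a candidate only if every one of the b groups collides.
    """
    return (1.0 - (1.0 - p) ** r) ** b


if __name__ == "__main__":
    # Compare the two schemes for a few base collision probabilities.
    for p in (0.2, 0.5, 0.8):
        print(p, and_or_probability(p, r=2, b=2), or_and_probability(p, r=2, b=2))
```

For example, with p = 0.5, r = 2 and b = 2, AND-OR gives 0.4375 while OR-AND gives 0.5625, so the choice of scheme changes which pairs reach the candidate set for the same number of hash functions.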