[ 
https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875552#comment-15875552
 ] 

Nick Pentreath edited comment on SPARK-18454 at 2/21/17 8:00 AM:
-----------------------------------------------------------------

Can you also comment on
http://mail-archives.apache.org/mod_mbox/spark-user/201702.mbox/%3CCANxMKZU0iVd9Ff4TrWjtdk%3DkEyXAeoXGLEgmVW5vbE5puobE6Q%40mail.gmail.com%3E?
It would be good to understand why we're seeing poor performance compared with an
alternative implementation available as a Spark package, and whether we can borrow
some ideas from it to improve performance.

It's true that the alternative implementation does not support similarity join, but
we should still investigate.


> Changes to improve Nearest Neighbor Search for LSH
> --------------------------------------------------
>
>                 Key: SPARK-18454
>                 URL: https://issues.apache.org/jira/browse/SPARK-18454
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Yun Ni
>
> We all agree to make the following improvement to Multi-Probe NN Search:
> (1) Use approxQuantile to get the {{hashDistance}} threshold instead of doing a
> full sort on the whole dataset
> Currently we are still discussing the following:
> (1) What {{hashDistance}} (or Probing Sequence) we should use for {{MinHash}}
> (2) What the issues are with the current Nearest Neighbor implementation, and
> how we should change it
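
Item (1) of the agreed improvements replaces a full sort of the dataset with an approximate quantile threshold on the hash distance. The following is a minimal Python sketch of that idea only, not Spark's implementation: the row layout, the names approx_quantile and multi_probe_nn, the 0.05 relative error, and the simple sampling-based quantile estimator (swapped in for Spark's approxQuantile) are all illustrative assumptions.

```python
import heapq
import random
from typing import List, Sequence, Tuple

# Hypothetical row layout: (row_id, hash_distance, true_distance).
Row = Tuple[int, float, float]


def approx_quantile(values: Sequence[float], q: float,
                    sample_size: int = 1000, seed: int = 0) -> float:
    """Estimate the q-quantile of `values` from a random sample.

    Illustrative stand-in for Spark's approxQuantile; this is a plain
    sampling estimator, not the algorithm Spark itself uses.
    """
    population = list(values)
    if len(population) > sample_size:
        population = random.Random(seed).sample(population, sample_size)
    population.sort()  # sorts only the small sample, never the full dataset
    idx = min(len(population) - 1, max(0, int(q * len(population))))
    return population[idx]


def multi_probe_nn(rows: Sequence[Row], k: int,
                   relative_error: float = 0.05) -> List[Row]:
    """Return the k rows with the smallest true distance among LSH candidates."""
    if k <= 0 or not rows:
        return []
    # Aim slightly above k/n so the filter is likely to keep at least k rows.
    q = k / len(rows) + relative_error
    if q >= 1:
        candidates = list(rows)
    else:
        threshold = approx_quantile([hd for _, hd, _ in rows], q)
        # Keep only rows whose hash distance is at or below the threshold.
        candidates = [r for r in rows if r[1] <= threshold]
    # Rank just the (small) candidate set by true distance.
    return heapq.nsmallest(k, candidates, key=lambda r: r[2])


if __name__ == "__main__":
    rows = [(i, float(i), float(i)) for i in range(100)]
    print([r[0] for r in multi_probe_nn(rows, 5)])
```

Picking the quantile slightly above k/n trades a few extra candidates for avoiding a sort over the whole dataset; because the threshold is only approximate, the filter can occasionally keep fewer than k rows.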



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
