[ 
https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010635#comment-15010635
 ] 

Jonas Seiler commented on SPARK-5992:
-------------------------------------

Very nice!

I hope there will soon be a common interface so that we can use all of these 
methods.

When deciding on methods, please take executor memory consumption into 
account. If a matrix of size (number of original dimensions) x (number of 
reduced dimensions) has to be stored on each executor, that might already be 
too much (at least in our case).
If you use sparse random projections, e.g.
Ping Li, T. Hastie and K. W. Church, 2006, “Very Sparse Random Projections”,
https://web.stanford.edu/~hastie/Papers/Ping/KDD06_rp.pdf
you can store a sparse matrix and have a parameter to tune the memory 
consumption.
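
A rough sketch of the idea, assuming the entry distribution from the paper 
cited above: each entry is +sqrt(s) or -sqrt(s) with probability 1/(2s) each, 
and 0 otherwise, so about 1/s of the entries are nonzero. The function names, 
the dict-based storage and the `seed` argument are hypothetical and only for 
illustration; this is not a Spark or MLlib API.

```python
import math
import random


def sparse_projection_matrix(orig_dim, reduced_dim, s, seed=0):
    """Build a very sparse random projection, keeping only nonzero entries.

    Returns a dict {(row, col): value}. On average a fraction 1/s of the
    orig_dim * reduced_dim entries is nonzero, so s is the memory knob.
    """
    rng = random.Random(seed)
    scale = math.sqrt(s)
    entries = {}
    for i in range(orig_dim):
        for j in range(reduced_dim):
            u = rng.random()
            if u < 1.0 / (2 * s):
                entries[(i, j)] = scale    # +sqrt(s) with probability 1/(2s)
            elif u < 1.0 / s:
                entries[(i, j)] = -scale   # -sqrt(s) with probability 1/(2s)
            # otherwise the entry is 0 and is simply not stored
    return entries


def project(vector, entries, reduced_dim):
    """Project a dense vector using only the stored nonzero entries."""
    out = [0.0] * reduced_dim
    for (i, j), value in entries.items():
        out[j] += vector[i] * value
    # Divide by sqrt(reduced_dim) so squared norms are preserved in expectation.
    norm = 1.0 / math.sqrt(reduced_dim)
    return [x * norm for x in out]


if __name__ == "__main__":
    orig_dim, reduced_dim = 1000, 50
    s = math.sqrt(orig_dim)  # the "very sparse" setting suggested in the paper
    entries = sparse_projection_matrix(orig_dim, reduced_dim, s)
    print("stored entries:", len(entries), "of", orig_dim * reduced_dim)
    print("projected length:", len(project([1.0] * orig_dim, entries, reduced_dim)))
```

With s = 1 every entry is nonzero, which is the dense case; larger s means 
fewer stored entries per executor. A real implementation would use a compact 
sparse matrix format instead of a Python dict.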

> Locality Sensitive Hashing (LSH) for MLlib
> ------------------------------------------
>
>                 Key: SPARK-5992
>                 URL: https://issues.apache.org/jira/browse/SPARK-5992
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>
> Locality Sensitive Hashing (LSH) would be very useful for ML.  It would be 
> great to discuss some possible algorithms here, choose an API, and make a PR 
> for an initial algorithm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)