[ 
https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353129#comment-14353129
 ] 

Sean Owen commented on SPARK-3066:
----------------------------------

My anecdotal experience with LSH was that getting an order-of-magnitude speedup 
meant losing a small but noticeable amount of quality in the top 
recommendations. That is, you would fail to consider as candidates some of the 
items that were actually top recs. 
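
To make the tradeoff concrete, here is a minimal sketch of how one could measure it. It is not taken from the Oryx code; it assumes a simple random-hyperplane (sign) hash, random Gaussian factors in place of real ALS output, and hypothetical names and parameters (lsh_recall, max_diff, 8 hyperplanes) chosen only for illustration. It compares the exact top-k by inner product with the top-k found among the LSH candidates.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def signature(vec, hyperplanes):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return tuple(dot(vec, h) >= 0.0 for h in hyperplanes)

def top_k(user, items, ids, k):
    # Exact top-k by inner product, restricted to the given item ids.
    return sorted(ids, key=lambda i: dot(user, items[i]), reverse=True)[:k]

def lsh_recall(user, items, hyperplanes, k, max_diff):
    """Return (fraction of the exact top-k kept by LSH, number of candidates)."""
    user_sig = signature(user, hyperplanes)
    # Keep only items whose signature differs from the user's in <= max_diff bits.
    candidates = [
        i for i, item in enumerate(items)
        if sum(a != b for a, b in zip(user_sig, signature(item, hyperplanes))) <= max_diff
    ]
    exact = set(top_k(user, items, range(len(items)), k))
    approx = set(top_k(user, items, candidates, k))
    return len(exact & approx) / k, len(candidates)

# Assumption: random factors stand in for a trained ALS model.
rng = random.Random(0)
rank = 10
items = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(2000)]
user = [rng.gauss(0, 1) for _ in range(rank)]
hyperplanes = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(8)]

recall, n_candidates = lsh_recall(user, items, hyperplanes, k=10, max_diff=2)
print(f"scored {n_candidates} of {len(items)} items, recall@10 = {recall:.2f}")
```

Lowering max_diff scores fewer items (more speedup) but makes it more likely that true top items are never considered; setting it to the number of hyperplanes disables the filter and recovers the exact result.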

The most actionable test / implementation I have to show this for ALS is:
https://github.com/cloudera/oryx/blob/master/als-common/src/it/java/com/cloudera/oryx/als/common/candidate/LocationSensitiveHashIT.java
This could let you run tests at a certain scale, with a certain degree of 
hashing, etc., if you wanted to.

I've actually tried to avoid needing LSH just for speed in order to avoid this 
tradeoff.

As for papers, I found this pretty complex treatment: 
http://papers.nips.cc/paper/5329-asymmetric-lsh-alsh-for-sublinear-time-maximum-inner-product-search-mips.pdf

This has a little info on the quality of LSH:
https://fruct.org/sites/default/files/files/conference15/Ponomarev_LSH_P2P.pdf

It's one of those things where I'm sure it can be done better than the basic 
ways I know to do it, but I haven't yet found a killer paper.


> Support recommendAll in matrix factorization model
> --------------------------------------------------
>
>                 Key: SPARK-3066
>                 URL: https://issues.apache.org/jira/browse/SPARK-3066
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Debasish Das
>
> ALS returns a matrix factorization model, which we can use to predict ratings 
> for individual queries as well as small batches. In practice, users may want 
> to compute top-k recommendations offline for all users. It is very expensive 
> but a common problem. We can do some optimization like
> 1) collect one side (either user or product) and broadcast it as a matrix
> 2) use level-3 BLAS to compute inner products
> 3) use Utils.takeOrdered to find top-k
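
The three steps in the quoted description can be sketched outside Spark as follows. This is an illustration only, not the MLlib implementation: the item factors held in memory stand in for the broadcast side, a plain-Python matrix product stands in for the level-3 BLAS call, and heapq.nlargest stands in for Utils.takeOrdered. The names recommend_all, matmul_t and block_size are hypothetical.

```python
import heapq

def matmul_t(users, items):
    # scores[u][i] = <users[u], items[i]>, i.e. U * I^T. In a real system this
    # is the step a level-3 BLAS matrix-matrix multiply would perform.
    return [[sum(a * b for a, b in zip(u, it)) for it in items] for u in users]

def recommend_all(user_factors, item_factors, k, block_size=2):
    """Top-k (item_id, score) pairs per user, processing users block by block."""
    user_ids = sorted(user_factors)
    item_ids = sorted(item_factors)
    # Step 1: collect one side as a dense matrix (the side that would be broadcast).
    item_matrix = [item_factors[i] for i in item_ids]
    result = {}
    for start in range(0, len(user_ids), block_size):
        block = user_ids[start:start + block_size]
        # Step 2: score a whole block of users against all items at once.
        scores = matmul_t([user_factors[u] for u in block], item_matrix)
        # Step 3: keep only the k best items per user.
        for u, row in zip(block, scores):
            result[u] = heapq.nlargest(k, zip(item_ids, row), key=lambda p: p[1])
    return result

# Toy factors, made up for the example.
user_factors = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}
item_factors = {"a": [2.0, 0.0], "b": [0.0, 3.0], "c": [1.0, 1.0]}
recs = recommend_all(user_factors, item_factors, k=2)
for user_id in sorted(recs):
    print(user_id, recs[user_id])
```

The cost is still one inner product per (user, item) pair; blocking only makes each pass cache-friendly and bounds memory, which is why the approximate candidate selection discussed in the comment above comes up at all.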


