[ https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353129#comment-14353129 ]
Sean Owen commented on SPARK-3066:
----------------------------------

My anecdotal experience with LSH was that getting an order-of-magnitude speedup meant losing a small but noticeable amount of quality in the top recommendations. That is, you would fail to consider as candidates some of the items that were actually top recs.

The most actionable test / implementation I have to show this for ALS is:
https://github.com/cloudera/oryx/blob/master/als-common/src/it/java/com/cloudera/oryx/als/common/candidate/LocationSensitiveHashIT.java

This could let you run tests at a certain scale, with a certain degree of hashing, etc., if you wanted to. I've actually tried to avoid needing LSH just for speed, in order to avoid this tradeoff.

As for papers, I found this pretty complex treatment:
http://papers.nips.cc/paper/5329-asymmetric-lsh-alsh-for-sublinear-time-maximum-inner-product-search-mips.pdf

This has a little info on the quality of LSH:
https://fruct.org/sites/default/files/files/conference15/Ponomarev_LSH_P2P.pdf

It's one of those things where I'm sure it can be done better than the basic ways I know to do it, but I haven't yet found a killer paper.

> Support recommendAll in matrix factorization model
> --------------------------------------------------
>
>                 Key: SPARK-3066
>                 URL: https://issues.apache.org/jira/browse/SPARK-3066
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Debasish Das
>
> ALS returns a matrix factorization model, which we can use to predict ratings
> for individual queries as well as small batches. In practice, users may want
> to compute top-k recommendations offline for all users. It is very expensive
> but a common problem.
> We can do some optimization like
> 1) collect one side (either user or product) and broadcast it as a matrix
> 2) use level-3 BLAS to compute inner products
> 3) use Utils.takeOrdered to find top-k
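
For illustration, the three optimizations listed in the issue description can be sketched on a single machine. This is a minimal sketch in plain Python, not Spark or MLlib code: the function name `recommend_all`, the `block_size` parameter, and the sample factors are hypothetical. The in-memory item matrix stands in for the broadcast side, the nested-loop product stands in for a level-3 BLAS call, and `heapq.nlargest` stands in for `Utils.takeOrdered`.

```python
import heapq


def recommend_all(user_factors, item_factors, k, block_size=2):
    """Return {user_id: [(item_id, score), ...]} holding the k best items per user.

    user_factors and item_factors map ids to equal-length lists of floats.
    """
    item_ids = list(item_factors)
    # Step 1 (stand-in): collect one side, here the items, as a dense matrix.
    # In a distributed setting this is the side that would be broadcast.
    item_matrix = [item_factors[i] for i in item_ids]

    user_ids = list(user_factors)
    result = {}
    for start in range(0, len(user_ids), block_size):
        block = user_ids[start:start + block_size]
        # Step 2 (stand-in): one block-by-matrix product per block of users.
        # A real implementation would hand this to a level-3 BLAS routine
        # instead of computing each inner product in Python.
        scores = [
            [sum(u * v for u, v in zip(user_factors[uid], row)) for row in item_matrix]
            for uid in block
        ]
        # Step 3 (stand-in): bounded top-k selection per user, so the full
        # score row never has to be sorted.
        for uid, row_scores in zip(block, scores):
            result[uid] = heapq.nlargest(
                k, zip(item_ids, row_scores), key=lambda pair: pair[1]
            )
    return result


if __name__ == "__main__":
    # Hypothetical rank-2 factors, chosen only to make the output easy to follow.
    users = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}
    items = {"a": [2.0, 0.0], "b": [0.0, 3.0], "c": [1.0, 0.5]}
    for uid, recs in recommend_all(users, items, k=2).items():
        print(uid, recs)
```

Processing users in blocks keeps the intermediate score matrix at `block_size` by the number of items, which is the reason a block-by-matrix product is attractive here: the full users-by-items score matrix is never held at once. Unlike an LSH candidate filter, this scores every item for every user, so it gives up none of the top recommendations.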