[ https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209936#comment-14209936 ]
Debasish Das commented on SPARK-3066:
-------------------------------------

On our internal datasets, flatMap is slow. I am changing the code to have 2 methods, assuming users are tall and products are skinny. If both users and products are tall and wide, then we need to rethink this.

recommendAllUsers: takeOrdered is called on each userFeature dot productFeatures.

recommendAllProducts: mapPartitions will emit Iterator(productId, userPriorityQueue), and reduceByKey will generate the top-K users for each product.

> Support recommendAll in matrix factorization model
> --------------------------------------------------
>
>                 Key: SPARK-3066
>                 URL: https://issues.apache.org/jira/browse/SPARK-3066
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xiangrui Meng
>
> ALS returns a matrix factorization model, which we can use to predict ratings
> for individual queries as well as small batches. In practice, users may want
> to compute top-k recommendations offline for all users. It is very expensive
> but a common problem. We can do some optimization like
> 1) collect one side (either user or product) and broadcast it as a matrix
> 2) use level-3 BLAS to compute inner products
> 3) use Utils.takeOrdered to find top-k
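
For illustration only, here is a rough sketch of the recommendAllProducts idea from the comment: each partition of users keeps a bounded priority queue of the best users per product, and the per-partition queues are then merged per product. This is plain Python using only the standard library, not Spark code and not the patch under discussion. The function names (partition_top_k, merge_top_k, recommend_users_for_products) and the data layout are hypothetical, and the partition loop only stands in for what mapPartitions and reduceByKey would do on an RDD.

```python
import heapq
from collections import defaultdict


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def partition_top_k(user_block, product_features, k):
    """Stand-in for one mapPartitions call (hypothetical helper).

    For a block of (user_id, features) pairs, keep a min-heap of at most
    k (score, user_id) entries per product, so memory stays bounded by
    k * number_of_products per partition.
    """
    heaps = defaultdict(list)
    for user_id, u in user_block:
        for product_id, p in product_features.items():
            entry = (dot(u, p), user_id)
            heap = heaps[product_id]
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                # Replace the current worst entry with the better one.
                heapq.heapreplace(heap, entry)
    return heaps


def merge_top_k(a, b, k):
    """Stand-in for the reduceByKey combine step: best k of two lists."""
    return heapq.nlargest(k, a + b)


def recommend_users_for_products(user_partitions, product_features, k):
    """Merge per-partition queues into the top-k users for each product."""
    merged = {}
    for block in user_partitions:
        local = partition_top_k(block, product_features, k)
        for product_id, heap in local.items():
            merged[product_id] = merge_top_k(merged.get(product_id, []), heap, k)
    # Return (user_id, score) pairs, best score first.
    return {
        pid: [(uid, score) for score, uid in sorted(entries, reverse=True)]
        for pid, entries in merged.items()
    }


if __name__ == "__main__":
    # Two "partitions" of users, with the small product side held in full,
    # as a broadcast product matrix would be.
    partitions = [
        [(1, [1.0, 0.0]), (2, [0.0, 1.0])],
        [(3, [2.0, 1.0])],
    ]
    products = {10: [1.0, 0.0], 20: [1.0, 2.0]}
    print(recommend_users_for_products(partitions, products, k=2))
```

The point of the bounded heaps is that each partition emits at most k entries per product rather than one record per (user, product) pair, which is presumably where the flatMap approach was spending its time. That holds only while the product side is small enough to hold in full on every partition.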