[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982251#comment-15982251 ]
Peng Meng commented on SPARK-20446:
-----------------------------------

Yes, I compared with ML ALSModel.recommendAll. The data size is 480,000 users x 17,000 items; compared with mllib recommendAll, there is about a 10% to 20% improvement.

Blockify is very important for the performance of our solution, because the compute of CartesianRDD is:

    for (x <- rdd1.iterator(currSplit.s1, context);
         y <- rdd2.iterator(currSplit.s2, context)) yield (x, y)

Without blockify, rdd2.iterator is called once for every record of rdd1. With blockify, rdd2.iterator is called only once for every group of rdd1. The performance difference is very large.

The SparkSQL approach is quite different from my solution. It uses crossJoin to do the Cartesian product; its key optimization is in UnsafeCartesianRDD, which uses a rowArray to cache the right partition.

The key parts of our method:
1. Use blockify to optimize the Cartesian product computation, not to enable BLAS 3.
2. Use a priorityQueue (one per group) to save memory, not merely to find the top-K products for each user.

The SparkSQL approach doesn't work like this.

cc [~mengxr]

> Optimize the process of MLLIB ALS recommendForAll
> -------------------------------------------------
>
>                 Key: SPARK-20446
>                 URL: https://issues.apache.org/jira/browse/SPARK-20446
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.3.0
>            Reporter: Peng Meng
>
> The recommendForAll of MLLIB ALS is very slow.
> GC is a key problem of the current method.
> The task uses the following code to keep the temp result:
>
>     val output = new Array[(Int, (Int, Double))](m * n)
>
> m = n = 4096 (the default value; there is no way to set it), so output is
> about 4k * 4k * (4 + 4 + 8) bytes = 256 MB. This large allocation causes
> serious GC problems, and it frequently leads to OOM.
>
> Actually, we don't need to save all the temp results. Suppose we recommend
> the top-K (K is about 10 or 20) products for each user; then we only need
> 4k * topK * (4 + 4 + 8) bytes to save the temp result.
> I have written a solution for this method, with the following test results.
>
> Test environment: 3 workers; each worker has 10 cores, 30 GB memory, and 1 executor.
> Data: 480,000 users and 17,000 items.
>
> BlockSize:      1024  2048  4096  8192
> Old method:     245s  332s  488s  OOM
> This solution:  121s  118s  117s  120s

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
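To make the two ideas in the comment concrete (blockify the Cartesian product, and keep only K candidates per user in a bounded priority queue), here is a minimal, self-contained Scala sketch. It is an illustration under assumptions, not the actual Spark patch: the object name BlockedTopK, its method signature, and the in-memory Array inputs are invented for the example; the real solution operates on blockified RDDs of factor matrices and would use BLAS for the dot products.

```scala
import scala.collection.mutable

object BlockedTopK {
  // Plain dot product of two factor vectors (the real code would use BLAS).
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Score every (user, item) pair block by block, keeping only the top k
  // items per user in a bounded min-queue instead of materializing all
  // m * n scores. The inner `grouped` iterator is re-created once per user
  // block, mirroring how blockify reduces calls to rdd2.iterator from
  // once-per-record to once-per-group.
  def recommendForAll(
      users: Array[(Int, Array[Double])],
      items: Array[(Int, Array[Double])],
      k: Int,
      blockSize: Int): Map[Int, Seq[(Int, Double)]] = {
    // Min-heap by score: the weakest candidate is dequeued first.
    val ord: Ordering[(Int, Double)] = Ordering.by(-_._2)
    val queues = mutable.Map.empty[Int, mutable.PriorityQueue[(Int, Double)]]
    for (userBlock <- users.grouped(blockSize);
         itemBlock <- items.grouped(blockSize)) {
      for ((uid, uf) <- userBlock; (iid, vf) <- itemBlock) {
        val q = queues.getOrElseUpdate(
          uid, mutable.PriorityQueue.empty[(Int, Double)](ord))
        q.enqueue((iid, dot(uf, vf)))
        if (q.size > k) q.dequeue() // evict the lowest score
      }
    }
    // Drain each queue and return the per-user top k, best score first.
    queues.map { case (uid, q) =>
      uid -> q.dequeueAll.toSeq.sortBy(-_._2)
    }.toMap
  }
}
```

Per-user memory here is O(k) rather than O(n), which is the same effect as replacing the 4k * 4k temp array with a 4k * topK one.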