[ https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992471#comment-15992471 ]
Teng Jiang commented on SPARK-20443:
------------------------------------

I ran some tests on the blockSize.

The test environment is:
3 workers: 40 cores, 180G memory, and 1 executor per worker.
The data: 3,290,000 users and 208,000 items.

The results are:

blockSize   rank=10     rank=100
128         67.32 min   127.66 min
256         46.68 min    87.67 min
512         35.66 min    63.46 min
1024        28.49 min    41.61 min
2048        22.83 min    34.76 min
4096        22.39 min    54.43 min
8192        23.35 min    71.09 min

A second dataset has 480,000 users and 17,000 items. The rank was set to 10.

blockSize   128    256    512    1024   2048   4096   8192
time (s)    98.2   70.4   52.7   45.3   45.0   60.5   67.3

For both datasets, as the blockSize grows from 128 to 8192, the recommendation time first decreases and then increases. The optimal blockSize therefore differs between datasets and ranks: in these runs it was 4096 (rank=10) and 2048 (rank=100) on the first dataset, and 2048 on the second.

> The blockSize of MLLIB ALS should be setting by the User
> ---------------------------------------------------------
>
>                 Key: SPARK-20443
>                 URL: https://issues.apache.org/jira/browse/SPARK-20443
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.3.0
>            Reporter: Peng Meng
>            Priority: Minor
>
> The blockSize of MLLIB ALS is very important for ALS performance.
> In our test, when the blockSize is 128, the performance is about 4X comparing
> with the blockSize is 4096 (default value).
> The following are our test results:
> BlockSize(recommendationForAll time)
> 128(124s), 256(160s), 512(184s), 1024(244s), 2048(332s), 4096(488s), 8192(OOM)
> The Test Environment:
> 3 workers: each work 10 core, each work 30G memory, each work 1 executor.
> The Data: User 480,000, and Item 17,000
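
As a small illustration of the comparison made in the comment above, the reported timings can be put into a short Python script that picks the fastest blockSize for each configuration. This is only a sketch: the numbers are copied from the two result tables in the comment, and the names `best_block_size`, `FIRST_DATASET_MIN`, and `SECOND_DATASET_SEC` are hypothetical helpers for this example, not part of Spark.

```python
# Block sizes tested in the comment.
BLOCK_SIZES = [128, 256, 512, 1024, 2048, 4096, 8192]

# First dataset (3,290,000 users, 208,000 items), times in minutes.
FIRST_DATASET_MIN = {
    "rank=10": [67.32, 46.68, 35.66, 28.49, 22.83, 22.39, 23.35],
    "rank=100": [127.66, 87.67, 63.46, 41.61, 34.76, 54.43, 71.09],
}

# Second dataset (480,000 users, 17,000 items, rank=10), times in seconds.
SECOND_DATASET_SEC = [98.2, 70.4, 52.7, 45.3, 45.0, 60.5, 67.3]


def best_block_size(block_sizes, timings):
    """Return the (block_size, time) pair with the smallest time."""
    return min(zip(block_sizes, timings), key=lambda pair: pair[1])


if __name__ == "__main__":
    for label, timings in FIRST_DATASET_MIN.items():
        size, t = best_block_size(BLOCK_SIZES, timings)
        print(f"first dataset, {label}: blockSize={size} ({t} min)")
    size, t = best_block_size(BLOCK_SIZES, SECOND_DATASET_SEC)
    print(f"second dataset, rank=10: blockSize={size} ({t} s)")
```

On these numbers the fastest setting is 4096 for rank=10 and 2048 for rank=100 on the first dataset, and 2048 on the second, which matches the observation that no single blockSize is best everywhere.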