[ https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992471#comment-15992471 ]

Teng Jiang commented on SPARK-20443:
------------------------------------

I did some tests on the blockSize.
The test environment is:
3 workers: each worker has 40 cores, 180 GB of memory, and 1 executor.
The data: 3,290,000 users and 208,000 items.
The results (recommendation time, in minutes):
blockSize     rank=10    rank=100
128             67.32      127.66
256             46.68       87.67
512             35.66       63.46
1024            28.49       41.61
2048            22.83       34.76
4096            22.39       54.43
8192            23.35       71.09
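To make concrete what is being swept here: as a rough, single-machine illustration, assume the recommend-for-all step groups the user and item factor vectors into blocks of at most blockSize rows, scores every (user block, item block) pair, and keeps a running top-k per user. The sketch below follows that assumed structure; `blockify` and `recommend_for_all` are hypothetical names, and this is not Spark's implementation.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def blockify(factors: Dict[int, Vector], block_size: int) -> List[List[Tuple[int, Vector]]]:
    """Group (id, factor vector) pairs into blocks of at most block_size rows."""
    rows = sorted(factors.items())
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def recommend_for_all(users: Dict[int, Vector], items: Dict[int, Vector],
                      k: int, block_size: int) -> Dict[int, List[Tuple[int, float]]]:
    """Top-k (item_id, score) per user, computed one block pair at a time."""
    # Per-user min-heap of (score, item_id) holding the k best candidates seen so far.
    best: Dict[int, List[Tuple[float, int]]] = {uid: [] for uid in users}
    for user_block in blockify(users, block_size):
        for item_block in blockify(items, block_size):
            # In a distributed setting each block pair would be one unit of work
            # (e.g. a dense matrix product); here it is a plain double loop.
            for uid, u in user_block:
                heap = best[uid]
                for iid, v in item_block:
                    score = sum(a * b for a, b in zip(u, v))
                    if len(heap) < k:
                        heapq.heappush(heap, (score, iid))
                    elif score > heap[0][0]:
                        heapq.heapreplace(heap, (score, iid))
    # Highest score first.
    return {uid: [(iid, s) for s, iid in sorted(heap, reverse=True)]
            for uid, heap in best.items()}
```

In this sketch block_size changes only how the work is partitioned (how many block pairs there are and how large each one is), not the recommendations returned; in a distributed run that partitioning is presumably what trades per-task overhead against per-task memory.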

Another dataset, with 480,000 users and 17,000 items; the rank was set to 10.
blockSize   128   256   512  1024  2048  4096  8192
time (s)   98.2  70.4  52.7  45.3  45.0  60.5  67.3

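For both datasets, as blockSize grows from 128 to 8192, the recommendation
time first decreases and then increases. The fastest setting is not fixed:
it is 4096 for the first dataset at rank 10, but 2048 for the first dataset
at rank 100 and for the second dataset at rank 10.
Therefore, the optimal blockSize differs between datasets.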
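Since the optimum has to be found per workload, a small sweep is one way to pick it. The sketch below times one run per candidate and picks the fastest; the `run` callable is a hypothetical stand-in for "recompute all recommendations with this blockSize", and the helper names are ours. Applied to the times reported above, it reproduces the different optima.

```python
import time
from typing import Callable, Dict, Iterable


def measure(run: Callable[[int], object], block_sizes: Iterable[int]) -> Dict[int, float]:
    """Wall-clock seconds for one call of run(block_size) per candidate."""
    timings: Dict[int, float] = {}
    for bs in block_sizes:
        start = time.perf_counter()
        run(bs)  # hypothetical workload, e.g. a full recommend-for-all pass
        timings[bs] = time.perf_counter() - start
    return timings


def best_block_size(timings: Dict[int, float]) -> int:
    """Return the candidate blockSize with the lowest measured time."""
    return min(timings, key=lambda bs: timings[bs])


# Times reported in the comment (dataset 1 in minutes, dataset 2 in seconds).
CANDIDATES = [128, 256, 512, 1024, 2048, 4096, 8192]
dataset1_rank10 = dict(zip(CANDIDATES, [67.32, 46.68, 35.66, 28.49, 22.83, 22.39, 23.35]))
dataset1_rank100 = dict(zip(CANDIDATES, [127.66, 87.67, 63.46, 41.61, 34.76, 54.43, 71.09]))
dataset2_rank10 = dict(zip(CANDIDATES, [98.2, 70.4, 52.7, 45.3, 45.0, 60.5, 67.3]))

print(best_block_size(dataset1_rank10))   # → 4096
print(best_block_size(dataset1_rank100))  # → 2048
print(best_block_size(dataset2_rank10))   # → 2048
```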


> The blockSize of MLLIB ALS should be setting  by the User
> ---------------------------------------------------------
>
>                 Key: SPARK-20443
>                 URL: https://issues.apache.org/jira/browse/SPARK-20443
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.3.0
>            Reporter: Peng Meng
>            Priority: Minor
>
> The blockSize of MLlib ALS is very important for ALS performance.
> In our test, recommendation with blockSize 128 is about 4x faster than
> with blockSize 4096 (the default value).
> The following are our test results: 
> blockSize (recommendationForAll time):
> 128 (124 s), 256 (160 s), 512 (184 s), 1024 (244 s), 2048 (332 s), 4096 (488 s), 8192 (OOM)
> The Test Environment:
> 3 workers: each worker has 10 cores, 30 GB of memory, and 1 executor.
> The data: 480,000 users and 17,000 items.
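The "about 4x" figure in the quoted report can be checked from its own numbers. A short calculation; the dictionary only restates the reported times, and the helper name is ours:

```python
# recommendationForAll times in seconds from the quoted report (8192 ran out of memory).
REPORTED = {128: 124, 256: 160, 512: 184, 1024: 244, 2048: 332, 4096: 488}
DEFAULT_BLOCK_SIZE = 4096  # the default value stated in the report


def speedup_vs_default(times, default=DEFAULT_BLOCK_SIZE):
    """Ratio of the default blockSize's time to each candidate's time (>1 means faster)."""
    return {bs: times[default] / t for bs, t in times.items()}


speedups = speedup_vs_default(REPORTED)
print(round(speedups[128], 2))  # → 3.94
```

The report's 480,000-user, 17,000-item dataset has the same dimensions as the comment's second dataset, yet on the comment's larger cluster blockSize 2048 was fastest rather than 128, which is consistent with the comment's conclusion that no single blockSize is best everywhere.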



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
