[jira] [Commented] (SPARK-20443) The blockSize of MLLIB ALS should be setting by the User

2017-05-02 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992497#comment-15992497
 ] 

Nick Pentreath commented on SPARK-20443:


Interesting - though it appears to me that {{2048}} is the best setting for 
both data sizes. At the least I think we should adjust the default.

> The blockSize of MLLIB ALS should be setting  by the User
> -
>
> Key: SPARK-20443
> URL: https://issues.apache.org/jira/browse/SPARK-20443
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 2.3.0
>Reporter: Peng Meng
>Priority: Minor
>
> The blockSize of MLlib ALS is very important for ALS performance.
> In our test, a blockSize of 128 is about 4X faster than a blockSize of
> 4096 (the default value).
> The following are our test results, as blockSize (recommendationForAll time):
> 128 (124s), 256 (160s), 512 (184s), 1024 (244s), 2048 (332s), 4096 (488s), 8192 (OOM)
> The test environment:
> 3 workers, each with 10 cores, 30G memory, and 1 executor.
> The data: 480,000 users and 17,000 items.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20443) The blockSize of MLLIB ALS should be setting by the User

2017-05-02 Thread Teng Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992492#comment-15992492
 ] 

Teng Jiang commented on SPARK-20443:


All the tests above were done with SPARK-11968 / [PR #17742 | 
https://github.com/apache/spark/pull/17742]. 
The blockSize still matters, considering the number of data fetches per 
iteration and the GC time.
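
To make that trade-off concrete, here is a rough Python sketch, not Spark's code. It assumes that the recommend-for-all step splits the user and item factors into blocks of blockSize rows and scores every user block against every item block; the helper name block_stats is hypothetical. Under that assumption, smaller blocks mean many more block pairs to fetch, while larger blocks mean a bigger score matrix held in memory per pair, which is where GC time would come in.

```python
def ceil_div(n, d):
    """Integer ceiling division, avoiding float rounding."""
    return -(-n // d)


def block_stats(num_users, num_items, block_size):
    """Count blocks and block pairs for a given blockSize.

    Assumption (not confirmed by this thread): every user block is
    paired with every item block, and each pair yields one dense
    score matrix of rows-in-user-block x rows-in-item-block entries.
    """
    user_blocks = ceil_div(num_users, block_size)
    item_blocks = ceil_div(num_items, block_size)
    return {
        "user_blocks": user_blocks,
        "item_blocks": item_blocks,
        "block_pairs": user_blocks * item_blocks,
        # Upper bound on scores materialized for a single block pair.
        "max_scores_per_pair": min(block_size, num_users) * min(block_size, num_items),
    }


if __name__ == "__main__":
    # Dataset sizes taken from the issue description.
    for bs in (128, 256, 512, 1024, 2048, 4096, 8192):
        print(bs, block_stats(480_000, 17_000, bs))
```

For 480,000 users and 17,000 items, a blockSize of 4096 gives 118 x 5 = 590 block pairs of up to 16,777,216 scores each, while a blockSize of 128 gives 3750 x 133 = 498,750 pairs of 16,384 scores each.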




[jira] [Commented] (SPARK-20443) The blockSize of MLLIB ALS should be setting by the User

2017-05-02 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992475#comment-15992475
 ] 

Nick Pentreath commented on SPARK-20443:


Were these tests against existing master? Because SPARK-11968 / [PR 
#17742|https://github.com/apache/spark/pull/17742] should make block size less 
relevant - we should of course re-test this once that PR is merged in, to see 
if it's worth exposing the parameter.




[jira] [Commented] (SPARK-20443) The blockSize of MLLIB ALS should be setting by the User

2017-05-02 Thread Teng Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992471#comment-15992471
 ] 

Teng Jiang commented on SPARK-20443:


I ran some tests on the blockSize.
The test environment is:
3 workers, each with 40 cores, 180G memory, and 1 executor.
The data: 3,290,000 users and 208,000 items.
The results are:
blockSize   rank=10     rank=100
128         67.32min    127.66min
256         46.68min    87.67min
512         35.66min    63.46min
1024        28.49min    41.61min
2048        22.83min    34.76min
4096        22.39min    54.43min
8192        23.35min    71.09min

Another dataset has 480,000 users and 17,000 items. The rank was set to 10.
blockSize   128    256    512    1024   2048   4096   8192
time (s)    98.2   70.4   52.7   45.3   45.0   60.5   67.3

For both datasets, as the blockSize grows from 128 to 8192, the recommendation 
time first decreases and then increases.
Therefore, the optimal blockSize differs from one dataset to another.
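
As a quick check of these numbers, the short Python sketch below picks the fastest blockSize in each column. The timings are copied from the tables in this comment (the row for the smaller dataset is read as seven one-decimal values); the variable and function names are only illustrative.

```python
block_sizes = [128, 256, 512, 1024, 2048, 4096, 8192]

# Timings from this comment, one list entry per blockSize above.
timings = {
    "3.29M users / 208K items, rank=10 (min)": [67.32, 46.68, 35.66, 28.49, 22.83, 22.39, 23.35],
    "3.29M users / 208K items, rank=100 (min)": [127.66, 87.67, 63.46, 41.61, 34.76, 54.43, 71.09],
    "480K users / 17K items, rank=10 (s)": [98.2, 70.4, 52.7, 45.3, 45.0, 60.5, 67.3],
}


def best_block_size(times):
    """Return the blockSize with the smallest measured time."""
    return block_sizes[times.index(min(times))]


best = {name: best_block_size(times) for name, times in timings.items()}

if __name__ == "__main__":
    for name, bs in best.items():
        print(f"{name}: fastest at blockSize={bs}")
```

On these measurements the fastest setting is 4096 for the larger dataset at rank=10, and 2048 for both the larger dataset at rank=100 and the smaller dataset, although at rank=10 the difference between 2048 and 4096 on the larger dataset is under half a minute.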





[jira] [Commented] (SPARK-20443) The blockSize of MLLIB ALS should be setting by the User

2017-04-25 Thread Peng Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983059#comment-15983059
 ] 

Peng Meng commented on SPARK-20443:
---

Yes, based on my current tests, I agree.
But if the data size is large, there may still be a benefit to adjusting the 
block size. 




[jira] [Commented] (SPARK-20443) The blockSize of MLLIB ALS should be setting by the User

2017-04-25 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983050#comment-15983050
 ] 

Nick Pentreath commented on SPARK-20443:


Your PR for SPARK-20446 / SPARK-11968 should largely remove the need to adjust 
the block size. Do you agree?




[jira] [Commented] (SPARK-20443) The blockSize of MLLIB ALS should be setting by the User

2017-04-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980757#comment-15980757
 ] 

Apache Spark commented on SPARK-20443:
--

User 'mpjlu' has created a pull request for this issue:
https://github.com/apache/spark/pull/17739
