[jira] [Commented] (SPARK-4740) Netty's network bandwidth is much lower than NIO in spark-perf and Netty takes longer running time

2014-12-04 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234993#comment-14234993
 ] 

Reynold Xin commented on SPARK-4740:


BTW, it would be great if you could attach the NIO thread dump too. Thanks.


> Netty's network bandwidth is much lower than NIO in spark-perf and Netty 
> takes longer running time
> --
>
> Key: SPARK-4740
> URL: https://issues.apache.org/jira/browse/SPARK-4740
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.0
>Reporter: Zhang, Liye
> Attachments: Spark-perf Test Report.pdf, TestRunner  sort-by-key - 
> Thread dump for executor 1_files (48 Cores per node).zip
>
>
> When testing the current Spark master (1.3.0-SNAPSHOT) with spark-perf 
> (sort-by-key, aggregate-by-key, etc.), the Netty-based shuffle transferService 
> takes much longer than the NIO-based shuffle transferService. The network 
> throughput of Netty is only about half that of NIO. 
> We tested in standalone mode; the data set used for the test is 20 billion 
> records with a total size of about 400GB. The spark-perf test runs on a 
> 4-node cluster with a 10G NIC, 48 CPU cores per node, and 64GB of memory 
> per executor. The number of reduce tasks is set to 1000. 





[jira] [Commented] (SPARK-4740) Netty's network bandwidth is much lower than NIO in spark-perf and Netty takes longer running time

2014-12-04 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234986#comment-14234986
 ] 

Reynold Xin commented on SPARK-4740:


Thanks - can you try setting spark.shuffle.io.lazyFD to false and see how it 
performs?
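
For reference, a minimal sketch of how that property could be set when 
building the SparkConf (the app name and other settings are illustrative, 
not from this test):

{code:scala}
// Hypothetical sketch: disable lazy file-descriptor opening for the Netty
// shuffle transfer service before creating the SparkContext.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("sort-by-key-perf")           // illustrative app name
  .set("spark.shuffle.io.lazyFD", "false")  // open shuffle block files eagerly
val sc = new SparkContext(conf)
{code}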






[jira] [Commented] (SPARK-4740) Netty's network bandwidth is much lower than NIO in spark-perf and Netty takes longer running time

2014-12-04 Thread Zhang, Liye (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234983#comment-14234983
 ] 

Zhang, Liye commented on SPARK-4740:


[~rxin] I attached the thread dump of one executor (48 cores) during the reduce 
phase; please take a look. I'll try 16 cores later on.






[jira] [Commented] (SPARK-4740) Netty's network bandwidth is much lower than NIO in spark-perf and Netty takes longer running time

2014-12-04 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234963#comment-14234963
 ] 

Reynold Xin commented on SPARK-4740:


Also, can you take a few more jstacks and paste them here? Thanks.







[jira] [Commented] (SPARK-4740) Netty's network bandwidth is much lower than NIO in spark-perf and Netty takes longer running time

2014-12-04 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234960#comment-14234960
 ] 

Reynold Xin commented on SPARK-4740:


Can you limit the number of cores to a lower value and see what happens, i.e. 
try it with 16 threads and see whether the problem still exists? Thanks.







[jira] [Commented] (SPARK-4740) Netty's network bandwidth is much lower than NIO in spark-perf and Netty takes longer running time

2014-12-04 Thread Zhang, Liye (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234941#comment-14234941
 ] 

Zhang, Liye commented on SPARK-4740:


[~adav], I have tested with "spark.shuffle.io.serverThreads" and 
"spark.shuffle.io.clientThreads" set to 48; the result does not change. Netty 
still takes the same 39 minutes for the reduce phase.






[jira] [Commented] (SPARK-4740) Netty's network bandwidth is much lower than NIO in spark-perf and Netty takes longer running time

2014-12-04 Thread Zhang, Liye (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234937#comment-14234937
 ] 

Zhang, Liye commented on SPARK-4740:


We found this issue while running the performance test for 
[SPARK-2926|https://issues.apache.org/jira/browse/SPARK-2926]. Since 
[SPARK-2926|https://issues.apache.org/jira/browse/SPARK-2926] spends less time 
in the reduce phase, the difference between Netty and NIO there is not that 
large, about 20%. We then tested the master branch, where the difference is 
more significant, more than 30%.






[jira] [Commented] (SPARK-4740) Netty's network bandwidth is much lower than NIO in spark-perf and Netty takes longer running time

2014-12-04 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234902#comment-14234902
 ] 

Saisai Shao commented on SPARK-4740:


Besides, we also tested with a 24-core WSM CPU, and the performance of Netty is 
still slower than NIO.






[jira] [Commented] (SPARK-4740) Netty's network bandwidth is much lower than NIO in spark-perf and Netty takes longer running time

2014-12-04 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234876#comment-14234876
 ] 

Saisai Shao commented on SPARK-4740:


We also tested with a smaller dataset of about 40GB, where Netty's performance 
is similar to NIO's. My guess is that Netty is not efficient when fetching a 
large number of shuffle blocks: in our 400GB case, each reduce task needs to 
fetch about 7000 shuffle blocks, and each shuffle block is only tens of KB in 
size. 

We will try increasing the shuffle thread number and test again. Judging from 
the call stack, all the shuffle clients are busy waiting on epoll_wait; I'm not 
sure whether that is expected.
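
As a rough sanity check of those numbers (all figures approximate, taken from 
this thread):

{code:scala}
// Back-of-the-envelope check of the block sizes mentioned above
// (~400GB shuffled across 1000 reduce tasks, ~7000 blocks fetched per task).
val totalShuffleBytes = 400L * 1024 * 1024 * 1024   // ~400GB of shuffle data (approximate)
val reduceTasks       = 1000
val blocksPerReduce   = 7000                        // map-output blocks fetched per reduce task
val bytesPerReduce    = totalShuffleBytes / reduceTasks
val bytesPerBlock     = bytesPerReduce / blocksPerReduce
println(f"~${bytesPerReduce / 1e6}%.0f MB per reduce task, ~${bytesPerBlock / 1e3}%.0f KB per block")
// prints roughly: ~429 MB per reduce task, ~61 KB per block ("tens of KB")
{code}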






[jira] [Commented] (SPARK-4740) Netty's network bandwidth is much lower than NIO in spark-perf and Netty takes longer running time

2014-12-04 Thread Aaron Davidson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234758#comment-14234758
 ] 

Aaron Davidson commented on SPARK-4740:
---

Could you try setting "spark.shuffle.io.serverThreads" and 
"spark.shuffle.io.clientThreads" to 48? We have an artificial default cap of 8 
to limit off-heap memory usage, but it's possible this is not sufficient to 
saturate a 10Gb/s link.
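
A minimal sketch of how those two properties could be bumped for the test 
(48 is chosen to match the cores per node; the rest is illustrative):

{code:scala}
// Hypothetical sketch: raise the Netty shuffle server/client thread pools
// from the default cap of 8 to 48. Each extra thread can hold additional
// off-heap buffer memory, which is why the default is capped.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("sort-by-key-perf")                // illustrative app name
  .set("spark.shuffle.io.serverThreads", "48")   // Netty shuffle server threads
  .set("spark.shuffle.io.clientThreads", "48")   // Netty shuffle client threads
val sc = new SparkContext(conf)
{code}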






[jira] [Commented] (SPARK-4740) Netty's network bandwidth is much lower than NIO in spark-perf and Netty takes longer running time

2014-12-04 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234489#comment-14234489
 ] 

Reynold Xin commented on SPARK-4740:


[~adav] Could it be that the thread pool size is too small? 






[jira] [Commented] (SPARK-4740) Netty's network bandwidth is much lower than NIO in spark-perf and Netty takes longer running time

2014-12-04 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234479#comment-14234479
 ] 

Patrick Wendell commented on SPARK-4740:


Thanks for reporting this. We've run a bunch of tests and never found Netty to 
be slower than NIO, so this is a helpful piece of feedback. One unique thing 
about your environment is that you have 48 cores per node. Do you observe the 
same effect if you limit the parallelism on each node to fewer cores?
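
One possible way to try that in standalone mode (illustrative; 16 cores per 
node is just an example value) is sketched below:

{code:scala}
// Hypothetical sketch: cap the total cores the application uses so that each
// of the 4 standalone workers runs roughly 16 concurrent tasks instead of 48.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("sort-by-key-perf")            // illustrative app name
  .set("spark.cores.max", (16 * 4).toString) // ~16 cores on each of the 4 nodes
val sc = new SparkContext(conf)
// Alternatively, set SPARK_WORKER_CORES=16 in spark-env.sh on each worker.
{code}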



