[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236895#comment-14236895 ]

Zhang, Liye commented on SPARK-4740:
------------------------------------

Hi [~rxin], on my 4-node cluster I tested with the default numConnectionsPerPeer, 
which is 2. After applying the patch, the reduce time dropped from 40 minutes to 
35 minutes, but it is still longer than NIO. One interesting thing is that one 
of the four nodes performs much better than the others: its CPU usage is high, 
and its network throughput is also better than that of the other 3 nodes. This 
does not happen with the current master branch.
I'll test with other values of numConnectionsPerPeer later on.
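
For reference, a minimal spark-defaults.conf sketch of the two shuffle settings discussed in this thread. The property names follow the Spark 1.2+ configuration docs (spark.shuffle.blockTransferService selects between the netty and nio code paths; spark.shuffle.io.numConnectionsPerPeer controls the per-host connection fan-out); the values shown are only illustrative, mirroring the default mentioned above:

```
# Illustrative sketch only -- values are not a recommendation.
spark.shuffle.blockTransferService        netty   # or "nio" for comparison runs
spark.shuffle.io.numConnectionsPerPeer    2       # connections per remote host
```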

> Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey
> ------------------------------------------------------------------------
>
>                 Key: SPARK-4740
>                 URL: https://issues.apache.org/jira/browse/SPARK-4740
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Zhang, Liye
>            Assignee: Reynold Xin
>            Priority: Blocker
>         Attachments: Spark-perf Test Report 16 Cores per Executor.pdf, 
> Spark-perf Test Report.pdf, TestRunner  sort-by-key - Thread dump for 
> executor 1_files (Netty-48 Cores per node).zip, TestRunner  sort-by-key - 
> Thread dump for executor 1_files (Nio-48 cores per node).zip
>
>
> When testing the current Spark master (1.3.0-SNAPSHOT) with spark-perf 
> (sort-by-key, aggregate-by-key, etc.), the Netty-based shuffle transferService 
> takes much longer than the NIO-based shuffle transferService. The network 
> throughput of Netty is only about half that of NIO. 
> We tested in standalone mode, and the data set used for the test is 20 
> billion records with a total size of about 400GB. The spark-perf test is 
> run on a 4-node cluster with 10G NICs, 48 CPU cores per node, and 64GB of 
> memory per executor. The number of reduce tasks is set to 1000. 
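
As a rough sanity check on the numbers quoted in the description, the following sketch estimates the average record size and an idealized shuffle time. Both the decimal-unit interpretation of "400GB" and the uniform all-to-all shuffle model are my assumptions, not stated in the issue:

```python
# Back-of-envelope check of the benchmark figures above
# (20 billion records, ~400 GB total, 4 nodes, 10 Gb/s NICs).
total_bytes = 400e9               # assuming decimal GB
records = 20e9                    # 20 billion records
avg_record_size = total_bytes / records   # average bytes per record

# Assumed model: map output is spread evenly, so each node keeps
# ~1/4 of its data locally and ships the other ~3/4 over the network.
bytes_sent_per_node = (total_bytes / 4) * 3 / 4
nic_bytes_per_sec = 10e9 / 8      # 10 Gb/s line rate, ignoring protocol overhead

ideal_shuffle_secs = bytes_sent_per_node / nic_bytes_per_sec
print(avg_record_size)            # 20.0 bytes per record
print(round(ideal_shuffle_secs))  # 60 seconds at full line rate
```

Under these assumptions the wire could move the data in about a minute, so the 35-40 minute reduce times reported in the comment point at the transfer service rather than raw NIC capacity.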



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
