[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244168#comment-14244168 ]

Jie Huang commented on SPARK-4740:
----------------------------------

[~rxin], I wonder whether the Netty approach changes the IO access pattern 
compared with the NIO one. If so, it becomes a trade-off. 

Obviously, netty saves CPU overhead and resolves the high CPU utilization seen 
with NIO (which is also demonstrated in [~liyezhang556520]'s perf report). But 
on slow disks (especially ones serving heavy random IO), issuing larger, 
buffered IO requests may bring better performance. 
1. We found that with a powerful CPU (which resolves the CPU overhead in 
another way), NIO can reach 100MB/s disk bandwidth, while on those 4 AWS 
instances both netty and nio only reach about 40MB/s.  
2. Netty, however, appears to break those IO requests into smaller pieces, 
which leads to poorer IO throughput even though it solves the CPU problem; it 
turns the CPU problem into an IO problem. With SSDs the IO bottleneck is not 
as obvious as with the HDDs used in our environment. (See the sketch below 
for how we switch between the two transfer services in these runs.)
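
For reference, a minimal sketch (the app name is a placeholder) of how the 
same sort-by-key job can be run under both implementations by toggling 
spark.shuffle.blockTransferService, so CPU vs. IO behavior is compared under 
identical data and parallelism:

{code}
// Sketch only: placeholder app name; run once with "nio", once with "netty".
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("sort-by-key-shuffle-comparison")       // placeholder
  .set("spark.shuffle.blockTransferService", "nio")   // or "netty" (the default)
val sc = new SparkContext(conf)
{code}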

In the end, the workload will be either IO bound or CPU bound. If so, as a 
next step we had better consider merging IO requests (to resolve the IO 
bottleneck) in netty as well. One way to verify this is to compare the IO 
access patterns by collecting iostat data (e.g., request size, queue size, 
read/write merge counts, %util), as in the sketch below. 
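
As a rough illustration (not something we have scripted yet), sampling those 
fields on each node while the shuffle stage runs could look like this; the 
interval and sample count are placeholders:

{code}
// Sketch only: "iostat -x" reports avgrq-sz (request size), avgqu-sz (queue
// size), rrqm/s and wrqm/s (read/write merges) and %util per device.
import scala.sys.process._

val interval = 5    // seconds between samples (placeholder)
val count    = 120  // number of samples (placeholder)
val report = Seq("iostat", "-x", "-d", interval.toString, count.toString).!!
println(report)
{code}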

> Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey
> ------------------------------------------------------------------------
>
>                 Key: SPARK-4740
>                 URL: https://issues.apache.org/jira/browse/SPARK-4740
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Zhang, Liye
>            Assignee: Reynold Xin
>         Attachments: (rxin patch better executor)TestRunner  sort-by-key - 
> Thread dump for executor 3_files.zip, (rxin patch normal executor)TestRunner  
> sort-by-key - Thread dump for executor 0 _files.zip, Spark-perf Test Report 
> 16 Cores per Executor.pdf, Spark-perf Test Report.pdf, TestRunner  
> sort-by-key - Thread dump for executor 1_files (Netty-48 Cores per node).zip, 
> TestRunner  sort-by-key - Thread dump for executor 1_files (Nio-48 cores per 
> node).zip, rxin_patch-on_4_node_cluster_48CoresPerNode(Unbalance).7z
>
>
> When testing the current spark master (1.3.0-snapshot) with spark-perf 
> (sort-by-key, aggregate-by-key, etc.), the Netty based shuffle transferService 
> takes much longer than the NIO based shuffle transferService. The network 
> throughput of Netty is only about half of that of NIO. 
> We tested in standalone mode, and the data set used for the test is 20 
> billion records, about 400GB in total. The spark-perf test runs on a 4 node 
> cluster with 10G NIC, 48 CPU cores per node, and 64GB of memory per executor. 
> The number of reduce tasks is set to 1000. 


