[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244710#comment-14244710 ]

Aaron Davidson commented on SPARK-4740:
---------------------------------------

The thing is, the decision about which IO requests to make is done at a higher 
level -- individual blocks are requested by the ShuffleBlockFetcherIterator -- 
so it does not change based on Netty vs NIO. The only difference that I can 
think of is the concurrency of the requests: with Netty we only make 1 
concurrent request per machine in the cluster (so in the 4-node case, 3 
concurrent requests to disk), while with NIO we saw 20 concurrent threads 
reading from disk in the same environment. It's possible that the disk 
controllers handle the increased request parallelism well by merging IO 
requests at the hardware level.

To this end, we added the numConnectionsPerPeer option to allow users to make 
multiple concurrent requests per machine. However, testing on the HDD cluster 
did not show that this resolved the problem. I would be curious what the 
results would be for the powerful-CPU test with numConnectionsPerPeer set to 
around 6, which should enable up to 18 concurrent requests to disk, similar to 
the 20 observed with NIO.
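
For concreteness, a minimal sketch of how that experiment could be configured 
(the property name spark.shuffle.io.numConnectionsPerPeer is assumed to match 
the option mentioned above; please verify it against the build under test):

    // Minimal sketch, not a tuned configuration; the property name is an
    // assumption based on the Netty shuffle transport option discussed above.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("sortByKey-netty-concurrency-test")
      // Allow up to 6 connections to each remote peer. On a 4-node cluster
      // (3 remote peers) that permits roughly 6 * 3 = 18 concurrent fetches,
      // close to the ~20 concurrent reader threads observed with NIO.
      .set("spark.shuffle.io.numConnectionsPerPeer", "6")

    val sc = new SparkContext(conf)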

Another possibility is that while the requests in the Netty and NIO cases are 
the same, we receive all the IO requests from a single reduce task before 
moving on to the next. Two reduce tasks will each read through all the local 
files, so if we ask the disk to read, say, 10 KB from 200 files and then do 
another sweep of 10 KB over the same 200 files, this could be less efficient 
than interleaving the requests: ask for 10 KB from file0 for reduce0, then 
10 KB from file0 for reduce1, then move on to file1. It is possible that NIO 
somehow interleaves these requests more naturally than Netty, though this 
problem is not fundamental to either system and, if that is the case, is 
probably a happy coincidence. We could try, for instance, handling the 
responses to block requests on a different event loop than the one receiving 
the requests, and randomly choosing among any buffered requests to serve IO.
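
As a toy illustration of the two access patterns described above (not actual 
shuffle code; the file and reducer names are made up):

    // Contrast the two orderings for 2 reducers reading 10 KB from 200 files.
    val files = (0 until 200).map(i => s"file$i")
    val reducers = Seq("reduce0", "reduce1")

    // Sweep pattern: finish all of reduce0's reads, then all of reduce1's.
    val sweep = for (r <- reducers; f <- files) yield (f, r)

    // Interleaved pattern: serve both reducers' requests for file0, then
    // file1, and so on.
    val interleaved = for (f <- files; r <- reducers) yield (f, r)

    // sweep.take(3)       = (file0,reduce0), (file1,reduce0), (file2,reduce0)
    // interleaved.take(3) = (file0,reduce0), (file0,reduce1), (file1,reduce0)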

> Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey
> ------------------------------------------------------------------------
>
>                 Key: SPARK-4740
>                 URL: https://issues.apache.org/jira/browse/SPARK-4740
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Zhang, Liye
>            Assignee: Reynold Xin
>         Attachments: (rxin patch better executor)TestRunner  sort-by-key - 
> Thread dump for executor 3_files.zip, (rxin patch normal executor)TestRunner  
> sort-by-key - Thread dump for executor 0 _files.zip, Spark-perf Test Report 
> 16 Cores per Executor.pdf, Spark-perf Test Report.pdf, TestRunner  
> sort-by-key - Thread dump for executor 1_files (Netty-48 Cores per node).zip, 
> TestRunner  sort-by-key - Thread dump for executor 1_files (Nio-48 cores per 
> node).zip, rxin_patch-on_4_node_cluster_48CoresPerNode(Unbalance).7z
>
>
> When testing current Spark master (1.3.0-snapshot) with spark-perf 
> (sort-by-key, aggregate-by-key, etc.), the Netty based shuffle 
> transferService takes much longer than the NIO based one. The network 
> throughput of Netty is only about half that of NIO. 
> We tested in standalone mode, and the data set used for the test is 20 
> billion records, about 400GB in total. The spark-perf test runs on a 4-node 
> cluster with 10G NICs, 48 CPU cores per node, and 64GB of memory per 
> executor. The number of reduce tasks is set to 1000. 
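
For reference, a rough sketch of the reported setup (property names are 
assumptions for the Spark 1.2/1.3 era, where spark.shuffle.blockTransferService 
selected netty vs nio; the placeholder RDD stands in for the spark-perf 
generated data):

    // Rough sketch of the reported benchmark setup; values are taken from the
    // description above, property names are assumptions for Spark 1.2/1.3.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._

    val conf = new SparkConf()
      .setAppName("spark-perf sortByKey")
      .set("spark.executor.memory", "64g")
      // Toggle between the two shuffle transports being compared.
      .set("spark.shuffle.blockTransferService", "netty")  // or "nio"

    val sc = new SparkContext(conf)

    // Placeholder for the ~20 billion record (~400 GB) spark-perf data set.
    val records = sc.parallelize(Seq(("k1", "v1"), ("k2", "v2")))
    // 1000 reduce tasks, as in the report.
    val sorted = records.sortByKey(ascending = true, numPartitions = 1000)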



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
