[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244710#comment-14244710 ]
Aaron Davidson commented on SPARK-4740:
---------------------------------------

The thing is, the decision about which IO requests to make is made at a higher level -- individual blocks are requested by the ShuffleBlockFetcherIterator -- so it does not change based on Netty vs NIO. The only difference I can think of is the concurrency of the requests: with Netty we make only 1 concurrent request per machine in the cluster (so in the 4-node case, 3 concurrent requests to disk), while with NIO we saw 20 concurrent threads reading from disk in the same environment. It's possible that the disk controllers handle the increased request parallelism well by merging IO requests at the hardware level.

To that end, we added the numConnectionsPerPeer option to allow users to make multiple concurrent requests per machine. However, testing on the HDD cluster did not show that this resolved the problem. I would be curious what the results would be for the powerful-CPU test with numConnectionsPerPeer set to around 6, which should allow up to 18 concurrent requests to disk, similar to the ~20 observed with NIO (see the configuration sketch below).

Another possibility is that, while the requests in the Netty and NIO cases are the same, we serve all the IO requests from a single reduce task before moving on to the next. Two reduce tasks will each read through all the local files, so if we ask the disk to read, say, 10 KB from 200 files and subsequently do another sweep of 10 KB from the same 200 files, this could be less efficient than interleaving the requests: ask for 10 KB from file0 for reduce0, then 10 KB from file0 for reduce1, and only then move on to file1. It is possible that NIO somehow interleaves these requests more naturally than Netty, though this behavior is not fundamental to either system and is probably a happy coincidence if it is the case. We could try, for instance, handling the responses to block requests on a different event loop than the one receiving the requests, and randomly choosing from any buffered requests to serve IO (sketched below).

> Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey
> ------------------------------------------------------------------------
>
>                 Key: SPARK-4740
>                 URL: https://issues.apache.org/jira/browse/SPARK-4740
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Zhang, Liye
>            Assignee: Reynold Xin
>         Attachments: (rxin patch better executor)TestRunner sort-by-key - Thread dump for executor 3_files.zip,
>                      (rxin patch normal executor)TestRunner sort-by-key - Thread dump for executor 0_files.zip,
>                      Spark-perf Test Report 16 Cores per Executor.pdf,
>                      Spark-perf Test Report.pdf,
>                      TestRunner sort-by-key - Thread dump for executor 1_files (Netty-48 Cores per node).zip,
>                      TestRunner sort-by-key - Thread dump for executor 1_files (Nio-48 cores per node).zip,
>                      rxin_patch-on_4_node_cluster_48CoresPerNode(Unbalance).7z
>
> When testing current Spark master (1.3.0-SNAPSHOT) with spark-perf (sort-by-key, aggregate-by-key, etc.), the Netty-based shuffle transferService takes much longer than the NIO-based one. The network throughput of Netty is only about half of that of NIO.
> We tested in standalone mode; the data set used is 20 billion records, about 400 GB in total. The spark-perf test runs on a 4-node cluster with 10G NICs and 48 CPU cores per node, and each executor has 64 GB of memory. The number of reduce tasks is set to 1000.
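For anyone who wants to try the numConnectionsPerPeer experiment suggested above, a minimal sketch follows. The app name and cluster setup are placeholders; the two config keys are the ones exposed by the Netty transfer service, and they can equally be passed via spark-defaults.conf or --conf on submit:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the suggested test: with ~6 connections to each of the 3 remote
// nodes, up to ~18 block requests can be outstanding against the disks,
// roughly matching the ~20 concurrent reader threads observed with NIO.
val conf = new SparkConf()
  .setAppName("sort-by-key-netty-concurrency-test")            // placeholder name
  .set("spark.shuffle.blockTransferService", "netty")          // use the Netty transfer service
  .set("spark.shuffle.io.numConnectionsPerPeer", "6")          // default is 1
val sc = new SparkContext(conf)
{code}

And a purely hypothetical illustration of the request-interleaving idea -- this is not Spark's actual server code, just a toy buffer that serves pending block requests in random order rather than strictly in arrival order, so reads of the same map output file for different reducers can land closer together:

{code:scala}
import scala.collection.mutable
import scala.util.Random

// Hypothetical request shape; field names are for illustration only.
case class BlockRequest(reducerId: Int, file: String, offset: Long, length: Long)

class RandomizedRequestBuffer {
  private val pending = mutable.ArrayBuffer.empty[BlockRequest]

  // Requests arrive from the receiving event loop and are buffered here.
  def enqueue(req: BlockRequest): Unit = pending += req

  // The IO-serving side picks a random buffered request instead of the oldest,
  // which interleaves sweeps from different reduce tasks over the same files.
  def nextToServe(): Option[BlockRequest] =
    if (pending.isEmpty) None
    else Some(pending.remove(Random.nextInt(pending.size)))
}
{code}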