[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235986#comment-14235986 ]
Reynold Xin commented on SPARK-4740:
------------------------------------

Looking at the nio stacktrace, it does confirm one of our intuitions: the problem manifests when the number of nodes is small (4) and the number of disks per node is large (8). Since the Netty path was initially optimized for large clusters, the code establishes only a single connection per client/server pair. The way it is structured, this means at any given time there is at most one disk read active per connection; with 4 nodes, each node fetches from the other 3, so at most 3 remote disk reads are in flight (4 - 1), whereas the old NIO package uses 20 threads to do IO.

> Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey
> ------------------------------------------------------------------------
>
>                 Key: SPARK-4740
>                 URL: https://issues.apache.org/jira/browse/SPARK-4740
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Zhang, Liye
>         Attachments: Spark-perf Test Report 16 Cores per Executor.pdf, Spark-perf Test Report.pdf, TestRunner sort-by-key - Thread dump for executor 1_files (Netty-48 Cores per node).zip, TestRunner sort-by-key - Thread dump for executor 1_files (Nio-48 cores per node).zip
>
> When testing the current Spark master (1.3.0-SNAPSHOT) with spark-perf (sort-by-key, aggregate-by-key, etc.), the Netty-based shuffle transfer service takes much longer than the NIO-based one. The network throughput with Netty is only about half that of NIO.
> We tested in standalone mode. The data set used for the test is 20 billion records, about 400 GB in total. Spark-perf runs on a 4-node cluster with 10G NICs and 48 CPU cores per node, and each executor has 64 GB of memory. The number of reduce tasks is set to 1000.
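For illustration of the bottleneck described in the comment above, here is a minimal Scala sketch of how a user might raise the per-peer connection count so that more than 3 remote disk reads can be active at once on a 4-node cluster, or fall back to the NIO service for comparison. The configuration keys (spark.shuffle.io.numConnectionsPerPeer and spark.shuffle.blockTransferService) are assumptions about the settings relevant to this issue and should be checked against the Spark version in use; they are not taken from this comment.

    // Sketch only: the config keys below are assumed, not quoted from the issue.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("spark-perf-sort-by-key")
      // With a single connection per client/server pair, a 4-node cluster caps
      // concurrent remote disk reads at 3 (4 - 1). Allowing several connections
      // per peer lets more of the 8 disks per node serve fetches in parallel.
      .set("spark.shuffle.io.numConnectionsPerPeer", "8")
      // Alternatively, switch back to the NIO transfer service to reproduce the
      // comparison reported in this ticket (Spark 1.2.x-era setting).
      // .set("spark.shuffle.blockTransferService", "nio")

    val sc = new SparkContext(conf)

Raising the per-peer connection count trades a little connection overhead on large clusters for better disk parallelism on small ones, which is the trade-off the comment points at.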