[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241206#comment-14241206 ]
Zhang, Liye commented on SPARK-4740:
------------------------------------

Hi [~adav], [~rxin], I ran the test today with the latest master branch, which includes rxin's patch. On my 4-node cluster with 48 cores per node, I set *spark.local.dir* to a single tmpfs (ramdisk) directory; the ramdisk is 136GB, enough for the whole shuffle (total shuffle write 284GB, total shuffle read 213GB), and *spark.executor.memory* is set to 48GB. This setup eliminates the disk I/O effect. With the same 400GB data set, the result shows Netty is now better than NIO (reduce time *Netty: 24 mins* vs. *NIO: 26 mins*).

I also retested with 8 HDDs, keeping *spark.executor.memory* at 48GB and setting *spark.local.dir* to 8 HDD directories. The result is about the same as before, i.e. NIO outperforms Netty (reduce time *Netty: 32 mins* vs. *NIO: 25 mins*). In the Netty test the imbalance still exists: the best executor finished 308 tasks, while the worst executor finished only 222.

It seems NIO is not affected by whether the storage is HDD or ramdisk, while Netty is more sensitive to HDDs. So far, we can probably narrow the problem down to the different behavior of Netty and NIO on disk operations.
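For reference, the two setups above can be sketched as a spark-defaults.conf fragment. The *spark.shuffle.blockTransferService* key is how Spark 1.2 switches between the two implementations; the mount paths below are illustrative, only the values stated in the comment (48GB executor memory, one 136GB ramdisk dir vs. 8 HDD dirs) come from the test:

```
# Sketch of the test configuration described above (paths are illustrative).
spark.shuffle.blockTransferService   nio            # or "netty"
spark.executor.memory                48g

# ramdisk run: a single 136GB tmpfs mount
spark.local.dir                      /mnt/ramdisk

# HDD run instead: a comma-separated list of 8 HDD-backed directories, e.g.
# spark.local.dir                    /mnt/disk1,/mnt/disk2,/mnt/disk3,...
```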
> Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey
> ------------------------------------------------------------------------
>
>                 Key: SPARK-4740
>                 URL: https://issues.apache.org/jira/browse/SPARK-4740
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Zhang, Liye
>            Assignee: Reynold Xin
>         Attachments: (rxin patch better executor)TestRunner sort-by-key - Thread dump for executor 3_files.zip, (rxin patch normal executor)TestRunner sort-by-key - Thread dump for executor 0 _files.zip, Spark-perf Test Report 16 Cores per Executor.pdf, Spark-perf Test Report.pdf, TestRunner sort-by-key - Thread dump for executor 1_files (Netty-48 Cores per node).zip, TestRunner sort-by-key - Thread dump for executor 1_files (Nio-48 cores per node).zip, rxin_patch-on_4_node_cluster_48CoresPerNode(Unbalance).7z
>
> When testing the current Spark master (1.3.0-snapshot) with spark-perf (sort-by-key, aggregate-by-key, etc.), the Netty-based shuffle transferService takes much longer than the NIO-based one. The network throughput of Netty is only about half that of NIO.
> We tested in standalone mode; the data set used for the test is 20 billion records, about 400GB in total. The spark-perf test runs on a 4-node cluster with 10G NICs, 48 CPU cores per node, and 64GB of memory per executor. The number of reduce tasks is set to 1000.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
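As a reminder of what the benchmarked operation does, here is a minimal, single-process Python sketch of a sort-by-key shuffle: records are range-partitioned by key (the shuffle-write side), then each reduce partition is sorted locally. This only mirrors the semantics of the workload; it is not Spark's implementation, and the record counts are illustrative (the real test used 20 billion records and 1000 reduce partitions).

```python
import random

def sort_by_key(records, num_partitions):
    """Range-partition (key, value) pairs by key, then sort each partition."""
    keys = sorted(k for k, _ in records)
    # Pick range boundaries so each reduce partition gets a similar share.
    bounds = [keys[len(keys) * i // num_partitions]
              for i in range(1, num_partitions)]
    partitions = [[] for _ in range(num_partitions)]
    for k, v in records:                 # "map"/shuffle-write side
        p = sum(k >= b for b in bounds)  # index of the key's range
        partitions[p].append((k, v))
    for part in partitions:              # reduce side: local sort
        part.sort(key=lambda kv: kv[0])
    return partitions

random.seed(0)
data = [(random.randrange(1000), i) for i in range(10000)]
parts = sort_by_key(data, num_partitions=4)
# Concatenating the partitions in order yields a globally sorted sequence.
merged = [kv for part in parts for kv in part]
assert merged == sorted(data, key=lambda kv: kv[0])
```

Because equal keys always land in the same range partition and the per-partition sorts are stable, concatenating the partitions reproduces a global stable sort by key.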