[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201844#comment-14201844 ]
zzc commented on SPARK-2468:
----------------------------

By the way, my test code:

    // Sum flow and record count per (key, url), using 100 partitions.
    val mapR = textFile.map(line => {
      ......
      ((value(1) + "_" + date.toString(), url), (flow, 1))
    }).reduceByKey((pair1, pair2) => {
      (pair1._1 + pair2._1, pair1._2 + pair2._2)
    }, 100)

    mapR.persist(StorageLevel.MEMORY_AND_DISK_SER)

    // Per key, sort records by total flow (descending) and save them.
    val mapR1 = mapR.groupBy(_._1._1)
      .mapValues(pairs => {
        pairs.toList.sortBy(_._2._1).reverse
      })
      .flatMap(values => {
        values._2
      })
      .map(values => {
        values._1._1 + "\t" + values._1._2 + "\t" + values._2._1.toString() + "\t" + values._2._2.toString()
      })
      .saveAsTextFile(outputPath + "_1/")

    // Per key, sort records by count (descending) and save them.
    val mapR2 = mapR.groupBy(_._1._1)
      .mapValues(pairs => {
        pairs.toList.sortBy(_._2._2).reverse
      })
      .flatMap(values => {
        values._2
      })
      .map(values => {
        values._1._1 + "\t" + values._1._2 + "\t" + values._2._1.toString() + "\t" + values._2._2.toString()
      })
      .saveAsTextFile(outputPath + "_2/")

> Netty-based block server / client module
> ----------------------------------------
>
>                 Key: SPARK-2468
>                 URL: https://issues.apache.org/jira/browse/SPARK-2468
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> Right now shuffle send goes through the block manager. This is inefficient
> because it requires loading a block from disk into a kernel buffer, then into
> a user space buffer, and then back to a kernel send buffer before it reaches
> the NIC. It does multiple copies of the data and context switching between
> kernel/user. It also creates unnecessary buffer in the JVM that increases GC.
>
> Instead, we should use FileChannel.transferTo, which handles this in the
> kernel space with zero-copy. See
> http://www.ibm.com/developerworks/library/j-zerocopy/
>
> One potential solution is to use Netty. Spark already has a Netty based
> network module implemented (org.apache.spark.network.netty). However, it
> lacks some functionality and is turned off by default.
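
For readers unfamiliar with the FileChannel.transferTo call mentioned in the issue description, here is a minimal sketch of how it is typically used. This is not Spark's implementation: the object name ZeroCopySketch and the helper transferFile are hypothetical, and the sketch assumes a blocking target channel. Whether the copy really stays in kernel space depends on the OS and on the kind of target channel.

```scala
import java.io.File
import java.nio.channels.{FileChannel, WritableByteChannel}
import java.nio.file.{Files, StandardOpenOption}

// Hypothetical helper, for illustration only (not part of Spark).
object ZeroCopySketch {

  /** Sends the whole file `src` to `dst` via FileChannel.transferTo.
    * Returns the number of bytes transferred.
    * Assumes `dst` is a blocking channel. */
  def transferFile(src: File, dst: WritableByteChannel): Long = {
    val in = FileChannel.open(src.toPath, StandardOpenOption.READ)
    try {
      val size = in.size()
      var position = 0L
      var stalled = false
      // transferTo may move fewer bytes than requested, so loop until
      // the whole file is sent or no further progress is made.
      while (position < size && !stalled) {
        val n = in.transferTo(position, size - position, dst)
        if (n <= 0) stalled = true else position += n
      }
      position
    } finally {
      in.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for a shuffle block on disk; a real server would pass a
    // SocketChannel as the target instead of another file.
    val src = File.createTempFile("block", ".bin")
    val dst = File.createTempFile("copy", ".bin")
    try {
      Files.write(src.toPath, Array.fill[Byte](4096)(1.toByte))
      val out = FileChannel.open(dst.toPath, StandardOpenOption.WRITE)
      try println(transferFile(src, out)) finally out.close()
    } finally {
      src.delete()
      dst.delete()
    }
  }
}
```

The point of the loop is that the data never passes through a user-space byte array in the JVM, which is the buffer copy and GC cost the description wants to avoid.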