[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201835#comment-14201835 ]
zzc commented on SPARK-2468:
----------------------------

Hi Aaron Davidson,

I can't download the logs from the server, so I'll write the details here. There are 3 nodes in the cluster, each with 24 cores / 128 GB; YARN can allocate 20 cores and 80 GB per node.

I start the application with the command:

--driver-memory 10g --num-executors 10 --executor-memory 20g --executor-cores 1 --driver-library-path :/usr/local/hadoop/lib/native/ /opt/wsspark.jar 24G_10_20g_1c 1 100 hdfs://wscluster/zzc_test/in/snappy8/ 100 100 hdfs://wscluster/zzc_test/out/i007

My Spark config:

spark.default.parallelism 100
spark.shuffle.consolidateFiles false
spark.shuffle.spill.compress true
spark.shuffle.compress true
spark.storage.memoryFraction 0.6
spark.shuffle.memoryFraction 0.5
spark.shuffle.file.buffer.kb 100
spark.reducer.maxMbInFlight 48
spark.shuffle.blockTransferService netty
spark.shuffle.io.mode nio
spark.shuffle.io.connectionTimeout 120
spark.shuffle.manager SORT
spark.shuffle.io.preferDirectBufs false
spark.shuffle.io.maxRetries 3
spark.shuffle.io.retryWaitMs 5000
spark.scheduler.mode FIFO
spark.akka.frameSize 10
spark.akka.timeout 100

The input is about 24 GB of snappy files, and the job writes about 14.5 GB of shuffle data. With the config above, the AM's log shows that each container needs more than 13 GB of memory, so OOM occurs. If I set spark.shuffle.blockTransferService=nio instead, each container needs only about 12 GB of memory.

> Netty-based block server / client module
> ----------------------------------------
>
>                 Key: SPARK-2468
>                 URL: https://issues.apache.org/jira/browse/SPARK-2468
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> Right now shuffle send goes through the block manager. This is inefficient because it requires loading a block from disk into a kernel buffer, then into a user-space buffer, and then back into a kernel send buffer before it reaches the NIC. It makes multiple copies of the data and context switches between kernel and user space. It also creates unnecessary buffers in the JVM that increase GC.
> Instead, we should use FileChannel.transferTo, which handles this in kernel space with zero-copy. See http://www.ibm.com/developerworks/library/j-zerocopy/
> One potential solution is to use Netty. Spark already has a Netty-based network module implemented (org.apache.spark.network.netty). However, it lacks some functionality and is turned off by default.
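For illustration only, here is a minimal Scala sketch of the FileChannel.transferTo zero-copy call referenced in the description above; it is not Spark's actual implementation. The file path, host, port, and the ZeroCopySend name are placeholders.

import java.io.FileInputStream
import java.net.InetSocketAddress
import java.nio.channels.{FileChannel, SocketChannel}

object ZeroCopySend {
  def main(args: Array[String]): Unit = {
    // Placeholder block file and destination; not values used by Spark.
    val channel: FileChannel = new FileInputStream("/tmp/shuffle-block.data").getChannel
    val socket: SocketChannel = SocketChannel.open(new InetSocketAddress("localhost", 9999))
    try {
      var position = 0L
      val size = channel.size()
      // transferTo hands the copy to the kernel (zero-copy) and may move
      // fewer bytes than requested, so loop until the whole file is sent.
      while (position < size) {
        position += channel.transferTo(position, size - position, socket)
      }
    } finally {
      channel.close()
      socket.close()
    }
  }
}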