[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225681#comment-14225681 ]
Aaron Davidson commented on SPARK-2468:
---------------------------------------

Hey guys, I finally got a chance to run a more comprehensive set of tests with constrained containers. In doing so, I found a critical issue that caused us to allocate direct byte buffers proportional to the number of executors times the number of cores, rather than just proportional to the number of cores. With patch [#3465|https://github.com/apache/spark/pull/3465], I was able to run a shuffle with [~lianhuiwang]'s configuration of a 7GB container with a 6GB heap and 2 cores -- prior to the patch, it exceeded the container's limits. If you guys get a chance, please let me know whether this is sufficient to fix the issues you saw with your initial overhead configurations. (Note that while memory usage is greatly decreased, we still allocate a significant amount of off-heap memory, so you may need to shift some memory from heap to off-heap if your off-heap allowance was previously very constrained.)

> Netty-based block server / client module
> ----------------------------------------
>
>                 Key: SPARK-2468
>                 URL: https://issues.apache.org/jira/browse/SPARK-2468
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> Right now shuffle send goes through the block manager. This is inefficient
> because it requires loading a block from disk into a kernel buffer, then into
> a user-space buffer, and then back into a kernel send buffer before it reaches
> the NIC. It makes multiple copies of the data and incurs context switches
> between kernel and user space. It also creates unnecessary buffers in the JVM,
> which increases GC pressure.
>
> Instead, we should use FileChannel.transferTo, which handles the transfer in
> kernel space with zero-copy. See
> http://www.ibm.com/developerworks/library/j-zerocopy/
>
> One potential solution is to use Netty. Spark already has a Netty-based
> network module implemented (org.apache.spark.network.netty). However, it
> lacks some functionality and is turned off by default.
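For anyone tuning containers along the lines of the comment above, the numbers discussed there (a roughly 7GB YARN container holding a 6GB heap, with the remainder left as off-heap headroom for Netty's direct buffers) could be expressed with the settings sketched below. This is a minimal sketch, not a configuration taken from the ticket; the 1024MB overhead value and the app name are illustrative assumptions to be tuned per workload.

{code:scala}
// Sketch: split a ~7GB YARN container into a 6GB executor heap plus ~1GB
// of off-heap headroom (memoryOverhead) for Netty's direct byte buffers.
// The overhead value is an assumption, not a recommendation from the ticket.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-memory-sketch")                // hypothetical app name
  .set("spark.executor.memory", "6g")                 // JVM heap per executor
  .set("spark.yarn.executor.memoryOverhead", "1024")  // extra container MB for off-heap use
  .set("spark.executor.cores", "2")

val sc = new SparkContext(conf)
{code}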
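To make the description's zero-copy point concrete, here is a minimal sketch of FileChannel.transferTo: the file contents are handed to the socket inside the kernel, avoiding the user-space copies listed above. The helper name, host, and port are hypothetical and not part of Spark's API.

{code:scala}
import java.io.{File, FileInputStream}
import java.net.InetSocketAddress
import java.nio.channels.SocketChannel

// Send a file over a socket using the kernel's zero-copy path.
def sendFileZeroCopy(file: File, host: String, port: Int): Unit = {
  val socket = SocketChannel.open(new InetSocketAddress(host, port))
  val fileChannel = new FileInputStream(file).getChannel
  try {
    var position = 0L
    val size = fileChannel.size()
    // transferTo may move fewer bytes than requested, so loop until the file is sent.
    while (position < size) {
      position += fileChannel.transferTo(position, size - position, socket)
    }
  } finally {
    fileChannel.close()
    socket.close()
  }
}
{code}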