[jira] [Commented] (SPARK-2468) zero-copy shuffle network communication
[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080657#comment-14080657 ] Raymond Liu commented on SPARK-2468: so, is there anyone working on this? zero-copy shuffle network communication --- Key: SPARK-2468 URL: https://issues.apache.org/jira/browse/SPARK-2468 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical Right now shuffle send goes through the block manager. This is inefficient because it requires loading a block from disk into a kernel buffer, then into a user space buffer, and then back to a kernel send buffer before it reaches the NIC. It does multiple copies of the data and context switching between kernel/user. It also creates unnecessary buffer in the JVM that increases GC Instead, we should use FileChannel.transferTo, which handles this in the kernel space with zero-copy. See http://www.ibm.com/developerworks/library/j-zerocopy/ One potential solution is to use Netty NIO. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2468) zero-copy shuffle network communication
[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081357#comment-14081357 ] Reynold Xin commented on SPARK-2468: It's something I'd like to prototype for 1.2. Do you have any thoughts on this? zero-copy shuffle network communication --- Key: SPARK-2468 URL: https://issues.apache.org/jira/browse/SPARK-2468 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical Right now shuffle send goes through the block manager. This is inefficient because it requires loading a block from disk into a kernel buffer, then into a user space buffer, and then back to a kernel send buffer before it reaches the NIC. It does multiple copies of the data and context switching between kernel/user. It also creates unnecessary buffer in the JVM that increases GC Instead, we should use FileChannel.transferTo, which handles this in the kernel space with zero-copy. See http://www.ibm.com/developerworks/library/j-zerocopy/ One potential solution is to use Netty NIO. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2468) zero-copy shuffle network communication
[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060543#comment-14060543 ] Mridul Muralidharan commented on SPARK-2468: We map the file content and directly write that to the socket (except when the size is below 8k or so iirc) - are you sure we are copying to user space and back ? zero-copy shuffle network communication --- Key: SPARK-2468 URL: https://issues.apache.org/jira/browse/SPARK-2468 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical Right now shuffle send goes through the block manager. This is inefficient because it requires loading a block from disk into a kernel buffer, then into a user space buffer, and then back to a kernel send buffer before it reaches the NIC. It does multiple copies of the data and context switching between kernel/user. It also creates unnecessary buffer in the JVM that increases GC Instead, we should use FileChannel.transferTo, which handles this in the kernel space with zero-copy. See http://www.ibm.com/developerworks/library/j-zerocopy/ One potential solution is to use Netty NIO. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2468) zero-copy shuffle network communication
[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060545#comment-14060545 ] Mridul Muralidharan commented on SPARK-2468: Writing mmap'ed buffers are pretty efficient btw - the second fallback in transferTo implementation iirc. zero-copy shuffle network communication --- Key: SPARK-2468 URL: https://issues.apache.org/jira/browse/SPARK-2468 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical Right now shuffle send goes through the block manager. This is inefficient because it requires loading a block from disk into a kernel buffer, then into a user space buffer, and then back to a kernel send buffer before it reaches the NIC. It does multiple copies of the data and context switching between kernel/user. It also creates unnecessary buffer in the JVM that increases GC Instead, we should use FileChannel.transferTo, which handles this in the kernel space with zero-copy. See http://www.ibm.com/developerworks/library/j-zerocopy/ One potential solution is to use Netty NIO. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2468) zero-copy shuffle network communication
[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060937#comment-14060937 ] Reynold Xin commented on SPARK-2468: We do use mmap for large blocks. However, most of the shuffle blocks are small so a lot of blocks are not mapped. In addition, there are multiple problems with memory mapped files: 1. Memory mapped blocks are off-heap and are not managed by the JVM, which creates another memory space to tune/mange 2. Memory mapped blocks cannot be reused and are only released at GC 3. On Linux machines with Huge Pages configured (which is increasingly more common with large memory), the default behavior is each file will consume 2MB, leading to OOM very soon. 4. For large blocks that span multiple pages, it creates page faults which leads to unnecessary context switches zero-copy shuffle network communication --- Key: SPARK-2468 URL: https://issues.apache.org/jira/browse/SPARK-2468 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical Right now shuffle send goes through the block manager. This is inefficient because it requires loading a block from disk into a kernel buffer, then into a user space buffer, and then back to a kernel send buffer before it reaches the NIC. It does multiple copies of the data and context switching between kernel/user. It also creates unnecessary buffer in the JVM that increases GC Instead, we should use FileChannel.transferTo, which handles this in the kernel space with zero-copy. See http://www.ibm.com/developerworks/library/j-zerocopy/ One potential solution is to use Netty NIO. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2468) zero-copy shuffle network communication
[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061094#comment-14061094 ] Mridul Muralidharan commented on SPARK-2468: Ah, small files - those are indeed a problem. Btw, we do dispose off map'ed blocks as soon as it is done; so we dont need to wait for gc to free them. Also note that the files are closed as soon as opened and mmap'ed - so they do not count towards open file count/ulimit. Agree on 1, 3 and 4 - some of these apply to sendfile too btw : so not avoidable; but it is the best we have right now. Since we use mmap'ed buffers and rarely transfer the same file again, the performance jump might not be the order(s) of magnitude other projects claim - but then even 10% (or whatever) improvement in our case would be substantial ! zero-copy shuffle network communication --- Key: SPARK-2468 URL: https://issues.apache.org/jira/browse/SPARK-2468 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical Right now shuffle send goes through the block manager. This is inefficient because it requires loading a block from disk into a kernel buffer, then into a user space buffer, and then back to a kernel send buffer before it reaches the NIC. It does multiple copies of the data and context switching between kernel/user. It also creates unnecessary buffer in the JVM that increases GC Instead, we should use FileChannel.transferTo, which handles this in the kernel space with zero-copy. See http://www.ibm.com/developerworks/library/j-zerocopy/ One potential solution is to use Netty NIO. -- This message was sent by Atlassian JIRA (v6.2#6252)