Github user liyezhang556520 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12083#discussion_r58287737
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -44,6 +45,14 @@
       private long totalBytesTransferred;
     
       /**
    +   * When the write buffer size is larger than this limit, I/O will be done in chunks of this size.
    +   * The size should not be too large as it will waste underlying memory copy. e.g. If network
    +   * avaliable buffer is smaller than this limit, the data cannot be sent within one single write
    +   * operation while it still will make memory copy with this size.
    +   */
    +  private static final int NIO_BUFFER_LIMIT = 512 * 1024;
    --- End diff --
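
For reference, the chunking that this limit implies can be sketched roughly as below. This is a minimal sketch in plain `java.nio`, not the code from the pull request: `ChunkedWriter`, `writeChunk`, and `writeFully` are hypothetical names, and `writeFully` assumes a blocking channel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper for illustration; not taken from MessageWithHeader.java.
final class ChunkedWriter {
  // Same value as the limit in the diff above.
  static final int NIO_BUFFER_LIMIT = 512 * 1024;

  /** Offers at most NIO_BUFFER_LIMIT bytes of buf to the channel; returns bytes written. */
  static int writeChunk(WritableByteChannel target, ByteBuffer buf) throws IOException {
    int originalLimit = buf.limit();
    try {
      // Shrink the visible window so any copy made for this write is bounded.
      int ioSize = Math.min(buf.remaining(), NIO_BUFFER_LIMIT);
      buf.limit(buf.position() + ioSize);
      return target.write(buf);
    } finally {
      // Restore the limit so the rest of the data stays reachable.
      buf.limit(originalLimit);
    }
  }

  /** Drains buf chunk by chunk (assumes a blocking channel); returns the number of write calls. */
  static int writeFully(WritableByteChannel target, ByteBuffer buf) throws IOException {
    int calls = 0;
    while (buf.hasRemaining()) {
      writeChunk(target, buf);
      calls++;
    }
    return calls;
  }
}
```

With a heap buffer, each `target.write` call then copies at most `NIO_BUFFER_LIMIT` bytes into a temporary direct buffer, even if the socket accepts fewer.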
    
    >What if we create DirectByteBuffer here manually for a big buf (big enough 
so that we can get benefits even if creating a direct buffer is slow) and try 
to write as many as possible? Then we can avoid the memory copy in IOUtil.write.
    
    @zsxwing, yes, the redundant copy can be avoided if we pass a direct buffer
    straight to `WritableByteChannel.write()`, because of the check at
    http://www.grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/sun/nio/ch/IOUtil.java#50,
    but I don't know whether that's worthwhile. `IOUtil` maintains a pool of
    direct buffers so that it does not have to allocate them on every write. I
    think that explains what I saw in my test. The first time I ran
    `sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Long](1024 * 1024 * 200)).iterator).reduce((a,b)=> a).length`,
    the network throughput on the executor side was extremely low. When I ran
    the same code after first running
    `sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Double](1024 * 1024 * 50)).iterator).reduce((a,b)=> a).length`,
    the network throughput was much higher.
    
    So if we want to create direct buffers manually in Spark, we had better
    maintain a buffer pool as well, but that would introduce much more
    complexity and carry the risk of memory leaks.
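
To make the trade-off concrete, here is a minimal sketch of what such a pool might look like. It is purely illustrative and not part of the pull request: `DirectBufferPool`, its methods, and its sizing parameters are all hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayDeque;

// Hypothetical pool for illustration only; nothing like this exists in the pull request.
final class DirectBufferPool {
  private final int bufferSize;
  private final int maxPooled;
  private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();

  DirectBufferPool(int bufferSize, int maxPooled) {
    this.bufferSize = bufferSize;
    this.maxPooled = maxPooled;
  }

  synchronized ByteBuffer acquire() {
    ByteBuffer b = free.pollFirst();
    if (b == null) {
      b = ByteBuffer.allocateDirect(bufferSize); // slow path: allocate off-heap memory
    }
    b.clear();
    return b;
  }

  synchronized void release(ByteBuffer b) {
    // Buffers that are never released, or that are dropped here, hold native
    // memory until the garbage collector reclaims them.
    if (b.isDirect() && b.capacity() == bufferSize && free.size() < maxPooled) {
      free.addFirst(b);
    }
  }

  synchronized int pooled() {
    return free.size();
  }

  /** Copies one chunk of src into a pooled direct buffer and writes it; returns bytes written. */
  int write(WritableByteChannel target, ByteBuffer src) throws IOException {
    ByteBuffer direct = acquire();
    try {
      int originalLimit = src.limit();
      src.limit(src.position() + Math.min(src.remaining(), direct.remaining()));
      direct.put(src); // our own copy, which advances src.position()
      src.limit(originalLimit);
      direct.flip();
      int written = target.write(direct);
      // Roll back whatever the channel did not accept so the caller can retry it.
      src.position(src.position() - direct.remaining());
      return written;
    } finally {
      release(direct);
    }
  }
}
```

Even this small version needs locking, a size cap, and a release path that callers must never skip, which is the kind of complexity and leak risk I mean.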


