Github user liyezhang556520 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12083#discussion_r58287737

    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -44,6 +45,14 @@
       private long totalBytesTransferred;

       /**
    +   * When the write buffer size is larger than this limit, I/O will be done in chunks of this size.
    +   * The size should not be too large as it will waste underlying memory copy. e.g. If network
    +   * avaliable buffer is smaller than this limit, the data cannot be sent within one single write
    +   * operation while it still will make memory copy with this size.
    +   */
    +  private static final int NIO_BUFFER_LIMIT = 512 * 1024;
    --- End diff --

    >What if we create DirectByteBuffer here manually for a big buf (big enough so that we can get benefits even if creating a direct buffer is slow) and try to write as many as possible? Then we can avoid the memory copy in IOUtil.write.

    @zsxwing, yes, the redundant copy can be avoided if we pass a direct buffer straight to `WritableByteChannel.write()`, because of the code at http://www.grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/sun/nio/ch/IOUtil.java#50, but I don't know whether that is worthwhile.

    `IOUtil` maintains a pool of direct buffers so that it does not have to allocate them frequently. I think that explains what I saw when testing: the first time I ran `sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Long](1024 * 1024 * 200)).iterator).reduce((a,b)=> a).length`, the network throughput on the executor side was extremely low, but when I ran the same code after first running `sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Double](1024 * 1024 * 50)).iterator).reduce((a,b)=> a).length`, the network throughput was much higher.

    So if we want to create direct buffers manually in Spark, we had better maintain a buffer pool as well, but that would introduce much more complexity and a risk of memory leaks.
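
For readers following the thread, here is a minimal sketch of the chunking idea behind `NIO_BUFFER_LIMIT`, written against plain `java.nio` rather than Spark's actual `MessageWithHeader` code. The class `ChunkedWriteSketch` and the helpers `writeChunk` and `writeFully` are hypothetical names introduced only for illustration. The sketch assumes a blocking channel; on a non-blocking channel `write()` may return 0, and the caller would have to wait for writability instead of looping.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

final class ChunkedWriteSketch {
    // Same value as the constant in the diff; everything else here is illustrative.
    static final int NIO_BUFFER_LIMIT = 512 * 1024;

    /**
     * Offers at most NIO_BUFFER_LIMIT bytes of src to the channel in one call.
     * For a heap buffer, the JDK copies whatever it is offered into a temporary
     * direct buffer, so capping the offered size caps the size of that copy.
     */
    static int writeChunk(ByteBuffer src, WritableByteChannel target) throws IOException {
        int length = Math.min(src.remaining(), NIO_BUFFER_LIMIT);
        // A duplicate shares the bytes but has its own position and limit.
        ByteBuffer chunk = src.duplicate();
        chunk.limit(chunk.position() + length);
        int written = target.write(chunk);
        // Advance the original buffer only by what the channel actually accepted.
        src.position(src.position() + written);
        return written;
    }

    /** Repeats writeChunk until src is drained (assumes a blocking channel). */
    static long writeFully(ByteBuffer src, WritableByteChannel target) throws IOException {
        long total = 0;
        while (src.hasRemaining()) {
            total += writeChunk(src, target);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer src = ByteBuffer.allocate(NIO_BUFFER_LIMIT * 2 + 123); // heap buffer
        WritableByteChannel sink = Channels.newChannel(new ByteArrayOutputStream());
        System.out.println("wrote " + writeFully(src, sink) + " bytes");
    }
}
```

Passing a buffer obtained from `ByteBuffer.allocateDirect` would skip the temporary copy entirely, which is the alternative discussed above, at the cost of having to manage the lifetime of those direct buffers.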