Github user Ngone51 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22163#discussion_r211954019

    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -206,14 +211,21 @@ private void writeSortedFile(boolean isLastFile) {
           long recordReadPosition = recordOffsetInPage + uaoSize; // skip over record length
           while (dataRemaining > 0) {
             final int toTransfer = Math.min(diskWriteBufferSize, dataRemaining);
    -        Platform.copyMemory(
    -          recordPage, recordReadPosition, writeBuffer, Platform.BYTE_ARRAY_OFFSET, toTransfer);
    -        writer.write(writeBuffer, 0, toTransfer);
    +        if (bufferOffset > 0 && bufferOffset + toTransfer > DISK_WRITE_BUFFER_SIZE) {
    --- End diff --

    Not a bad idea, but the code here may not work as you expect. If we get a record of size `X` < `diskWriteBufferSize` (the same value as `DISK_WRITE_BUFFER_SIZE`), we call `writer.write()` only once. If we get a record of size `Y` >= `diskWriteBufferSize`, we call `writer.write()` (`Y` + `diskWriteBufferSize` - 1) / `diskWriteBufferSize` times, i.e. ceil(`Y` / `diskWriteBufferSize`). The new code does not change either count.
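As a quick check of the arithmetic in the comment, here is a minimal, hypothetical Java sketch (not Spark code) that mirrors the chunked copy loop from the diff and counts how many times `writer.write()` would be called for a single record. The class `WriteCallCount`, its methods `countWriteCalls` and `closedForm`, and the 1 MiB buffer size used in `main` are assumptions for illustration only.

```java
/**
 * Hypothetical helper, not part of Spark: simulates the chunked copy loop in
 * ShuffleExternalSorter.writeSortedFile() and counts writer.write() calls for one record.
 */
final class WriteCallCount {

    // Counts how many times the loop would call writer.write() for a record of
    // recordSize bytes when each call transfers at most bufferSize bytes.
    static int countWriteCalls(int recordSize, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        int dataRemaining = recordSize;
        int calls = 0;
        while (dataRemaining > 0) {
            int toTransfer = Math.min(bufferSize, dataRemaining);
            // Stand-in for Platform.copyMemory(...) followed by
            // writer.write(writeBuffer, 0, toTransfer) in the original loop.
            calls++;
            dataRemaining -= toTransfer;
        }
        return calls;
    }

    // Closed form from the review comment: ceiling division (Y + d - 1) / d.
    static int closedForm(int recordSize, int bufferSize) {
        return (recordSize + bufferSize - 1) / bufferSize;
    }

    public static void main(String[] args) {
        int d = 1024 * 1024; // assumed buffer size, for illustration only
        int[] sizes = {1, 100, d - 1, d, d + 1, 3 * d, 3 * d + 7};
        for (int size : sizes) {
            System.out.printf("record=%d bytes -> %d write() call(s), closed form %d%n",
                size, countWriteCalls(size, d), closedForm(size, d));
        }
    }
}
```

Any record up to one buffer in size costs a single call, and larger records cost one call per buffer-sized chunk, which is the per-record count the comment says the new code leaves unchanged.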