Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1058#discussion_r160841030

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/SpillSet.java ---
@@ -107,7 +107,7 @@
     * nodes provide insufficient local disk space)
     */
-   private static final int TRANSFER_SIZE = 32 * 1024;
+   private static final int TRANSFER_SIZE = 1024 * 1024;
--- End diff --

Is a 1 MB buffer excessive? The point of a buffer is to ensure we write in units of a disk block. For the local file system, experience showed no gain beyond 32K. In the MapR FS, each write is in units of 1 MB. Does Hadoop have a preferred size?

Given this variation, if we need large buffers, should we choose the buffer size based on the underlying file system? For example, is there a preferred size for S3?

32K didn't seem large enough to worry about, even if we had 1000 fragments busily spilling. But 1 MB? 1000 * 1 MB = 1 GB, which starts to become significant, especially in light of our efforts to reduce heap usage. Should we worry?
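To make the question concrete, here is a rough sketch of what a per-file-system choice might look like. The `chooseTransferSize` helper and the per-scheme constants are purely illustrative and not part of this patch; the only real external piece is Hadoop's `io.file.buffer.size` setting, which defaults to 4096.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Sketch only: pick a spill transfer size from the file system scheme
// instead of a single hard-coded TRANSFER_SIZE.
public class TransferSizeChooser {

  // Local FS: experience showed no gain past 32K.
  private static final int LOCAL_TRANSFER_SIZE = 32 * 1024;
  // MapR FS: writes happen in 1 MB units.
  private static final int MAPRFS_TRANSFER_SIZE = 1024 * 1024;

  public static int chooseTransferSize(FileSystem fs, Configuration conf) {
    String scheme = fs.getUri().getScheme();
    if (scheme == null || "file".equals(scheme)) {
      return LOCAL_TRANSFER_SIZE;
    }
    if ("maprfs".equals(scheme)) {
      return MAPRFS_TRANSFER_SIZE;
    }
    // For other distributed file systems (HDFS, S3, ...), fall back to
    // Hadoop's own hint, but never go below the local-FS floor.
    return Math.max(LOCAL_TRANSFER_SIZE,
        conf.getInt("io.file.buffer.size", 4096));
  }
}
```

Something along these lines would let the local-FS case keep the small buffer while letting MapR FS (or any FS with a known preferred write size) get the larger one, without paying 1 MB per spilling fragment everywhere.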
---