Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1058#discussion_r160841030
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/SpillSet.java
---
@@ -107,7 +107,7 @@
* nodes provide insufficient local disk space)
*/
- private static final int TRANSFER_SIZE = 32 * 1024;
+ private static final int TRANSFER_SIZE = 1024 * 1024;
--- End diff ---
Is a 1 MB buffer excessive? The point of the buffer is to ensure we write in
units of a disk block. For the local file system, experience showed no gain
beyond 32K. On MapR FS, each write is done in units of 1 MB. Does Hadoop have a
preferred size?
Given this variation, if we do need large buffers, should we choose the buffer
size based on the underlying file system (see the sketch below)? For example,
is there a preferred size for S3?
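If we go that route, one option is to ask the file system itself. The following
is only a sketch, assuming the spill path is reached through Hadoop's FileSystem
API; the class name, chooseTransferSize(), and the clamping constants are
hypothetical, not anything in SpillSet today:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TransferSizeChooser {
      // 32K was enough for the local file system in past testing.
      private static final int MIN_TRANSFER_SIZE = 32 * 1024;
      // 1 MB matches the MapR FS write unit; treat it as the ceiling.
      private static final int MAX_TRANSFER_SIZE = 1024 * 1024;

      // Pick a spill buffer size from what the target file system advertises,
      // clamped to the [32K, 1 MB] range discussed above.
      static int chooseTransferSize(Configuration conf, Path spillDir) throws IOException {
        FileSystem fs = spillDir.getFileSystem(conf);
        // Hadoop's generic stream-buffer hint; fall back to 32K if unset.
        int hinted = conf.getInt("io.file.buffer.size", MIN_TRANSFER_SIZE);
        // Default block size is a rough proxy for the unit the FS prefers to
        // write (much larger on HDFS or MapR FS than on the local file system).
        long blockSize = fs.getDefaultBlockSize(spillDir);
        long candidate = Math.max(hinted, blockSize);
        return (int) Math.min(Math.max(candidate, MIN_TRANSFER_SIZE), MAX_TRANSFER_SIZE);
      }
    }

The idea is to clamp whatever the file system reports to the 32K-1MB range we
already know about, so an exotic block size can't blow up per-fragment memory.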
A 32K buffer didn't seem large enough to worry about, even with 1000 fragments
busily spilling (roughly 32 MB in total). But 1 MB? 1000 * 1 MB = 1 GB, which
starts to become significant, especially in light of our efforts to reduce heap
usage. Should we worry?
---