Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1058#discussion_r160841030
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/SpillSet.java ---
    @@ -107,7 +107,7 @@
          * nodes provide insufficient local disk space)
          */
     
    -    private static final int TRANSFER_SIZE = 32 * 1024;
    +    private static final int TRANSFER_SIZE = 1024 * 1024;
    --- End diff --
    
    Is a 1 MB buffer excessive? The point of a buffer is to ensure we write in
units of a disk block. For the local file system, experience showed no gain
beyond 32K. In MapR FS, each write is in units of 1 MB. Does Hadoop have a
preferred size?
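    
    For reference, the closest thing Hadoop seems to offer is its general-purpose 
I/O buffer setting, `io.file.buffer.size` (default 4 KB), which is what stream 
reads and writes use unless the caller passes an explicit size. A minimal sketch 
of reading it (illustrative only, not the actual SpillSet code):
    
    ```java
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.CommonConfigurationKeysPublic;
    
    public class HadoopBufferSizeCheck {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // "io.file.buffer.size": the buffer size Hadoop uses for stream I/O
        // unless the caller passes an explicit size. Defaults to 4096 bytes.
        int bufferSize = conf.getInt(
            CommonConfigurationKeysPublic.IO_FILE_BUFFER_SIZE_KEY,
            CommonConfigurationKeysPublic.IO_FILE_BUFFER_SIZE_DEFAULT);
        System.out.println("io.file.buffer.size = " + bufferSize);
      }
    }
    ```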
    
    Given this variation, if we need large buffers, should we choose a buffer 
size based on the underlying file system? For example, is there a preferred 
size for S3?
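    
    If we do want to vary the size, one simple option is to key it off the spill 
file system's URI scheme. A rough, purely hypothetical sketch; the 
`chooseTransferSize` helper and its constants are made up for illustration, and 
the numbers are just the two data points above, not measurements:
    
    ```java
    import org.apache.hadoop.fs.FileSystem;
    
    public class TransferSizeChooser {
      // Hypothetical defaults: 32K per the local-file-system experience above,
      // 1 MB per the MapR FS write unit. Placeholders, not measured values.
      private static final int LOCAL_TRANSFER_SIZE = 32 * 1024;
      private static final int DFS_TRANSFER_SIZE = 1024 * 1024;
    
      // Pick a transfer size from the spill file system's URI scheme.
      public static int chooseTransferSize(FileSystem fs) {
        String scheme = fs.getUri().getScheme();
        if (scheme == null || "file".equals(scheme)) {
          return LOCAL_TRANSFER_SIZE;
        }
        // maprfs, hdfs, s3a, ...: assume a store that favors larger writes.
        return DFS_TRANSFER_SIZE;
      }
    }
    ```
    
    That would keep local spills at 32K while letting distributed stores use 
larger writes, though it still leaves the S3 question open.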
    
    32K didn't seem large enough to worry about, even if we had 1000 fragments 
busily spilling. But 1 MB? 1000 * 1 MB = 1 GB, which starts to become significant, 
especially in light of our efforts to reduce heap usage. Should we worry?

