Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14726#discussion_r75573049
  
    --- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
 ---
    @@ -22,15 +22,21 @@
     import com.google.common.io.ByteStreams;
     import com.google.common.io.Closeables;
     
    +import org.apache.spark.SparkEnv;
     import org.apache.spark.serializer.SerializerManager;
     import org.apache.spark.storage.BlockId;
     import org.apache.spark.unsafe.Platform;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * Reads spill files written by {@link UnsafeSorterSpillWriter} (see that 
class for a description
      * of the file format).
      */
     public final class UnsafeSorterSpillReader extends UnsafeSorterIterator 
implements Closeable {
    +  private static final Logger logger = 
LoggerFactory.getLogger(UnsafeSorterSpillReader.class);
    +  private static final int DEFAULT_BUFFER_SIZE_BYTES = 1024 * 1024; // 1 MB
    --- End diff --
    
    @rxin : In response to [0], I have changed it to 1 MB. In my experiments, 1 MB gave good performance, and we use it as the default for all our production jobs.
    
    One concern / proposal: 
    
    With this change, UnsafeSorterSpillReader would consume more memory than before, as the buffer grows from 8 KB to 1 MB. The overall per-object footprint of UnsafeSorterSpillReader would grow from 2.5 MB to 3.6 MB (I profiled to get these numbers; see [1]). For jobs that spill a lot, many of these spill readers get created (in the screenshot there were 400+ readers). The current merging approach opens all the spill files at the same time and merges them all at once using a priority queue. Having lots of these objects in memory can lead to OOMs, since there is no accounting for the buffers allocated inside UnsafeSorterSpillReader (even without this change, Snappy already had its own buffers for compressed and uncompressed data). Also, from the disk's point of view, having lots of files open at the same time leads to random seeks and won't play well with the OS's cache for disk reads. One might say that users should tune the job so that it spills less, but that might not be obvious to people who do not understand the system internals. Also, for pipelines the data changes every day, and one setting may not work every time.
    
    Should we add some kind of hierarchical merging, wherein spill files are iteratively merged in batches? It could be turned on when there are, say, more than 100 spill files to be merged. AFAIK, Hadoop has this.
    
    [0] : https://github.com/apache/spark/pull/14475#discussion_r75440822
    [1] : https://postimg.org/image/cs5zr6lyx/