Sital Kedia created SPARK-21113:
-----------------------------------

             Summary: Support for read ahead input stream to amortize disk IO cost in the Spill reader
                 Key: SPARK-21113
                 URL: https://issues.apache.org/jira/browse/SPARK-21113
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.0.2
            Reporter: Sital Kedia
            Priority: Minor


Profiling some of our big jobs, we see that around 30% of the time is being
spent reading the spill files from disk. To amortize the disk I/O cost, the
idea is to implement a read-ahead input stream which asynchronously reads
ahead from the underlying input stream once a specified amount of data has
been read from the current buffer. It does this by maintaining two buffers:
an active buffer and a read-ahead buffer. The active buffer contains the data
that is returned when a read() call is issued. The read-ahead buffer is used
to asynchronously read from the underlying input stream. Once the current
active buffer is exhausted, we flip the two buffers so that we can start
reading from the read-ahead buffer without being blocked on disk I/O.
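
To make the buffer flipping concrete, here is a minimal Java sketch of the
idea. It is an illustration under stated assumptions, not the proposed patch:
the class name ReadAheadSketch, the constructor parameters (bufferSize,
threshold), and the use of a single-threaded ExecutorService for the
background read are all hypothetical choices made for this example. It only
implements the single-byte read() and assumes bufferSize > 0.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical double-buffered read-ahead stream; a sketch, not Spark code. */
class ReadAheadSketch extends InputStream {
    private final InputStream in;
    private final int threshold;   // bytes consumed from the active buffer before prefetching
    // Daemon worker so an unclosed stream does not keep the JVM alive.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "read-ahead-sketch");
        t.setDaemon(true);
        return t;
    });
    private byte[] active;            // data returned by read()
    private byte[] readAhead;         // filled asynchronously from the underlying stream
    private int pos = 0;              // next index to return from the active buffer
    private int limit = 0;            // number of valid bytes in the active buffer
    private Future<Integer> pending;  // in-flight fill of readAhead, or null
    private boolean eof = false;

    ReadAheadSketch(InputStream in, int bufferSize, int threshold) {
        this.in = in;
        this.threshold = threshold;
        this.active = new byte[bufferSize];
        this.readAhead = new byte[bufferSize];
    }

    /** Starts an asynchronous fill of the read-ahead buffer if none is running. */
    private void startReadAhead() {
        if (pending == null && !eof) {
            final byte[] target = readAhead;
            Callable<Integer> fill = () -> in.read(target, 0, target.length);
            pending = executor.submit(fill);
        }
    }

    /** Waits for the pending fill, then swaps buffers. Returns false at end of stream. */
    private boolean flip() throws IOException {
        startReadAhead();  // covers the first read and thresholds larger than the buffer
        if (pending == null) {
            return false;  // end of stream was already seen
        }
        int n;
        try {
            n = pending.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
        pending = null;
        if (n < 0) {
            eof = true;
            return false;
        }
        byte[] tmp = active;
        active = readAhead;
        readAhead = tmp;
        pos = 0;
        limit = n;
        return true;
    }

    @Override
    public int read() throws IOException {
        while (pos >= limit) {
            if (!flip()) {
                return -1;
            }
        }
        int b = active[pos++] & 0xFF;
        if (pos >= threshold) {
            startReadAhead();  // prefetch the next chunk while the caller drains this one
        }
        return b;
    }

    @Override
    public void close() throws IOException {
        executor.shutdownNow();
        in.close();
    }
}
```

In this sketch the caller only blocks in flip() when the background read has
not finished by the time the active buffer runs out. With a lower threshold
the prefetch starts earlier, which gives the disk read more time to overlap
with processing of the active buffer.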



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
