[ 
https://issues.apache.org/jira/browse/SPARK-21113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051134#comment-16051134
 ] 

Apache Spark commented on SPARK-21113:
--------------------------------------

User 'sitalkedia' has created a pull request for this issue:
https://github.com/apache/spark/pull/18317

> Support for read ahead input stream to amortize disk IO cost in the Spill 
> reader
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-21113
>                 URL: https://issues.apache.org/jira/browse/SPARK-21113
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.0.2
>            Reporter: Sital Kedia
>            Priority: Minor
>
> Profiling some of our big jobs, we see that around 30% of the time is being 
> spent reading the spill files from disk. To amortize the disk I/O cost, the 
> idea is to implement a read-ahead input stream that asynchronously reads 
> ahead from the underlying input stream once a specified amount of data has 
> been read from the current buffer. It does this by maintaining two buffers: 
> an active buffer and a read-ahead buffer. The active buffer contains the data 
> that should be returned when a read() call is issued. The read-ahead buffer 
> is used to asynchronously read from the underlying input stream, and once 
> the active buffer is exhausted, we flip the two buffers so that we can start 
> reading from the read-ahead buffer without being blocked on disk I/O.
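
The two-buffer scheme described in the issue can be sketched roughly as
follows. This is a minimal illustration only, not the code from the pull
request: the class name ReadAheadSketch and the parameters bufferSize and
threshold are hypothetical, and it assumes a single consumer thread plus one
background thread that fills the read-ahead buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a double-buffered read-ahead stream (single consumer).
class ReadAheadSketch extends InputStream {
    private final InputStream in;
    private final int threshold;      // refill starts when this few bytes remain
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "read-ahead");
        t.setDaemon(true);            // do not keep the JVM alive
        return t;
    });
    private byte[] active;            // data handed out by read()
    private byte[] readAhead;         // filled in the background
    private int pos = 0;              // next byte to return from active
    private int limit = 0;            // number of valid bytes in active
    private Future<Integer> pending;  // in-flight fill of readAhead, or null
    private boolean eof = false;

    ReadAheadSketch(InputStream in, int bufferSize, int threshold) {
        this.in = in;
        this.threshold = threshold;
        this.active = new byte[bufferSize];
        this.readAhead = new byte[bufferSize];
        startReadAhead();             // fill the first buffer right away
    }

    // Kick off an asynchronous fill of the read-ahead buffer, if none is running.
    private void startReadAhead() {
        if (pending == null && !eof) {
            final byte[] target = readAhead;
            pending = executor.submit(() -> in.read(target, 0, target.length));
        }
    }

    // Wait for the background fill, then swap the buffers. Returns false at EOF.
    private boolean flip() throws IOException {
        startReadAhead();
        if (pending == null) {
            return false;             // EOF was already seen
        }
        int n;
        try {
            n = pending.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
        pending = null;
        if (n < 0) {
            eof = true;
            return false;
        }
        byte[] tmp = active;
        active = readAhead;
        readAhead = tmp;
        pos = 0;
        limit = n;
        return true;
    }

    @Override
    public int read() throws IOException {
        while (pos == limit) {        // active buffer exhausted
            if (!flip()) {
                return -1;
            }
        }
        int b = active[pos++] & 0xFF;
        if (limit - pos <= threshold) {
            startReadAhead();         // overlap disk I/O with consumption
        }
        return b;
    }

    @Override
    public void close() throws IOException {
        executor.shutdownNow();
        in.close();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        try (InputStream s = new ReadAheadSketch(new ByteArrayInputStream(data), 256, 64)) {
            int count = 0;
            while (s.read() != -1) {
                count++;
            }
            System.out.println("bytes read: " + count);
        }
    }
}
```

In this sketch the consumer only blocks in flip() when the background read has
not finished by the time the active buffer runs out. A larger threshold starts
the refill earlier, which gives the disk read more time to overlap with
consumption.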



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

