[ https://issues.apache.org/jira/browse/SPARK-21867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288363#comment-16288363 ]
Apache Spark commented on SPARK-21867:
--------------------------------------

User 'ericvandenbergfb' has created a pull request for this issue:
https://github.com/apache/spark/pull/19955

> Support async spilling in UnsafeShuffleWriter
> ---------------------------------------------
>
>                 Key: SPARK-21867
>                 URL: https://issues.apache.org/jira/browse/SPARK-21867
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Sital Kedia
>            Priority: Minor
>         Attachments: Async ShuffleExternalSorter.pdf
>
> Currently, Spark tasks are single-threaded, but we have seen that multi-threading
> parts of them could greatly improve job performance. For example, profiling our
> map tasks, which read large amounts of data from HDFS and spill to disk, we see
> that they are blocked on HDFS reads and on spilling for the majority of their
> run time. Since both operations are IO-intensive, average CPU utilization during
> the map phase is quite low. In principle, the HDFS read and the spill could
> proceed in parallel if we had additional memory to buffer data read from HDFS
> while the previous batch is being spilled.
>
> Say a task has 1 GB of shuffle memory available. Currently, a map task reads
> from HDFS and stores the records in the available memory buffer. Once it hits
> the memory limit and there is no more space for records, it sorts the contents
> and spills them to disk. While spilling, there is no free memory, so the task
> cannot read from HDFS concurrently.
>
> Here we propose supporting async spilling in UnsafeShuffleWriter, so that
> reading from HDFS can continue while sorting and spilling happen
> asynchronously. Say the 1 GB of shuffle memory is split into two regions - an
> active region and a spilling region - each of size 500 MB. We start by reading
> from HDFS into the active region.
> Once we hit the active region's limit, we issue an asynchronous spill, flipping
> the active and spilling regions. While the spill runs asynchronously, we still
> have 500 MB of memory available for reading data from HDFS. This way we can
> amortize the high disk/network IO cost of spilling.
>
> We made a prototype hack implementing this feature, and our map tasks ran as
> much as 40% faster.
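The double-buffered scheme described above can be sketched as follows. This is a minimal illustration only, with hypothetical names, small integers standing in for records, and a tiny region capacity in place of 500 MB; it is not Spark's actual UnsafeShuffleWriter code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of async spilling with two memory regions: while one region is
// being spilled on a background thread, the other keeps accepting records.
class AsyncSpillSketch {
    static final int REGION_CAPACITY = 4; // stand-in for 500 MB of records
    static int spillCount = 0;            // counts completed spills

    private List<Integer> active = new ArrayList<>();               // region being filled
    private final ExecutorService spiller = Executors.newSingleThreadExecutor();
    private Future<?> pendingSpill = null;                          // in-flight spill, if any

    // Insert a record; when the active region fills, flip regions and
    // spill the full one asynchronously.
    void insert(int record) throws Exception {
        if (active.size() == REGION_CAPACITY) {
            // Only two regions exist, so wait for the previous spill to
            // finish before its region's memory can be reused.
            if (pendingSpill != null) pendingSpill.get();
            final List<Integer> toSpill = active; // flip: full region goes to spilling
            active = new ArrayList<>();           // fresh active region
            pendingSpill = spiller.submit(() -> spill(toSpill));
        }
        active.add(record); // reads continue while toSpill is written out
    }

    private void spill(List<Integer> region) {
        // A real writer would sort the region and write it to disk;
        // here we just count spills.
        spillCount++;
    }

    void close() throws Exception {
        if (pendingSpill != null) pendingSpill.get();
        spiller.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AsyncSpillSketch w = new AsyncSpillSketch();
        for (int i = 0; i < 10; i++) w.insert(i);
        w.close();
        System.out.println("spills=" + spillCount + " activeSize=" + w.active.size());
    }
}
```

With a capacity of 4 and 10 records, the writer fills and flips the active region twice, so two spills complete and two records remain buffered. The single-threaded executor mirrors the design constraint that only one spill is in flight at a time, since there are only two regions to flip between.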