[ https://issues.apache.org/jira/browse/SPARK-20014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu resolved SPARK-20014.
----------------------------------
    Resolution: Fixed
      Assignee: Sital Kedia
 Fix Version/s: 2.3.0

> Optimize mergeSpillsWithFileStream method
> -----------------------------------------
>
>                 Key: SPARK-20014
>                 URL: https://issues.apache.org/jira/browse/SPARK-20014
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Sital Kedia
>            Assignee: Sital Kedia
>             Fix For: 2.3.0
>
>
> When the individual partition size in a spill is small, the
> mergeSpillsWithTransferTo method does many small disk I/Os, which is
> inefficient. One way to improve performance is to use the
> mergeSpillsWithFileStream method instead, by turning off transferTo and
> using buffered file reads and writes to improve I/O throughput.
> However, the current implementation of mergeSpillsWithFileStream does not
> buffer its file reads and writes, and it also unnecessarily flushes the
> output file after each partition.
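
The buffering idea in the description can be illustrated with a minimal Java sketch. This is not Spark's actual mergeSpillsWithFileStream. The class BufferedSpillMerge, the method mergeSpills, and all of its parameters are hypothetical names chosen for illustration, and the sketch assumes each spill file stores its partitions back to back in partition order with known byte lengths.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not taken from the Spark code base.
final class BufferedSpillMerge {

    // Merges spill files partition by partition into one output file.
    // lengths[i][p] is the assumed byte length of partition p in spill i.
    // Returns the total byte length of each merged partition.
    static long[] mergeSpills(Path[] spills, long[][] lengths, int numPartitions,
                              Path output, int bufferSize) throws IOException {
        long[] merged = new long[numPartitions];
        InputStream[] ins = new InputStream[spills.length];
        byte[] chunk = new byte[8192];
        try (OutputStream out =
                 new BufferedOutputStream(Files.newOutputStream(output), bufferSize)) {
            // Buffered readers turn many small partition reads into fewer large disk reads.
            for (int i = 0; i < spills.length; i++) {
                ins[i] = new BufferedInputStream(Files.newInputStream(spills[i]), bufferSize);
            }
            for (int p = 0; p < numPartitions; p++) {
                for (int i = 0; i < spills.length; i++) {
                    long remaining = lengths[i][p];
                    while (remaining > 0) {
                        int n = ins[i].read(chunk, 0, (int) Math.min(chunk.length, remaining));
                        if (n < 0) {
                            throw new EOFException("spill " + i + " is shorter than its recorded lengths");
                        }
                        out.write(chunk, 0, n);
                        remaining -= n;
                        merged[p] += n;
                    }
                }
                // Deliberately no flush here: small partitions accumulate in the output buffer.
            }
            // Closing the output stream (end of try-with-resources) flushes it once.
        } finally {
            for (InputStream in : ins) {
                if (in != null) {
                    in.close();
                }
            }
        }
        return merged;
    }
}
```

With small partitions, most write calls in this sketch only copy into the in-memory buffer, and the disk sees writes of roughly bufferSize bytes rather than one write per partition per spill.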