[ https://issues.apache.org/jira/browse/SPARK-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7041:
-----------------------------------
    Target Version/s: 1.5.0  (was: 1.4.0)

> Avoid writing empty files in BypassMergeSortShuffleWriter
> ---------------------------------------------------------
>
>                 Key: SPARK-7041
>                 URL: https://issues.apache.org/jira/browse/SPARK-7041
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>
> In BypassMergeSortShuffleWriter, we may end up creating disk writer files for 
> empty partitions. This occurs because we manually call {{open()}} after 
> creating each writer, which causes the serialization and compression output 
> streams to be created. These streams may write headers to the underlying 
> file, resulting in non-zero-length files for partitions that contain no 
> records. This is unnecessary, since the disk object writer automatically 
> opens itself when the first write is performed. Removing this eager 
> {{open()}} call and rewriting the consumers to cope with missing files for 
> empty partitions yields a large performance benefit for certain sparse 
> workloads when using sort-based shuffle.
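
The effect described above can be reproduced outside Spark. The following is a minimal sketch, not Spark's actual code: `LazyOpenSketch`, `LazyPartitionWriter`, and `lengthAfter` are hypothetical names, and `java.util.zip.GZIPOutputStream` stands in for Spark's serialization and compression streams because it also writes a header as soon as it is constructed. An eager `open()` leaves a non-empty file for a partition with no records, while opening on first write leaves no file at all.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.zip.GZIPOutputStream;

// Illustrative only: these classes are assumptions made for this sketch.
final class LazyOpenSketch {

    /** Hypothetical stand-in for a per-partition disk writer. */
    static final class LazyPartitionWriter implements AutoCloseable {
        private final File file;
        private OutputStream out; // stays null until open() or the first write()

        LazyPartitionWriter(File file) {
            this.file = file;
        }

        /** Creates the file and the compression stream; the GZIP header is written here. */
        void open() throws IOException {
            if (out == null) {
                out = new GZIPOutputStream(new FileOutputStream(file));
            }
        }

        /** Opens the writer on demand, then appends the record. */
        void write(byte[] record) throws IOException {
            open();
            out.write(record);
        }

        @Override
        public void close() throws IOException {
            if (out != null) {
                out.close();
                out = null;
            }
        }
    }

    /** Returns the partition file's length in bytes, or -1 if no file was created. */
    static long lengthAfter(boolean eagerOpen, int records) {
        try {
            File dir = Files.createTempDirectory("shuffle-sketch").toFile();
            File file = new File(dir, "partition-0.data");
            try (LazyPartitionWriter writer = new LazyPartitionWriter(file)) {
                if (eagerOpen) {
                    writer.open(); // the eager call the issue proposes removing
                }
                for (int i = 0; i < records; i++) {
                    writer.write(new byte[] {(byte) i});
                }
            }
            long length = file.exists() ? file.length() : -1L;
            file.delete();
            dir.delete();
            return length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("eager open, 0 records: " + lengthAfter(true, 0));
        System.out.println("lazy open,  0 records: " + lengthAfter(false, 0));
        System.out.println("lazy open,  3 records: " + lengthAfter(false, 3));
    }
}
```

In this sketch the reader side must treat a missing file (the `-1` case) as an empty partition, which corresponds to the consumer changes the issue mentions.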



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
