[ 
https://issues.apache.org/jira/browse/SPARK-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-7041:
------------------------------
    Summary: Avoid writing empty files in BypassMergeSortShuffleWriter  (was: 
Avoid writing empty files in ExternalSorter)

> Avoid writing empty files in BypassMergeSortShuffleWriter
> ---------------------------------------------------------
>
>                 Key: SPARK-7041
>                 URL: https://issues.apache.org/jira/browse/SPARK-7041
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>
> In ExternalSorter, we may end up creating disk writer files for empty 
> partitions. This occurs because we manually call {{open()}} after creating 
> the writer, which causes the serialization and compression output streams to 
> be created; these streams may write headers to the underlying output stream, 
> resulting in non-zero-length files for partitions that contain no records.  
> This is unnecessary, since the disk object writer automatically opens 
> itself when the first write is performed.  Removing this eager {{open()}} 
> call and rewriting the consumers to cope with missing files for empty 
> partitions yields a large performance benefit for certain sparse 
> workloads when using sort-based shuffle.
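
The effect described in the issue can be reproduced outside Spark with a small sketch. The class and method names below (LazyPartitionWriter, EmptyPartitionDemo, sizeOfEmptyPartition) are hypothetical and are not Spark's actual classes; GZIPOutputStream is used only as a convenient stand-in for a compression stream that writes a header as soon as it is constructed.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for a per-partition disk writer; not Spark's actual class.
final class LazyPartitionWriter implements AutoCloseable {
    private final File file;
    private OutputStream out; // stays null until open() runs

    LazyPartitionWriter(File file) {
        this.file = file;
    }

    // Creates the file and the compression stream. GZIPOutputStream writes its
    // header in the constructor, so calling this eagerly yields a non-empty file.
    void open() throws IOException {
        if (out == null) {
            out = new GZIPOutputStream(new FileOutputStream(file));
        }
    }

    // Opens on first use, so a partition with no records never touches disk.
    void write(String record) throws IOException {
        open();
        out.write(record.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
            out = null;
        }
    }
}

final class EmptyPartitionDemo {
    // Returns the on-disk size of an empty partition's file,
    // or -1 if no file was created at all.
    static long sizeOfEmptyPartition(boolean eagerOpen) {
        try {
            File dir = Files.createTempDirectory("shuffle-demo").toFile();
            File file = new File(dir, "partition-0");
            try (LazyPartitionWriter writer = new LazyPartitionWriter(file)) {
                if (eagerOpen) {
                    writer.open(); // the eager call the issue proposes removing
                }
                // No records are written for this partition.
            }
            // Consumers must tolerate a missing file for an empty partition.
            long size = file.exists() ? file.length() : -1L;
            file.delete();
            dir.delete();
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("eager open: " + sizeOfEmptyPartition(true) + " bytes");
        System.out.println("lazy open:  " + sizeOfEmptyPartition(false) + " (-1 means no file)");
    }
}
```

With the eager call, the empty partition still leaves a file containing only stream headers and trailers; without it, no file is created, so the reading side has to treat a missing file as an empty partition.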



