[
https://issues.apache.org/jira/browse/SPARK-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reynold Xin updated SPARK-2496:
-------------------------------
Component/s: Spark Core
Shuffle
> Compression streams should write its codec info to the stream
> -------------------------------------------------------------
>
> Key: SPARK-2496
> URL: https://issues.apache.org/jira/browse/SPARK-2496
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle, Spark Core
> Reporter: Reynold Xin
> Priority: Critical
>
> Spark sometimes stores compressed data outside of Spark (e.g. event logs,
> blocks in tachyon), and that data is read back directly using the codec
> configured by the user. If the configured codec differs between runs, Spark
> would not be able to read the data back.
> I'm not sure what the best strategy here is yet. If we write the codec
> identifier for all streams, we will be writing a lot of identifiers for
> shuffle blocks. One possibility is to write it only for blocks that will be
> shared across different Spark instances (i.e. managed outside of Spark),
> which includes tachyon blocks and event log blocks.
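A minimal sketch of the idea on the JVM, assuming gzip as the stand-in codec; the identifier byte, class name, and method names here are hypothetical illustrations, not Spark's actual implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Prefix a compressed stream with a one-byte codec identifier so the reader
// can select the right codec regardless of the currently configured one.
// The id value and codec choice are hypothetical, for illustration only.
public class CodecHeaderDemo {
    static final int GZIP_ID = 1; // hypothetical identifier for gzip

    // Write the codec id uncompressed, then the gzip-compressed payload.
    static byte[] writeWithCodecId(byte[] payload) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(GZIP_ID);
        GZIPOutputStream gz = new GZIPOutputStream(out);
        gz.write(payload);
        gz.close();
        return out.toByteArray();
    }

    // Read the codec id first, then decompress with the codec it names.
    static byte[] readWithCodecId(byte[] data) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int id = in.read();
        if (id != GZIP_ID) {
            throw new IOException("unknown codec id: " + id);
        }
        GZIPInputStream gz = new GZIPInputStream(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = gz.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "event log entry".getBytes(StandardCharsets.UTF_8);
        byte[] stored = writeWithCodecId(original);
        byte[] restored = readWithCodecId(stored);
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

The per-block cost is a single byte, which matters for the many small shuffle blocks but is negligible for event logs and tachyon blocks, which is why limiting the header to externally managed blocks is attractive.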
--
This message was sent by Atlassian JIRA
(v6.2#6252)