[ https://issues.apache.org/jira/browse/SPARK-19112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109784#comment-16109784 ]

Sital Kedia commented on SPARK-19112:
-------------------------------------

[~sowen], [~tgraves] - Using zstd compression for our Spark jobs, which spill 
hundreds of TBs of data, we were able to reduce the amount of data written to 
disk by as much as 50%. This translates into a significant latency gain because 
of reduced disk I/O. There is a 2 - 5% degradation in CPU time due to zstd 
compression overhead, but for jobs that are bottlenecked by disk I/O this is an 
acceptable trade-off. We are going to enable zstd as the default compression 
for all of our jobs. We should support zstd in open source Spark as well. I 
will reopen the PR with some minor changes. 
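
For reference, here is a minimal sketch of what such a codec could look like on 
top of the zstd-jni bindings (com.github.luben:zstd-jni), written against Spark 
Core's CompressionCodec trait. The config keys, buffer size and compression 
level below are illustrative placeholders, not necessarily what the reopened PR 
will use:

{code:scala}
import java.io.{BufferedInputStream, BufferedOutputStream, InputStream, OutputStream}

import com.github.luben.zstd.{ZstdInputStream, ZstdOutputStream}

import org.apache.spark.SparkConf
import org.apache.spark.io.CompressionCodec

class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {

  // Hypothetical config keys and defaults; the final names would be settled in the PR.
  private val bufferSize =
    conf.getSizeAsBytes("spark.io.compression.zstd.bufferSize", "32k").toInt
  private val level = conf.getInt("spark.io.compression.zstd.level", 1)

  override def compressedOutputStream(s: OutputStream): OutputStream = {
    // Buffering cuts down on per-write JNI calls into the zstd library.
    new BufferedOutputStream(new ZstdOutputStream(s, level), bufferSize)
  }

  override def compressedInputStream(s: InputStream): InputStream = {
    new BufferedInputStream(new ZstdInputStream(s), bufferSize)
  }
}
{code}

Once a codec along these lines is wired into CompressionCodec.createCodec under 
a short name (say, "zstd"), jobs could opt in via spark.io.compression.codec 
the same way lz4, lzf and snappy are selected today.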

> add codec for ZStandard
> -----------------------
>
>                 Key: SPARK-19112
>                 URL: https://issues.apache.org/jira/browse/SPARK-19112
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Thomas Graves
>            Priority: Minor
>
> ZStandard (https://github.com/facebook/zstd, http://facebook.github.io/zstd/) 
> has been in use for a while now, and v1.0 was recently released. Hadoop 
> (https://issues.apache.org/jira/browse/HADOOP-13578) and others 
> (https://issues.apache.org/jira/browse/KAFKA-4514) are adopting it.
> Zstd seems to give great results: Gzip-level compression with Lz4-level CPU cost.


