Takeshi Yamamuro created SPARK-4633: ---------------------------------------
             Summary: Support gzip in spark.io.compression.codec
                 Key: SPARK-4633
                 URL: https://issues.apache.org/jira/browse/SPARK-4633
             Project: Spark
          Issue Type: Improvement
          Components: Input/Output
            Reporter: Takeshi Yamamuro
            Priority: Trivial

gzip is widely used in other frameworks such as Hadoop MapReduce and Tez, and I think gzip is more stable than other codecs in terms of both performance and space overhead.

I have one open question: the current Spark configuration has a block-size option for each codec (spark.io.compression.[gzip|lz4|snappy].block.size). As the number of codecs increases, the configuration gains more options, which I think is somewhat complicated for non-expert users. To mitigate this, my proposal is as follows: replace the three configurations with a single block-size option (spark.io.compression.block.size). The 'Meaning' entry in the configuration docs would then read: "This option affects gzip, lz4, and snappy. Block size (in bytes) used in compression, in the case when these compression codecs are used. Lowering...".

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
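To make the proposal concrete, here is a minimal sketch of the fallback a unified option could use: a per-codec key (e.g. spark.io.compression.gzip.block.size) would still override the single spark.io.compression.block.size key if set. The key names follow this issue's text, the default of 32 KiB is an assumption for illustration, and this is not a committed Spark API:

```scala
// Hypothetical resolution order for the proposed unified block-size option:
//   1. per-codec key (spark.io.compression.<codec>.block.size), if set
//   2. unified key   (spark.io.compression.block.size), if set
//   3. an assumed default of 32 KiB
def blockSize(conf: Map[String, String], codec: String): Int = {
  val perCodecKey = s"spark.io.compression.$codec.block.size"
  val unifiedKey  = "spark.io.compression.block.size"
  conf.get(perCodecKey)
    .orElse(conf.get(unifiedKey))
    .map(_.toInt)
    .getOrElse(32 * 1024) // assumed default, for illustration only
}
```

With this scheme, existing per-codec settings keep working, while non-expert users only need to learn the single unified key.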