[GitHub] spark pull request #18805: [SPARK-19112][CORE] Support for ZStandard codec
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18805

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r148184755

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,33 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]]. For more
+ * details see - http://facebook.github.io/zstd/
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  val bufferSize = conf.getSizeAsBytes("spark.io.compression.zstd.bufferSize", "32k").toInt
--- End diff --

done.
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r147629043

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,33 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]]. For more
+ * details see - http://facebook.github.io/zstd/
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  val bufferSize = conf.getSizeAsBytes("spark.io.compression.zstd.bufferSize", "32k").toInt
--- End diff --

This should be private. The intent was to lift both config values out of the method, so level can go here too.
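[Editor's note] The refactor srowen is asking for can be sketched as follows, in Java for illustration. The `Map`-backed conf, the class name, and the size parsing are hypothetical stand-ins for `SparkConf`, not Spark's actual API; the point is reading each property once into a private field at construction time rather than on every stream-creation call.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the refactor under discussion: both codec settings are read once,
// into private final fields, instead of being re-read each time a compressed
// stream is created. Names and parsing are illustrative, not Spark's API.
public class ZstdCodecSettings {
    private final int level;
    private final int bufferSize;

    public ZstdCodecSettings(Map<String, String> conf) {
        // A compression level is a plain integer; only the buffer size is a
        // byte-size string such as "32k".
        this.level = Integer.parseInt(
                conf.getOrDefault("spark.io.compression.zstd.level", "1"));
        String size = conf.getOrDefault(
                "spark.io.compression.zstd.bufferSize", "32k").toLowerCase();
        this.bufferSize = size.endsWith("k")
                ? Integer.parseInt(size.substring(0, size.length() - 1)) * 1024
                : Integer.parseInt(size);
    }

    public int level() { return level; }
    public int bufferSize() { return bufferSize; }

    public static void main(String[] args) {
        ZstdCodecSettings defaults = new ZstdCodecSettings(new HashMap<>());
        System.out.println(defaults.level() + " " + defaults.bufferSize()); // 1 32768
    }
}
```

Besides avoiding repeated config lookups, lifting the values into fields makes both settings visible in one place, which is what the review comment about keeping them private is after.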
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r147618796

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,33 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]]. For more
+ * details see - http://facebook.github.io/zstd/
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  override def compressedOutputStream(s: OutputStream): OutputStream = {
+    // Default compression level for zstd compression to 1 because it is
+    // fastest of all with reasonably high compression ratio.
+    val level = conf.getSizeAsBytes("spark.io.compression.zstd.level", "1").toInt
--- End diff --

Good eye, fixed.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r147340655

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,33 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]]. For more
+ * details see - http://facebook.github.io/zstd/
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  override def compressedOutputStream(s: OutputStream): OutputStream = {
+    // Default compression level for zstd compression to 1 because it is
+    // fastest of all with reasonably high compression ratio.
+    val level = conf.getSizeAsBytes("spark.io.compression.zstd.level", "1").toInt
--- End diff --

Should this be `getInt` instead of `getSizeAsBytes`?
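[Editor's note] The distinction viirya is pointing at: a byte-size accessor parses unit suffixes such as "32k", while a compression level is a plain integer, so an integer accessor is the semantically correct choice. A minimal sketch of the two parsing behaviors, in Java for illustration (the parser below is a hypothetical, simplified stand-in, not Spark's actual implementation):

```java
// Why getSizeAsBytes is the wrong accessor for a compression level: it
// interprets its argument as a byte count with an optional unit suffix.
// Simplified stand-in parser; Spark's real one handles more units/validation.
public class SizeParser {
    static long parseSizeAsBytes(String s) {
        s = s.trim().toLowerCase();
        long multiplier = 1L;
        char last = s.charAt(s.length() - 1);
        if (last == 'k') multiplier = 1024L;
        else if (last == 'm') multiplier = 1024L * 1024;
        else if (last == 'g') multiplier = 1024L * 1024 * 1024;
        if (multiplier > 1) s = s.substring(0, s.length() - 1);
        return Long.parseLong(s) * multiplier;
    }

    public static void main(String[] args) {
        // "32k" is genuinely a size: 32 * 1024 bytes.
        System.out.println(parseSizeAsBytes("32k")); // 32768
        // A suffix-free level like "1" parses to the same value either way,
        // which is why the misuse was easy to miss.
        System.out.println(parseSizeAsBytes("1")); // 1
    }
}
```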
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r147215825

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,33 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]]. For more
+ * details see - http://facebook.github.io/zstd/
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  override def compressedOutputStream(s: OutputStream): OutputStream = {
+    // Default compression level for zstd compression to 1 because it is
+    // fastest of all with reasonably high compression ratio.
+    val level = conf.getSizeAsBytes("spark.io.compression.zstd.level", "1").toInt
+    val bufferSize = conf.getSizeAsBytes("spark.io.compression.zstd.bufferSize", "32k").toInt
--- End diff --

Sorry, somehow missed these comments. Will address.
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r147213697

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,33 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]]. For more
+ * details see - http://facebook.github.io/zstd/
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  override def compressedOutputStream(s: OutputStream): OutputStream = {
+    // Default compression level for zstd compression to 1 because it is
+    // fastest of all with reasonably high compression ratio.
+    val level = conf.getSizeAsBytes("spark.io.compression.zstd.level", "1").toInt
+    val bufferSize = conf.getSizeAsBytes("spark.io.compression.zstd.bufferSize", "32k").toInt
--- End diff --

@sitalkedia how about comments like this?
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r139898945

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,33 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]]. For more
+ * details see - http://facebook.github.io/zstd/
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  override def compressedOutputStream(s: OutputStream): OutputStream = {
+    // Default compression level for zstd compression to 1 because it is
+    // fastest of all with reasonably high compression ratio.
+    val level = conf.getSizeAsBytes("spark.io.compression.zstd.level", "1").toInt
+    val bufferSize = conf.getSizeAsBytes("spark.io.compression.zstd.bufferSize", "32k").toInt
--- End diff --

Agree, it's simpler and cleaner, as it avoids duplicating this property in this file.
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r139635152

--- Diff: LICENSE ---
@@ -269,6 +269,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
      (BSD 3 Clause) d3.min.js (https://github.com/mbostock/d3/blob/master/LICENSE)
      (BSD 3 Clause) DPark (https://github.com/douban/dpark/blob/master/LICENSE)
      (BSD 3 Clause) CloudPickle (https://github.com/cloudpipe/cloudpickle/blob/master/LICENSE)
+     (BSD 2 Clause) Zstd-jni (https://github.com/luben/zstd-jni/blob/master/LICENSE)
--- End diff --

ZStandard itself is also included, not just zstd-jni. We'd need an entry for it, and its LICENSE as below. https://github.com/luben/zstd-jni/tree/master/src/main/native
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r139634333

--- Diff: dev/deps/spark-deps-hadoop-2.6 ---
@@ -186,3 +186,4 @@ xercesImpl-2.9.1.jar
 xmlenc-0.52.jar
 xz-1.0.jar
 zookeeper-3.4.6.jar
+zstd-jni-1.3.0-1.jar
--- End diff --

These need to be updated, it seems.
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r130892528

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,33 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]]. For more
+ * details see - http://facebook.github.io/zstd/
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  override def compressedOutputStream(s: OutputStream): OutputStream = {
+    // Default compression level for zstd compression to 1 because it is
+    // fastest of all with reasonably high compression ratio.
+    val level = conf.getSizeAsBytes("spark.io.compression.zstd.level", "1").toInt
+    val bufferSize = conf.getSizeAsBytes("spark.io.compression.zstd.bufferSize", "32k").toInt
--- End diff --

Would it be better to have this variable as a private variable to get this property only once?

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r130889998

--- Diff: docs/configuration.md ---
@@ -886,6 +887,23 @@ Apart from these, the following properties are also available, and may be useful
+  spark.io.compression.zstd.level
+  1
+
+    Compression leve for Zstd compression codec. Increasing the compression level will result in better
--- End diff --

nit: leve -> level
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r130889423

--- Diff: docs/configuration.md ---
@@ -866,7 +866,8 @@ Apart from these, the following properties are also available, and may be useful
     e.g.
     org.apache.spark.io.LZ4CompressionCodec,
     org.apache.spark.io.LZFCompressionCodec,
-    and org.apache.spark.io.SnappyCompressionCodec.
+    org.apache.spark.io.SnappyCompressionCodec.
--- End diff --

nit: '.' -> ','
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r130787262

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -50,13 +51,14 @@ private[spark] object CompressionCodec {
   private[spark] def supportsConcatenationOfSerializedStreams(codec: CompressionCodec): Boolean = {
     (codec.isInstanceOf[SnappyCompressionCodec] || codec.isInstanceOf[LZFCompressionCodec]
-      || codec.isInstanceOf[LZ4CompressionCodec])
+      || codec.isInstanceOf[LZ4CompressionCodec] || codec.isInstanceOf[ZStandardCompressionCodec])
   }

   private val shortCompressionCodecNames = Map(
     "lz4" -> classOf[LZ4CompressionCodec].getName,
     "lzf" -> classOf[LZFCompressionCodec].getName,
-    "snappy" -> classOf[SnappyCompressionCodec].getName)
+    "snappy" -> classOf[SnappyCompressionCodec].getName,
+    "zstd" -> classOf[SnappyCompressionCodec].getName)
--- End diff --

Ah, my bad. Fixed it.
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r130787287

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,30 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]].
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStandardCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  override def compressedOutputStream(s: OutputStream): OutputStream = {
+    val level = conf.getSizeAsBytes("spark.io.compression.zstandard.level", "1").toInt
--- End diff --

done.
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r130787269

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,30 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]].
--- End diff --

done.
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r130787205

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,30 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]].
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStandardCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  override def compressedOutputStream(s: OutputStream): OutputStream = {
+    val level = conf.getSizeAsBytes("spark.io.compression.zstandard.level", "1").toInt
+    val compressionBuffer = conf.getSizeAsBytes("spark.io.compression.lz4.blockSize", "32k").toInt
--- End diff --

You are right, we should not share the config with lz4; created a new one. Let's keep the default at 32kb, which is aligned with the block size used by the other compression codecs.
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r130769482

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -50,13 +51,14 @@ private[spark] object CompressionCodec {
   private[spark] def supportsConcatenationOfSerializedStreams(codec: CompressionCodec): Boolean = {
     (codec.isInstanceOf[SnappyCompressionCodec] || codec.isInstanceOf[LZFCompressionCodec]
-      || codec.isInstanceOf[LZ4CompressionCodec])
+      || codec.isInstanceOf[LZ4CompressionCodec] || codec.isInstanceOf[ZStandardCompressionCodec])
   }

   private val shortCompressionCodecNames = Map(
     "lz4" -> classOf[LZ4CompressionCodec].getName,
     "lzf" -> classOf[LZFCompressionCodec].getName,
-    "snappy" -> classOf[SnappyCompressionCodec].getName)
+    "snappy" -> classOf[SnappyCompressionCodec].getName,
+    "zstd" -> classOf[SnappyCompressionCodec].getName)
--- End diff --

you mean `ZStandardCompressionCodec` ?
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r130769858

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,30 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]].
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStandardCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  override def compressedOutputStream(s: OutputStream): OutputStream = {
+    val level = conf.getSizeAsBytes("spark.io.compression.zstandard.level", "1").toInt
+    val compressionBuffer = conf.getSizeAsBytes("spark.io.compression.lz4.blockSize", "32k").toInt
--- End diff --

- Wondering if we should share this config value or have a new one.
- Do you want to set the default to something higher, like 1mb or 4mb?
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r130769646

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,30 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]].
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStandardCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  override def compressedOutputStream(s: OutputStream): OutputStream = {
+    val level = conf.getSizeAsBytes("spark.io.compression.zstandard.level", "1").toInt
--- End diff --

please add a comment explaining the reason why we chose level 1 over other levels
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r130769548

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,30 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]].
--- End diff --

would be good to add this link to the spec : http://facebook.github.io/zstd/
GitHub user sitalkedia opened a pull request: https://github.com/apache/spark/pull/18805

[SPARK-19112][CORE] Support for ZStandard codec

## What changes were proposed in this pull request?

Using zstd compression for Spark jobs spilling 100s of TBs of data, we could reduce the amount of data written to disk by as much as 50%. This translates to a significant latency gain because of reduced disk IO operations. There is a 2 - 5% degradation in CPU time because of zstd compression overhead, but for jobs which are bottlenecked by disk IO, this hit can be taken.

## How was this patch tested?

Tested by running a few jobs spilling a large amount of data on the cluster; the amount of intermediate data written to disk was reduced by as much as 50%.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sitalkedia/spark skedia/upstream_zstd

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18805.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18805

commit cff558b6873a8ec184159a9df3c1e83c9cd0a6e7
Author: Sital Kedia
Date: 2017-08-02T00:41:27Z

    [SPARK-19112][CORE] Support for ZStandard codec
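[Editor's note] For readers unfamiliar with the `CompressionCodec` contract this PR implements, here is a minimal round-trip sketch in Java. Since zstd-jni is a third-party library, the JDK's Deflater/Inflater streams stand in for `ZstdOutputStream`/`ZstdInputStream`; the wrapper structure (a buffered stream around a compressing stream, with a configurable buffer size) is the point, not the specific codec.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class CodecSketch {
    // Compressing side: buffer writes, then compress into the underlying sink.
    static OutputStream compressedOutputStream(OutputStream s, int bufferSize) {
        return new BufferedOutputStream(new DeflaterOutputStream(s), bufferSize);
    }

    // Decompressing side: decompress from the source, buffered.
    static InputStream compressedInputStream(InputStream s, int bufferSize) {
        return new BufferedInputStream(new InflaterInputStream(s), bufferSize);
    }

    // Compress data into a byte buffer, then read it back out.
    static byte[] roundTrip(byte[] data) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (OutputStream out = compressedOutputStream(sink, 32 * 1024)) {
                out.write(data);
            }
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            try (InputStream in = compressedInputStream(
                    new ByteArrayInputStream(sink.toByteArray()), 32 * 1024)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    result.write(buf, 0, n);
                }
            }
            return result.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "spill data compressed before hitting disk".getBytes();
        System.out.println(java.util.Arrays.equals(data, roundTrip(data))); // true
    }
}
```

In the actual PR the compressing stream is zstd's, and the buffer size comes from `spark.io.compression.zstd.bufferSize` rather than a hard-coded 32k.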