Xi Lyu created SPARK-48359:
------------------------------

             Summary: Built-in functions for Zstd compression and decompression
                 Key: SPARK-48359
                 URL: https://issues.apache.org/jira/browse/SPARK-48359
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Xi Lyu
Some users rely on UDFs for Zstd compression and decompression, which results in poor performance. Providing native functions would improve performance by performing compression and decompression directly within the JVM, avoiding UDF overhead. We are introducing three new built-in functions:

{code:java}
zstd_compress(input: binary [, level: int [, streaming_mode: bool]])
zstd_decompress(input: binary)
try_zstd_decompress(input: binary)
{code}

where
* input: The binary value to compress or decompress.
* level: Optional integer argument representing the compression level, which controls the trade-off between compression speed and compression ratio. The default level is 3. Valid values: between 1 and 22 inclusive.
* streaming_mode: Optional boolean argument indicating whether to compress in streaming mode.

Examples:

{code:sql}
> SELECT base64(zstd_compress(repeat("Apache Spark ", 10)));
 KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=
> SELECT base64(zstd_compress(repeat("Apache Spark ", 10), 3, true));
 KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QU=
> SELECT string(zstd_decompress(unbase64("KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=")));
 Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark
> SELECT zstd_decompress(zstd_compress("Apache Spark"));
 Apache Spark
> SELECT try_zstd_decompress("invalid input");
 NULL
{code}

These three built-in functions are also available in Python and Scala.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
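The intended semantics (compression level, round-tripping, and try_zstd_decompress returning NULL instead of failing on invalid input) can be sketched in plain Python. This is only an illustrative sketch: zlib stands in for the Zstd codec since Zstd bindings are not in the Python standard library, and the function names here are hypothetical, not Spark's API.

```python
import zlib

# Stand-in sketch using zlib instead of Zstd; the proposed built-ins
# would call a real Zstd codec inside the JVM.

def compress(data: bytes, level: int = 3) -> bytes:
    # level trades speed for ratio, analogous to zstd_compress's level
    # argument (default 3 in the proposal).
    return zlib.compress(data, level)

def decompress(data: bytes) -> bytes:
    # Mirrors zstd_decompress: raises an error on invalid input.
    return zlib.decompress(data)

def try_decompress(data: bytes):
    # Mirrors try_zstd_decompress: returns None (SQL NULL) on invalid
    # input instead of raising.
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None

payload = b"Apache Spark " * 10
assert decompress(compress(payload)) == payload
assert try_decompress(b"invalid input") is None
```

The try_* variant matches the convention of Spark's other try_-prefixed functions: same behavior on valid input, NULL rather than a runtime error on invalid input.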