Xi Lyu created SPARK-48359:
------------------------------

             Summary: Built-in functions for Zstd compression and decompression
                 Key: SPARK-48359
                 URL: https://issues.apache.org/jira/browse/SPARK-48359
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Xi Lyu


Some users currently rely on UDFs for Zstd compression and decompression, which performs poorly. Native built-in functions would improve performance by compressing and decompressing entirely within the JVM.

This issue introduces three new built-in functions:
{code:java}
zstd_compress(input: binary [, level: int [, streaming_mode: bool]])

zstd_decompress(input: binary)

try_zstd_decompress(input: binary)
{code}
where
 * input: The binary value to compress or decompress.
 * level: Optional integer argument that sets the compression level, which controls the trade-off between compression speed and compression ratio. Valid values are 1 to 22 inclusive; the default is 3.
 * streaming_mode: Optional boolean argument that specifies whether to compress in streaming mode. The default is false.
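The argument contract above can be summarized in a small sketch. `resolve_zstd_compress_args` is a hypothetical Python helper, not part of Spark; it assumes only what this issue states (level defaults to 3 and must lie between 1 and 22 inclusive; streaming mode is off unless requested). Raising an error for an out-of-range level is an assumption about how invalid values are rejected.

{code:python}
from typing import Optional, Tuple

# Hypothetical helper, NOT a Spark API: mirrors the documented defaults
# and the valid range of zstd_compress's optional arguments.
DEFAULT_LEVEL = 3
MIN_LEVEL, MAX_LEVEL = 1, 22


def resolve_zstd_compress_args(
    level: Optional[int] = None,
    streaming_mode: Optional[bool] = None,
) -> Tuple[int, bool]:
    """Return the effective (level, streaming_mode) pair."""
    if level is None:
        level = DEFAULT_LEVEL
    # Assumption: out-of-range levels are rejected with an error.
    if not MIN_LEVEL <= level <= MAX_LEVEL:
        raise ValueError(
            f"level must be between {MIN_LEVEL} and {MAX_LEVEL} inclusive, got {level}"
        )
    # Assumption: streaming mode is used only when explicitly enabled.
    return level, bool(streaming_mode)


print(resolve_zstd_compress_args())          # (3, False)
print(resolve_zstd_compress_args(19, True))  # (19, True)
{code}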

Examples:
{code:sql}
> SELECT base64(zstd_compress(repeat("Apache Spark ", 10)));
  KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=
> SELECT base64(zstd_compress(repeat("Apache Spark ", 10), 3, true));
  KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QU=
> SELECT string(zstd_decompress(unbase64("KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=")));
  Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark
> SELECT zstd_decompress(zstd_compress("Apache Spark"));
  Apache Spark
> SELECT try_zstd_decompress("invalid input");
  NULL
{code}
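One way to see what streaming_mode changes is to decode the frame headers of the two compressed outputs above. The sketch below follows the Zstandard frame format (RFC 8878, section 3.1.1) using only the Python standard library; it inspects headers only and is not a decompressor. `parse_frame_header` and `try_parse_frame_header` are hypothetical names; the latter loosely mimics the NULL-on-error behaviour of try_zstd_decompress by returning None.

{code:python}
import base64
from typing import Optional

ZSTD_MAGIC = 0xFD2FB528  # every Zstandard frame starts with this little-endian magic number


def parse_frame_header(data: bytes) -> dict:
    """Decode the header fields of one Zstandard frame (RFC 8878, 3.1.1)."""
    if len(data) < 5 or int.from_bytes(data[:4], "little") != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")
    desc = data[4]  # Frame_Header_Descriptor
    fcs_flag = desc >> 6
    single_segment = bool(desc & 0x20)
    has_checksum = bool(desc & 0x04)
    dict_id_size = [0, 1, 2, 4][desc & 0x03]
    pos = 5

    # Single-segment frames have no Window_Descriptor byte.
    window_size = None
    if not single_segment:
        wd = data[pos]
        pos += 1
        window_base = 1 << (10 + (wd >> 3))
        window_size = window_base + (window_base // 8) * (wd & 0x07)

    pos += dict_id_size  # skip the optional Dictionary_ID field

    # Frame_Content_Size: absent when fcs_flag == 0 and the frame is not single-segment.
    fcs_size = [1 if single_segment else 0, 2, 4, 8][fcs_flag]
    content_size = None
    if fcs_size:
        content_size = int.from_bytes(data[pos:pos + fcs_size], "little")
        if fcs_size == 2:
            content_size += 256  # the 2-byte field is stored with a 256 offset
    return {
        "single_segment": single_segment,
        "window_size": window_size,
        "content_size": content_size,
        "checksum": has_checksum,
    }


def try_parse_frame_header(data: bytes) -> Optional[dict]:
    """Like parse_frame_header, but returns None instead of raising."""
    try:
        return parse_frame_header(data)
    except (ValueError, IndexError):
        return None


default_mode = base64.b64decode("KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=")
streaming = base64.b64decode("KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QU=")
print(parse_frame_header(default_mode))
print(parse_frame_header(streaming))
print(try_parse_frame_header(b"invalid input"))  # None
{code}

The default-mode frame is single-segment and records the exact input size (130 bytes, i.e. 10 repetitions of the 13-character "Apache Spark "), while the streaming-mode frame omits the content size and declares a 2 MiB window instead, presumably because a streaming compressor does not know the total length up front.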
These three built-in functions are also available in Python and Scala.
