xi-db opened a new pull request, #46672:
URL: https://github.com/apache/spark/pull/46672
### What changes were proposed in this pull request?
Some users rely on UDFs for Zstd compression and decompression, which
performs poorly. Native built-in functions avoid that overhead because
compression and decompression run entirely within the JVM.
This PR introduces three new built-in functions:
```
zstd_compress(input: binary [, level: int [, streaming_mode: bool]])
zstd_decompress(input: binary)
try_zstd_decompress(input: binary)
```
where
* `input`: The binary value to compress or decompress.
* `level`: Optional integer argument that represents the compression level.
The compression level controls the trade-off between compression speed and
compression ratio. The default level is 3. Valid values are 1 through 22,
inclusive.
* `streaming_mode`: Optional boolean argument that specifies whether to
compress in streaming mode.
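The argument rules above can be sketched as a small validation helper. This is only an illustration in Python, not the PR's implementation: the helper name `resolve_zstd_compress_args` is hypothetical, rejecting an out-of-range level with an error is an assumption, and so is treating an omitted `streaming_mode` as `false` (the description does not state that default).

```python
from typing import Optional, Tuple

DEFAULT_LEVEL = 3              # default level stated in the parameter list
MIN_LEVEL, MAX_LEVEL = 1, 22   # valid range stated in the parameter list, inclusive


def resolve_zstd_compress_args(
    level: Optional[int] = None,
    streaming_mode: Optional[bool] = None,
) -> Tuple[int, bool]:
    """Hypothetical helper: normalize the optional arguments of zstd_compress."""
    if level is None:
        level = DEFAULT_LEVEL
    # Assumption: an out-of-range level is rejected rather than clamped.
    if not MIN_LEVEL <= level <= MAX_LEVEL:
        raise ValueError(
            f"level must be between {MIN_LEVEL} and {MAX_LEVEL}, got {level}"
        )
    # Assumption: streaming mode is off when the argument is omitted.
    if streaming_mode is None:
        streaming_mode = False
    return level, bool(streaming_mode)


print(resolve_zstd_compress_args())          # both arguments omitted
print(resolve_zstd_compress_args(22, True))  # highest level, streaming on
```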
Examples:
```
> SELECT base64(zstd_compress(repeat("Apache Spark ", 10)));
KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=
> SELECT base64(zstd_compress(repeat("Apache Spark ", 10), 3, true));
KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QUBAAA=
> SELECT string(zstd_decompress(unbase64("KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=")));
Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark
> SELECT zstd_decompress(zstd_compress("Apache Spark"));
Apache Spark
> SELECT try_zstd_decompress("invalid input");
NULL
```
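To see how the two `zstd_compress` outputs above differ, their frame headers can be inspected with Python's standard library alone. The sketch below follows the Zstandard frame format (RFC 8878): a 4-byte magic number, then a frame header descriptor byte. The helper name `describe_zstd_frame` is hypothetical, and it only handles the simple header layouts that appear in these two samples.

```python
import base64

# Zstandard magic number 0xFD2FB528, stored little-endian.
ZSTD_MAGIC = bytes.fromhex("28b52ffd")


def describe_zstd_frame(b64_text):
    """Hypothetical helper: summarize the start of a Zstd frame header."""
    raw = base64.b64decode(b64_text)
    descriptor = raw[4]                       # Frame_Header_Descriptor
    fcs_flag = descriptor >> 6                # Frame_Content_Size_flag
    single_segment = bool(descriptor & 0x20)  # Single_Segment_flag
    dict_id_flag = descriptor & 0x03          # Dictionary_ID_flag
    content_size = None
    # Only the simplest case is decoded here: single-segment frame, no
    # dictionary ID, one-byte content size directly after the descriptor.
    if single_segment and fcs_flag == 0 and dict_id_flag == 0:
        content_size = raw[5]
    return {
        "magic_ok": raw[:4] == ZSTD_MAGIC,
        "single_segment": single_segment,
        "content_size": content_size,
    }


one_shot = describe_zstd_frame("KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=")
streaming = describe_zstd_frame("KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QUBAAA=")
print(one_shot)
print(streaming)
```

For these two samples, the default output is a single-segment frame that records a content size of 130 bytes (ten repetitions of the 13-character string), while the streaming output carries no content size in its header. Whether that holds for every input is not stated in the PR description.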
These three built-in functions are also available in Python and Scala.
### Why are the changes needed?
Users no longer need to use UDFs for Zstd compression and decompression;
they can directly use built-in SQL functions to run within the JVM.
### Does this PR introduce _any_ user-facing change?
Yes, three SQL functions are introduced: `zstd_compress`,
`zstd_decompress`, and `try_zstd_decompress`.
### How was this patch tested?
Added new unit tests (UT) and end-to-end (E2E) tests.
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]