andraztori opened a new pull request, #9748:
URL: https://github.com/apache/arrow-rs/pull/9748
# Which issue does this PR close?
<!-- Not linked to an existing issue. Happy to file one first if preferred.
-->
- Closes #NNN.
# Rationale for this change
`arrow-ipc` currently hardcodes zstd to `zstd::DEFAULT_COMPRESSION_LEVEL`
(level 3). Users who want tighter compression (for cold storage / WAN transfer)
or faster compression (for hot paths) have no way to tune this without forking
the crate.
`parquet::basic::Compression::ZSTD(ZstdLevel)` already exposes the exact
same knob, so users writing both Parquet and IPC get an inconsistent experience
today.
This PR adds configurable zstd compression levels to `arrow-ipc`, mirroring
the parquet API as closely as possible so the two stay familiar side-by-side.
# What changes are included in this PR?
- New `arrow_ipc::compression::ZstdLevel(i32)` — validated newtype matching
the shape of `parquet::basic::ZstdLevel` (same range `1..=22`, same `try_new` /
`compression_level()` / `Default`).
- New `arrow_ipc::compression::IpcCompression` enum — writer-side codec +
parameter selector, analogous to `parquet::basic::Compression`:
```rust
pub enum IpcCompression {
    Lz4Frame,
    Zstd(ZstdLevel),
}
```
- `IpcWriteOptions::try_with_compression` now takes `Option<IpcCompression>`
instead of `Option<CompressionType>` (**source-breaking change**, see below).
- `CompressionContext::with_zstd_level(ZstdLevel)` constructor; `FileWriter`
/ `StreamWriter` build their context via the configured level instead of the
hardcoded default.
- `ZstdLevel` and `IpcCompression` are re-exported from `arrow_ipc::writer`
so the public surface stays in one place.
The on-wire format is unchanged: the IPC flatbuffer `BodyCompression.codec`
field still records only the codec, and the zstd level is a purely writer-side
parameter (decoders do not need to know it, same as in parquet).
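For reviewers who want the shape at a glance, here is a minimal, self-contained sketch of the validated newtype and the writer-side enum described above. It is an illustration under stated assumptions, not the code in this PR: `InvalidZstdLevel`, `WireCodec`, and `wire_codec()` are hypothetical stand-ins (the real crate reports errors via `ArrowError` and uses the flatbuffer `CompressionType`), and the default of 3 is taken from the rationale above.

```rust
use std::fmt;

/// Hypothetical error type for this sketch only; the real crate uses `ArrowError`.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidZstdLevel(pub i32);

impl fmt::Display for InvalidZstdLevel {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "zstd level {} is outside the valid range 1..=22", self.0)
    }
}

impl std::error::Error for InvalidZstdLevel {}

/// Validated zstd level: only values in `1..=22` can be constructed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ZstdLevel(i32);

impl ZstdLevel {
    const MIN: i32 = 1;
    const MAX: i32 = 22;

    /// Returns an error instead of silently clamping out-of-range levels.
    pub fn try_new(level: i32) -> Result<Self, InvalidZstdLevel> {
        if (Self::MIN..=Self::MAX).contains(&level) {
            Ok(Self(level))
        } else {
            Err(InvalidZstdLevel(level))
        }
    }

    pub fn compression_level(&self) -> i32 {
        self.0
    }
}

impl Default for ZstdLevel {
    fn default() -> Self {
        // Assumption: mirrors zstd's default level (3), as stated in the rationale.
        Self(3)
    }
}

/// Writer-side codec plus parameter selector.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IpcCompression {
    Lz4Frame,
    Zstd(ZstdLevel),
}

/// Hypothetical stand-in for the codec value stored in the IPC message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WireCodec {
    Lz4Frame,
    Zstd,
}

impl IpcCompression {
    /// Same behavior as the previously hardcoded default level.
    pub fn zstd_default() -> Self {
        Self::Zstd(ZstdLevel::default())
    }

    /// The level is dropped here: only the codec reaches the wire.
    pub fn wire_codec(&self) -> WireCodec {
        match self {
            Self::Lz4Frame => WireCodec::Lz4Frame,
            Self::Zstd(_) => WireCodec::Zstd,
        }
    }
}
```

In this sketch, the mapping in `wire_codec()` is what keeps the format stable: two writers using different zstd levels produce messages tagged with the same codec, so any existing reader can decode both.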
# Are these changes tested?
Yes:
- `test_write_file_with_zstd_non_default_level` — writes a record batch at a
non-default zstd level through the public `FileWriter` API and reads it back
with the stock `FileReader`, verifying identity.
- Existing zstd round-trip / compression tests continue to pass
(`test_write_file_with_zstd_compression`, etc.).
- All in-crate callers (`arrow-ipc` tests/benches,
`arrow-integration-testing`) updated to the new `IpcCompression` type.
Verified locally with `cargo fmt`,
`cargo build -p arrow-ipc --all-features`, and
`cargo test -p arrow-ipc --all-features` (107 unit tests + doctests pass),
plus builds of `arrow-flight` / `arrow-integration-testing`.
# Are there any user-facing changes?
Yes — one **source-breaking change** to a public API:
```rust
// Before:
pub fn try_with_compression(
    self,
    batch_compression: Option<CompressionType>,
) -> Result<Self, ArrowError>

// After:
pub fn try_with_compression(
    self,
    batch_compression: Option<IpcCompression>,
) -> Result<Self, ArrowError>
```
Call-site migration:
```rust
// Before
.try_with_compression(Some(CompressionType::ZSTD))?
.try_with_compression(Some(CompressionType::LZ4_FRAME))?
// After
// same behavior as before
.try_with_compression(Some(IpcCompression::zstd_default()))?
// new: non-default level
.try_with_compression(Some(IpcCompression::Zstd(ZstdLevel::try_new(9)?)))?
.try_with_compression(Some(IpcCompression::Lz4Frame))?
```
Because this is a breaking change, it should land in the next major release
(59.0.0). Happy to gate or defer if maintainers prefer.
---
### Disclosure
This PR was drafted with AI assistance (Cursor / Anthropic Claude). All code
has been reviewed, built, tested, and formatted locally by me. The design was
chosen to mirror existing `parquet` crate conventions; no LLM-authored code was
committed without review.
Made with [Cursor](https://cursor.com)