goutamadwant opened a new pull request, #10230:

URL: https://github.com/apache/arrow-rs/pull/10230
# Which issue does this PR close?

- Closes #10213.

# Rationale for this change

Arrow IPC dictionary batches write a dictionary field's values as the batch payload. When those values are themselves dictionary-encoded, the writer can produce IPC data that readers cannot decode. The failure only surfaces later, at read time, as a buffer metadata mismatch.

# What changes are included in this PR?

This adds schema validation to the IPC file and stream writer constructors. Creating a writer for a schema that contains a direct dictionary-of-dictionary field now returns a clear `InvalidArgumentError` before any IPC bytes are written. A lower-level guard in the dictionary encoding path is kept as a backstop.

A regression test covers the direct `Dictionary(_, Dictionary(_, _))` case for both the IPC stream writer and the IPC file writer.

# Are these changes tested?

Yes:

- `cargo fmt --all -- --check`
- `cargo test -p arrow-ipc`
- `cargo test -p arrow-ipc --all-features`
- `cargo clippy -p arrow-ipc --all-targets -- -D warnings`
- `cargo clippy -p arrow-ipc --all-targets --all-features -- -D warnings`

# Are there any user-facing changes?

Yes. The IPC file and stream writers now return a clear error for direct dictionary-of-dictionary schemas instead of writing IPC data that fails during read. There are no public API changes.
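The constructor-time check described above can be sketched roughly as follows. This is a minimal, std-only illustration and not the PR's code. `DataType`, `Field`, `WriterError` and `validate_schema` are hypothetical, simplified stand-ins. Only the `Dictionary(key, value)` shape mirrors `arrow_schema::DataType::Dictionary(Box<DataType>, Box<DataType>)`. The sketch checks only top-level fields for the direct `Dictionary(_, Dictionary(_, _))` case. It assumes nothing about whether the real validation also walks nested children.

```rust
// Hypothetical, simplified stand-ins for arrow_schema types (assumption:
// only the Dictionary(key, value) shape mirrors the real DataType).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    // (key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

#[derive(Debug)]
struct Field {
    name: String,
    data_type: DataType,
}

// Hypothetical error type; the PR reports an `InvalidArgumentError`.
#[derive(Debug, PartialEq)]
enum WriterError {
    InvalidArgumentError(String),
}

/// Reject any top-level field whose dictionary values are themselves
/// dictionary-encoded, so the error is raised before any bytes are written.
fn validate_schema(fields: &[Field]) -> Result<(), WriterError> {
    for field in fields {
        if let DataType::Dictionary(_, value_type) = &field.data_type {
            if matches!(value_type.as_ref(), DataType::Dictionary(_, _)) {
                return Err(WriterError::InvalidArgumentError(format!(
                    "field '{}': dictionary values cannot themselves be dictionary-encoded",
                    field.name
                )));
            }
        }
    }
    Ok(())
}

fn main() {
    // Dictionary(Int32, Utf8): an ordinary dictionary field, accepted.
    let ok = vec![Field {
        name: "a".to_string(),
        data_type: DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8)),
    }];
    println!("{:?}", validate_schema(&ok));

    // Dictionary(Int32, Dictionary(Int32, Utf8)): rejected up front.
    let nested = vec![Field {
        name: "b".to_string(),
        data_type: DataType::Dictionary(
            Box::new(DataType::Int32),
            Box::new(DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8))),
        ),
    }];
    println!("{:?}", validate_schema(&nested));
}
```

Failing in the constructor, rather than during batch encoding, means a caller never ends up with a partially written stream or file. A guard deeper in the encoding path still helps in case a writer is reached by some other route.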
