mbutrovich opened a new issue, #10034:
URL: https://github.com/apache/arrow-rs/issues/10034
Follow-up to #10030 / #10028.
## Describe the bug
`arrow::ffi::from_ffi` and `from_ffi_and_data_type` call
`data.align_buffers()` after `ImportedArrowArray::consume()` to realign
protocol-legal but under-aligned C Data Interface buffers (e.g. 8-byte-aligned
`Decimal128` from a JVM producer). Under `cfg(feature = "force_validate")` this
realignment is unreachable: `consume()` constructs `ArrayData` via
`ArrayData::new_unchecked`, which goes through `ArrayDataBuilder::build`. That
builder runs `validate_data()` whenever `force_validate` is enabled, regardless
of `skip_validation`:
```rust
// arrow-data/src/data.rs:2195
if !skip_validation.get() || cfg!(feature = "force_validate") {
data.validate_data()?;
}
```
`validate()` rejects misaligned `BufferSpec::FixedWidth` buffers
(`arrow-data/src/data.rs:842-867`), so `consume()?` returns
`InvalidArgumentError` before `from_ffi`'s `align_buffers()` runs. The same
applies recursively through `consume_children` / `consume_child`.
The new regression test added in #10030
(`test_decimal128_under_aligned_round_trip`) is gated `#[cfg(not(feature =
"force_validate"))]`, so CI under that feature does not surface the gap.
This is not a regression introduced by #10030 — prior to that PR,
`force_validate` builds also returned an error rather than panicking. But the
inline comment "a no-op when buffers are already aligned" reads as if alignment
is handled in all builds, which is misleading.
## To Reproduce
Build with `--features force_validate` and run the round-trip with an
8-byte-aligned-but-not-16-byte-aligned `Decimal128` buffer through `from_ffi`.
`consume()` returns `InvalidArgumentError` before realignment can run.
## Expected behavior
`from_ffi` accepts spec-conformant C Data Interface input regardless of
`force_validate`. 8-byte alignment is legal over the C Data Interface; it only
becomes an arrow-rs `ArrayData` invariant violation after import. Today the
order is "validate, then realign", which treats protocol-legal input as an
invariant violation. The correct order on the import boundary is "realign, then
validate".
## Proposed fix
Push alignment into `ImportedArrowArray::consume` (and recursively through
`consume_child`) by replacing `ArrayData::new_unchecked` with the builder form:
```rust
ArrayDataBuilder::new(self.data_type)
.len(len)
.offset(offset)
.null_count(null_count)
.null_bit_buffer(null_bit_buffer)
.buffers(buffers)
.child_data(child_data)
.align_buffers(true)
.skip_validation(true)
.build()?
```
`ArrayDataBuilder::build` runs `align_buffers` before `validate_data`
(`arrow-data/src/data.rs:2190-2197`), so this works correctly under
`force_validate` while preserving the existing intent of skipping FFI
validation on the hot path.
Then:
- Drop the now-redundant outer `data.align_buffers()` calls in `from_ffi` /
`from_ffi_and_data_type` to avoid a duplicate recursive scan.
- Remove the `#[cfg(not(feature = "force_validate"))]` gate on
`test_decimal128_under_aligned_round_trip` so the case is covered under that
feature too.
It would also be worth tightening the doc on `ArrayData::align_buffers` to
note that on import boundaries it must be called *before* validation; the
current doc only says it's "useful when interacting with buffers produced by
other systems" without addressing ordering.
## Additional context
Raised in review of #10030:
https://github.com/apache/arrow-rs/pull/10030#pullrequestreview-4383084253
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]