mbutrovich opened a new issue, #10034:
URL: https://github.com/apache/arrow-rs/issues/10034

   Follow-up to #10030 / #10028.
   
   ## Describe the bug
   
   `arrow::ffi::from_ffi` and `from_ffi_and_data_type` call 
`data.align_buffers()` after `ImportedArrowArray::consume()` to realign 
protocol-legal but under-aligned C Data Interface buffers (e.g. 8-byte-aligned 
`Decimal128` from a JVM producer). Under `cfg(feature = "force_validate")` this 
realignment is unreachable: `consume()` constructs `ArrayData` via 
`ArrayData::new_unchecked`, which goes through `ArrayDataBuilder::build`. That 
builder runs `validate_data()` whenever `force_validate` is enabled, regardless 
of `skip_validation`:
   
   ```rust
   // arrow-data/src/data.rs:2195
   if !skip_validation.get() || cfg!(feature = "force_validate") {
       data.validate_data()?;
   }
   ```
   
   `validate()` rejects misaligned `BufferSpec::FixedWidth` buffers 
(`arrow-data/src/data.rs:842-867`), so `consume()?` returns 
`InvalidArgumentError` before `from_ffi`'s `align_buffers()` runs. The same 
applies recursively through `consume_children` / `consume_child`.
   
   The new regression test added in #10030 
(`test_decimal128_under_aligned_round_trip`) is gated `#[cfg(not(feature = 
"force_validate"))]`, so CI under that feature does not surface the gap.
   
   This is not a regression introduced by #10030 — prior to that PR, 
`force_validate` builds also returned an error rather than panicking. But the 
inline comment "a no-op when buffers are already aligned" reads as if alignment 
is handled in all builds, which is misleading.
   
   ## To Reproduce
   
   Build with `--features force_validate` and run the round-trip with an 
8-byte-aligned-but-not-16-byte-aligned `Decimal128` buffer through `from_ffi`. 
`consume()` returns `InvalidArgumentError` before realignment can run.
   
   ## Expected behavior
   
   `from_ffi` accepts spec-conformant C Data Interface input regardless of 
`force_validate`. 8-byte alignment is legal over the C Data Interface; it only 
becomes an arrow-rs `ArrayData` invariant violation after import. Today the 
order is "validate, then realign", which treats protocol-legal input as an 
invariant violation. The correct order on the import boundary is "realign, then 
validate".
   
   ## Proposed fix
   
   Push alignment into `ImportedArrowArray::consume` (and recursively through 
`consume_child`) by replacing `ArrayData::new_unchecked` with the builder form:
   
   ```rust
   ArrayDataBuilder::new(self.data_type)
       .len(len)
       .offset(offset)
       .null_count(null_count)
       .null_bit_buffer(null_bit_buffer)
       .buffers(buffers)
       .child_data(child_data)
       .align_buffers(true)
       .skip_validation(true)
       .build()?
   ```
   
   `ArrayDataBuilder::build` runs `align_buffers` before `validate_data` 
(`arrow-data/src/data.rs:2190-2197`), so this works correctly under 
`force_validate` while preserving the existing intent of skipping FFI 
validation on the hot path.
   
   Then:
   - Drop the now-redundant outer `data.align_buffers()` calls in `from_ffi` / 
`from_ffi_and_data_type` to avoid a duplicate recursive scan.
   - Remove the `#[cfg(not(feature = "force_validate"))]` gate on 
`test_decimal128_under_aligned_round_trip` so the case is covered under that 
feature too.
   
   It would also be worth tightening the doc on `ArrayData::align_buffers` to 
note that on import boundaries it must be called *before* validation; the 
current doc only says it's "useful when interacting with buffers produced by 
other systems" without addressing ordering.
   
   ## Additional context
   
   Raised in review of #10030: 
https://github.com/apache/arrow-rs/pull/10030#pullrequestreview-4383084253
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to