mbutrovich opened a new issue, #10028:
URL: https://github.com/apache/arrow-rs/issues/10028

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   I am consuming Arrow data from a JVM producer (`arrow-java`'s 
`Data.exportArrayStream`) via `arrow::ffi_stream::ArrowArrayStreamReader`. When 
a batch contains a `Decimal128` column whose underlying buffer happens to land 
on an offset that is 8-byte aligned but not 16-byte aligned, 
`ArrowArrayStreamReader::next` panics inside 
`ScalarBuffer::<i128>::from(Buffer)`:
   
   ```
   panicked at arrow-buffer/src/buffer/scalar.rs:194:43:
   Memory pointer from external source (e.g, FFI) is not aligned with the 
specified scalar type.
   Before importing buffer through FFI, please make sure the allocation is 
aligned.
   ```
   
   The producer is spec-conformant. The Arrow C Data Interface only recommends 
8-byte alignment, and `arrow-java`'s `VectorUnloader` and 
`NettyAllocationManager` only guarantee 8-byte alignment. The mismatch is on 
the consumer side: since Rust 1.77 / LLVM 18, `align_of::<i128>() == 16` on x86 
(it has always been 16 on ARM), so `ScalarBuffer::<i128>` requires 16-byte 
alignment when constructing typed arrays from imported `ArrayData`.
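   The alignment mismatch can be illustrated with the standard library alone. The sketch below is not arrow-rs code: `Slab`, `addr_at`, `is_aligned`, and `i128_alignment` are hypothetical names introduced here, and the slab merely stands in for a producer allocation whose base happens to be 16-byte aligned.

```rust
use std::mem::align_of;

/// Returns true when `addr` is a multiple of `align` (a power of two).
fn is_aligned(addr: usize, align: usize) -> bool {
    debug_assert!(align.is_power_of_two());
    addr & (align - 1) == 0
}

/// Hypothetical stand-in for a producer slab whose base is 16-byte aligned.
#[repr(align(16))]
struct Slab([u8; 64]);

/// Address of the byte at `offset` inside the slab.
fn addr_at(slab: &Slab, offset: usize) -> usize {
    slab.0[offset..].as_ptr() as usize
}

/// Alignment the current toolchain and target require for `i128`.
/// Expected to be 16 on x86-64 with Rust 1.77+ and on AArch64;
/// older x86 toolchains report 8.
fn i128_alignment() -> usize {
    align_of::<i128>()
}
```

A buffer starting at `addr_at(&slab, 8)` passes `is_aligned(addr, 8)` but fails `is_aligned(addr, 16)`, which is the situation that trips the check in `ScalarBuffer::<i128>` when `i128_alignment()` is 16.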
   
   This is the same root cause as #5553 and PR #5554, which fixed it for the 
IPC reader by adding `IpcReadOptions::require_alignment` (triggering a 
realigning copy on import). The equivalent is missing from the C Data Interface 
readers.
   
   **Describe the solution you'd like**
   
   `ArrowArrayStreamReader` (and ideally `arrow::ffi::from_ffi` / 
`arrow::ffi::FFI_ArrowArray` import paths) should either:
   
   1. Default to calling `ArrayData::align_buffers()` on every imported array 
before handing it to typed-array construction, or
   2. Expose an option (mirroring `IpcReadOptions::require_alignment`) that 
opts into the realigning copy.
   
   The helper already exists (`ArrayData::align_buffers()`) and walks child 
data recursively, so the fix is just to invoke it inside the 
`ArrowArrayStreamReader::next` path between `from_ffi` and 
`RecordBatch::from(StructArray::from(...))`.
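   To make concrete what a realigning copy amounts to, here is a minimal std-only sketch. It does not call arrow-rs and is not how `ArrayData::align_buffers()` is implemented; `Slab`, `write_values`, and `realign_i128` are hypothetical helpers, and the bytes are assumed to be native-endian `i128` values as they would be in an in-memory Arrow buffer.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a producer slab whose base is 16-byte aligned.
#[repr(align(16))]
struct Slab([u8; 64]);

/// Writes `values` as native-endian bytes into `slab`, starting at `offset`.
fn write_values(slab: &mut [u8], offset: usize, values: &[i128]) {
    for (i, v) in values.iter().enumerate() {
        let start = offset + i * size_of::<i128>();
        slab[start..start + size_of::<i128>()].copy_from_slice(&v.to_ne_bytes());
    }
}

/// Copies the `i128` values out of a possibly under-aligned byte slice into
/// a fresh `Vec<i128>`. The new allocation is aligned for `i128` because the
/// allocator honours the element type's alignment.
fn realign_i128(bytes: &[u8]) -> Vec<i128> {
    const W: usize = size_of::<i128>();
    assert!(bytes.len() % W == 0, "length must be a multiple of 16 bytes");
    bytes
        .chunks_exact(W)
        .map(|chunk| {
            // Reading byte-wise avoids any aligned load from the source.
            let mut raw = [0u8; W];
            raw.copy_from_slice(chunk);
            i128::from_ne_bytes(raw)
        })
        .collect()
}
```

The copy costs one pass over the buffer, which is why it would only need to happen when the imported pointer is actually under-aligned.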
   
   The reader-side IPC behavior introduced in #5554 sets the precedent that the 
consumer is responsible for repairing under-aligned imports when the source's 
alignment guarantee is weaker than what arrow-rs's typed arrays require. The 
same logic applies to FFI imports.
   
   **Describe alternatives you've considered**
   
   1. Forcing the JVM producer to allocate decimal buffers with 16-byte 
alignment. Not portable: there is no alignment hook on `arrow-java`'s 
`BufferAllocator` / `NettyAllocationManager`, and the spec only recommends 
8-byte alignment for producers.
   2. Wrapping `ArrowArrayStreamReader` in user code by replicating its 
internals (driving `FFI_ArrowArrayStream::get_next` directly, calling 
`from_ffi`, then `align_buffers()`, then building the typed batch). Workable 
but duplicates arrow-rs internals; every JVM-Arrow consumer hits this and ends 
up writing the same wrapper.
   3. Realigning post-import. Not possible from outside the reader because the 
panic happens inside `ArrowArrayStreamReader::next` before the caller sees a 
`RecordBatch`.
   
   **Additional context**
   
   Related:
   - #5553 / #5554: same root cause, fixed for IPC.
   - #2882 / #2883 / #2884: earlier discussion of buffer alignment on import.
   - `ArrayData::align_buffers()` already implements the fix; it just needs to 
be invoked from the FFI import paths.
   
   Reproducer shape: any JVM producer that exports a `RecordBatch` containing a 
`Decimal128` column (or `List<Decimal128>` / `Struct<..., Decimal128>`) where 
the data buffer offset within its slab is `8 mod 16`. The panic triggers roughly 
50% of the time with `arrow-java`'s default `NettyAllocationManager`.
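   The roughly 50% figure follows from simple counting, sketched below under two assumptions that are not established by the report itself: the slab base is 16-byte aligned, and buffers land on 8-byte-aligned offsets with equal likelihood. `misaligned_fraction` is a hypothetical helper.

```rust
/// Fraction of 8-byte-aligned offsets in `0..slab_len` that are congruent
/// to 8 mod 16, i.e. aligned for 8 bytes but not for 16.
fn misaligned_fraction(slab_len: usize) -> f64 {
    assert!(slab_len > 0, "slab must be non-empty");
    // Every candidate offset is a multiple of 8.
    let offsets: Vec<usize> = (0..slab_len).step_by(8).collect();
    let bad = offsets.iter().filter(|&&o| o % 16 == 8).count();
    bad as f64 / offsets.len() as f64
}
```

Multiples of 8 alternate between `0 mod 16` and `8 mod 16`, so under those assumptions about half of all placements are under-aligned for `i128`.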
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

