mbutrovich opened a new issue, #10028: URL: https://github.com/apache/arrow-rs/issues/10028
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

I am consuming Arrow data from a JVM producer (`arrow-java`'s `Data.exportArrayStream`) via `arrow::ffi_stream::ArrowArrayStreamReader`. When a batch contains a `Decimal128` column whose underlying buffer happens to land on an offset that is 8-byte aligned but not 16-byte aligned, `ArrowArrayStreamReader::next` panics inside `ScalarBuffer::<i128>::from(Buffer)`:

```
panicked at arrow-buffer/src/buffer/scalar.rs:194:43:
Memory pointer from external source (e.g, FFI) is not aligned with the specified scalar type. Before importing buffer through FFI, please make sure the allocation is aligned.
```

The producer is spec-conformant. The Arrow C Data Interface only recommends 8-byte alignment, and `arrow-java`'s `VectorUnloader` and `NettyAllocationManager` only guarantee 8-byte alignment. The mismatch is on the consumer side: since Rust 1.77 / LLVM 18, `align_of::<i128>() == 16` on x86 (it has always been 16 on ARM), so `ScalarBuffer::<i128>` requires 16-byte alignment when constructing typed arrays from imported `ArrayData`.

This is the same root cause as #5553 and PR #5554, which fixed it for the IPC reader by adding `IpcReadOptions::require_alignment` (triggering a realigning copy on import). The equivalent is missing from the C Data Interface readers.

**Describe the solution you'd like**

`ArrowArrayStreamReader` (and ideally `arrow::ffi::from_ffi` / `arrow::ffi::FFI_ArrowArray` import paths) should either:

1. Default to calling `ArrayData::align_buffers()` on every imported array before handing it to typed-array construction, or
2. Expose an option (mirroring `IpcReadOptions::require_alignment`) that opts into the realigning copy.
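
For illustration, the alignment mismatch and the realigning copy that both options rely on can be sketched in plain `std` Rust, without any arrow-rs types. The helper names below (`is_aligned_for_i128`, `offset_to_residue_mod_16`, `realign_i128`) are hypothetical and are not part of arrow-rs; this is only a minimal sketch of the idea, assuming the bytes hold native-endian 128-bit values, and not how `ArrayData::align_buffers()` is actually implemented.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper (not an arrow-rs API): true if `bytes` starts at an
/// address that satisfies the alignment of `i128` on the current target.
pub fn is_aligned_for_i128(bytes: &[u8]) -> bool {
    (bytes.as_ptr() as usize) % align_of::<i128>() == 0
}

/// Hypothetical helper: offset into `bytes` at which the address is congruent
/// to `residue` modulo 16. With `residue == 8` this mimics a buffer that is
/// 8-byte aligned but not 16-byte aligned.
pub fn offset_to_residue_mod_16(bytes: &[u8], residue: usize) -> usize {
    let addr = bytes.as_ptr() as usize;
    (residue + 16 - addr % 16) % 16
}

/// Hypothetical helper: the "realigning copy". Instead of reinterpreting the
/// bytes in place, decode each 16-byte chunk into a freshly allocated
/// `Vec<i128>`, whose storage is aligned for `i128` by construction.
/// Trailing bytes that do not form a full 16-byte value are ignored.
pub fn realign_i128(bytes: &[u8]) -> Vec<i128> {
    bytes
        .chunks_exact(size_of::<i128>())
        .map(|chunk| {
            let mut raw = [0u8; 16];
            raw.copy_from_slice(chunk);
            // Assumption: values are stored in native byte order.
            i128::from_ne_bytes(raw)
        })
        .collect()
}
```

A caller would check `is_aligned_for_i128` first and fall back to `realign_i128` only when the check fails, so buffers that are already aligned keep the zero-copy path. That is the same trade-off the opt-in option in (2) would expose.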

The helper already exists (`ArrayData::align_buffers()`) and walks child data recursively, so the fix is just to invoke it inside the `ArrowArrayStreamReader::next` path between `from_ffi` and `RecordBatch::from(StructArray::from(...))`.

The reader-side IPC behavior introduced in #5554 sets the precedent that the consumer is responsible for repairing under-aligned imports when the source's alignment guarantee is weaker than what arrow-rs's typed arrays require. The same logic applies to FFI imports.

**Describe alternatives you've considered**

1. Forcing the JVM producer to allocate decimal buffers with 16-byte alignment. Not portable: there is no alignment hook on `arrow-java`'s `BufferAllocator` / `NettyAllocationManager`, and the spec only recommends 8-byte alignment for producers.
2. Wrapping `ArrowArrayStreamReader` in user code by replicating its internals (driving `FFI_ArrowArrayStream::get_next` directly, calling `from_ffi`, then `align_buffers()`, then building the typed batch). Workable, but it duplicates arrow-rs internals; every JVM-Arrow consumer hits this and ends up writing the same wrapper.
3. Realigning post-import. Not possible from outside the reader because the panic happens inside `ArrowArrayStreamReader::next` before the caller sees a `RecordBatch`.

**Additional context**

Related:

- #5553 / #5554: same root cause, fixed for IPC.
- #2882 / #2883 / #2884: earlier discussion of buffer alignment on import.
- `ArrayData::align_buffers()` already implements the fix; it just needs to be invoked from the FFI import paths.

Reproducer shape: any JVM producer that exports a `RecordBatch` containing a `Decimal128` column (or `List<Decimal128>` / `Struct<..., Decimal128>`) where the data buffer offset within its slab is `8 mod 16`. Triggers ~50% of the time with `arrow-java`'s default `NettyAllocationManager`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
