dataders opened a new pull request, #9951:
URL: https://github.com/apache/arrow-rs/pull/9951

   # Which issue does this PR close?
   
   - Contributes to #7136
   
   (See also the prior attempt #7137, which proposed the always-on form of this 
fix and was closed after maintainer feedback asking for an opt-in API. This PR 
implements exactly the [builder shape @alamb sketched in that 
thread](https://github.com/apache/arrow-rs/pull/7137#issuecomment-2706290700).)
   
   # Rationale for this change
   
   Some Arrow C Data Interface producers — notably the Go ADBC drivers in wide 
use today (e.g. the Snowflake driver) — emit data buffers whose pointers are 
not aligned to the underlying primitive type's alignment. When such buffers 
reach `arrow-rs`, downstream typed access in 
[`ScalarBuffer::from`](https://github.com/apache/arrow-rs/blob/main/arrow-buffer/src/buffer/scalar.rs#L184-L198)
 panics with `"Memory pointer from external source (e.g, FFI) is not aligned 
with the specified scalar type"`, taking the consumer process down.
   
   We hit this in production reading real Snowflake `VARCHAR` columns through 
the Go ADBC Snowflake driver into a Rust consumer, and we are not the only ones 
— see e.g. 
[apache/arrow-adbc#2526](https://github.com/apache/arrow-adbc/issues/2526) and 
the linked downstream reports.
   
   The previous attempt #7137 made `ArrowArrayStreamReader` unconditionally 
call `align_buffers()` on every imported batch. The discussion there (with 
@tustvold and @alamb) settled on:
   
   - silent reallocation hides producer bugs and changes behavior for consumers 
who would prefer a hard error;
   - but for consumers who **know** they're talking to a producer that emits 
unaligned data (e.g. a specific ADBC driver), they need a way to ask `arrow-rs` 
to fix it up rather than panicking deep in their kernels.
   
   @alamb's sketch:
   
   ```rust
   let stream_reader = ArrowArrayStreamReader::try_new(ffi_stream)
       .with_align_buffers(true);
   ```
   
   This PR implements that.
   
   # What changes are included in this PR?
   
   - Add a private `align_buffers: bool` field to `ArrowArrayStreamReader`, 
defaulting to `false` so the existing behavior is preserved bit-for-bit.
   - Add a public builder method 
`ArrowArrayStreamReader::with_align_buffers(self, bool) -> Self`.
   - In `Iterator::next`, when the flag is set, call 
`ArrayData::align_buffers()` on the freshly imported `ArrayData` before 
constructing the `RecordBatch`. `align_buffers` only copies buffers that are 
actually misaligned, so adequately aligned producers pay no cost beyond the 
per-buffer alignment check.
   
   The existing PR #7137's diff was 7 lines and changed default behavior. This 
PR is opt-in only.
   
   # Are these changes tested?
   
   Yes. Two new tests in `arrow-array/src/ffi_stream.rs`:
   
   - `test_unaligned_buffers_default_panics_on_typed_access` — fabricates an 
`FFI_ArrowArrayStream` whose first batch has a deliberately misaligned `Int32` 
data buffer (a one-byte-offset slice into an aligned backing buffer). Asserts 
via `#[should_panic]` that the default reader still panics on typed access, 
preserving the existing invariant that unaligned producers surface as 
detectable failures.
   - `test_unaligned_buffers_with_align_buffers_opt_in` — same input, reader 
configured with `with_align_buffers(true)`. Asserts the imported buffer's 
pointer is aligned to `align_of::<i32>()`, that typed access (`.values()`) does 
not panic, and that the stream is exhausted after the single batch.
   
   Both tests bypass the standard `FFI_ArrowArrayStream::new(reader)` helper 
because that helper materializes the source `RecordBatch` into typed arrays, 
which itself panics on misaligned input. Instead they build a minimal custom 
`FFI_ArrowArrayStream` whose `get_next` callback emits an `FFI_ArrowArray` 
constructed from an untyped `ArrayData` — closer to what a real foreign 
producer would do.
   
   \`\`\`
   $ cargo test -p arrow-array --features ffi
   test result: ok. 948 passed; 3 ignored; 0 failed
   
   $ cargo clippy -p arrow-array --all-features --tests
   (clean)
   
   $ cargo fmt -p arrow-array --check
   (clean)
   \`\`\`
   
   # Are there any user-facing changes?
   
   Yes, but **additive only**: one new public method on 
`ArrowArrayStreamReader`. No existing API or default behavior changes. No 
breaking changes.
   
   # Notes for reviewers
   
   - Marking as **draft** because the requesting team wants to eyeball it 
before un-drafting; happy to take any API-shape feedback (e.g. naming the 
method differently, defaulting differently, adding the same opt-in to other FFI 
import paths) before that.
   - The signature `with_align_buffers(mut self, bool) -> Self` mirrors the 
existing `ArrayDataBuilder::align_buffers(mut self, bool) -> Self` 
([data.rs:2222](https://github.com/apache/arrow-rs/blob/main/arrow-data/src/data.rs#L2222))
 for consistency. If you'd prefer a parameterless `with_align_buffers(self) -> 
Self`, easy change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to