maltzsama opened a new pull request, #144:
URL: https://github.com/apache/arrow-swift/pull/144
Closes #106
## What
`alignTo64` was adding 8 extra bytes to every buffer allocation, regardless
of whether the length was already aligned:
```swift
// before
private static func alignTo64(_ length: UInt) -> UInt {
let bufAlignment = length % 64
if bufAlignment != 0 {
return length + (64 - bufAlignment) + 8 // +8 has no spec basis
}
return length + 8 // +8 even when already aligned
}
// after
private static func alignTo64(_ length: UInt) -> UInt {
let bufAlignment = length % 64
if bufAlignment != 0 {
return length + (64 - bufAlignment)
}
return length
}
```
## Why
The Arrow columnar format specification requires buffer alignment to 64
bytes.
There is no provision in the spec for additional padding beyond that
alignment
boundary. The extra 8 bytes were silently inflating every buffer allocation —
both the allocated size and the `capacity` field used by `append(to:)` when
writing IPC messages.
The extra bytes are zeros (buffers are zero-initialized), so they have no
effect on correctness when reading IPC streams — readers skip padding between
messages at the IPC layer, not at the buffer layer. But they do produce
non-conformant buffer sizes that diverge from what other Arrow
implementations
produce for the same input, which can surface as subtle interop issues when
exchanging data via the C Data Interface with Go or other runtimes.
The most likely origin of this `+8` is a confusion with the IPC message
framing
format, which uses a 4-byte continuation indicator followed by a 4-byte
metadata
length prefix before each message body. That framing is the responsibility of
`ArrowWriter`, not `ArrowBuffer`.
## Impact
Every `ArrowBuffer` created via `createBuffer` was 8 bytes larger than
necessary. In workloads with many columns or high message throughput —
exactly
the scenarios Arrow is designed for — this overhead accumulates across every
batch. The fix brings allocation sizes into conformance with the spec.
No existing tests assert on `capacity` values directly, so no test changes
are
required. The behavior of all existing tests is unchanged.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]