Rich-T-kid commented on PR #10206:
URL: https://github.com/apache/arrow-rs/pull/10206#issuecomment-4785443780
Nice, decode path shows a large performance boost. I think the largest
performance boost is finding a way to align the buffers that are being decoded
without needing to copy every single buffer.
This is difficult because of the lack of control we have over the byte
offset that is handed to us by tonic. looking at
```
pub fn align_buffers(&mut self) {
let layout = layout(&self.data_type);
for (buffer, spec) in self.buffers.iter_mut().zip(&layout.buffers) {
if let BufferSpec::FixedWidth { alignment, .. } = spec {
if buffer.as_ptr().align_offset(*alignment) != 0 {
*buffer = Buffer::from_slice_ref(buffer.as_ref());
}
}
}
// align children data recursively
for data in self.child_data.iter_mut() {
data.align_buffers()
}
}
```
it seems that the math for checking if the buffer is aligned just comes down
to `ptr_address % alignment (64)` if this isnt 0 it copys the buffer into a new
buffer that is aligned correctly.
[MutableBuffer](https://docs.rs/arrow/latest/arrow/buffer/struct.MutableBuffer.html)
"Buffers created from MutableBuffer (via into) are guaranteed to be aligned
along cache lines and in multiples of 64 bytes."
From my understanding arrow-ipc adds padding within the IPC body to account
for alignment.
<img width="2098" height="1176" alt="Image 6-23-26 at 10 43 PM"
src="https://github.com/user-attachments/assets/5f520fb7-00f6-4780-adb1-17cabccbbb73"
/>
The issue occurs when the initial buffer returned by Tonic starts at a
misaligned byte offset. For example, if the buffer starts at byte offset 253,
it isn't aligned to a 32 or 64-byte boundary. Because Arrow IPC's internal
padding is relative to the start of the buffer, a misaligned starting offset
means all the sub-buffers inside it are also misaligned, the padding does
nothing to fix this. The result is that every sub-buffer has to be copied into
a fresh, correctly aligned allocation anyway.
Since alignment is determined by where the overall buffer starts, the fix
should happen upfront: check whether the buffer Tonic returns is aligned before
beginning to decode it, and if not, reallocate it once at that point rather
than discovering the problem per sub-buffer during decoding.
This would avoid N buffer copies in exchange for 1 large buffer copy at the
start.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]