Hi Neville,
Here is my understanding. Per the spec [1], padding to a multiple of 8 bytes
is allowed (and is the minimum required), but padding to a multiple of 64
bytes is recommended. (Is "bits" in your e-mail a typo for "bytes"?) The
main rationale for 64 bytes is to allow SIMD instructions.
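As a rough illustration of the SIMD rationale: when a bitmap is padded to 64 bytes, it can be processed in whole 64-byte (512-bit) chunks with no separate loop for a leftover tail. This is a Python sketch only. Python has no SIMD, so it mimics just the loop structure. `count_valid` is a hypothetical helper, not an Arrow API, and the sketch assumes the padding bits are zero.

```python
def count_valid(bitmap: bytes) -> int:
    """Count the set bits in a bitmap whose size is a multiple of 64 bytes.

    Hypothetical helper. Assumes padding bits are zero, so they do not
    inflate the count.
    """
    if len(bitmap) % 64 != 0:
        raise ValueError("expected a buffer padded to a multiple of 64 bytes")
    total = 0
    # Every chunk is exactly 64 bytes, the width of a 512-bit SIMD register,
    # so there is no ragged tail to handle separately.
    for start in range(0, len(bitmap), 64):
        chunk = bitmap[start:start + 64]
        total += bin(int.from_bytes(chunk, "little")).count("1")
    return total
```

With only 8-byte padding, the same loop would need extra code for a final partial chunk.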

For record batches in the IPC format, only padding to a multiple of 8 bytes
is required [2].
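Here is a minimal sketch of what 8-byte padding means in an IPC message body, where buffers are written back to back. Each buffer starts at an offset that is a multiple of 8, and in this sketch the gaps are zero-filled. `layout_body` is a hypothetical helper. The real format also records each buffer's offset and length in the message metadata, which this sketch only returns as a list.

```python
from typing import List, Tuple


def layout_body(buffers: List[bytes], alignment: int = 8) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Concatenate buffers so that each one starts at a multiple of `alignment`.

    Returns the body and one (offset, length) pair per buffer.
    """
    body = bytearray()
    spans = []
    for buf in buffers:
        spans.append((len(body), len(buf)))
        body += buf
        # Zero-fill up to the next alignment boundary (no-op if already aligned).
        body += bytes(-len(body) % alignment)
    return bytes(body), spans
```

Passing `alignment=64` gives the 64-byte layout instead. The cost is up to 63 padding bytes per buffer rather than up to 7.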

Note that slicing buffers might still require memory copies or re-padding.
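One case where a copy may be needed: a validity bitmap stores one bit per slot. A slice whose offset is not a multiple of 8 therefore does not start on a byte boundary, and its bits have to be shifted into a new buffer. The sketch below assumes Arrow's least-significant-bit-first numbering within each byte. `get_bit` and `slice_bitmap` are hypothetical names.

```python
def get_bit(bitmap: bytes, i: int) -> bool:
    # Bit i lives in byte i // 8, at position i % 8 counting from the LSB.
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


def slice_bitmap(bitmap: bytes, offset: int, length: int, padding: int = 8) -> bytes:
    """Copy bits [offset, offset + length) into a new bitmap that starts at bit 0."""
    nbytes = (length + 7) // 8
    out = bytearray(-(-nbytes // padding) * padding)  # zero-filled, padded size
    for i in range(length):
        if get_bit(bitmap, offset + i):
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)
```

A bit-by-bit loop is the simplest correct version. A faster implementation would shift whole bytes or words at a time.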

Thanks,
Micah

[1] https://arrow.apache.org/docs/format/Layout.html#alignment-and-padding
[2] https://arrow.apache.org/docs/format/IPC.html

On Fri, Apr 26, 2019 at 1:29 PM Neville Dipale <nevilled...@gmail.com>
wrote:

> Hi Arrow developers,
>
> I'm currently working on IPC in Rust, specifically reading Arrow files.
> I've noticed that null buffers/bitmaps are always padded to 64 bits (at
> least from pyarrow; I'm not sure about others), while in Rust we pad to 8
> bits.
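For reference, a validity bitmap for n slots needs ceil(n / 8) bytes before any padding is applied. Each slot gets one bit, numbered least-significant bit first within each byte. The Python sketch below packs the bits and then rounds the size up to a chosen padding, so the two conventions discussed here can be compared. `validity_bitmap` is a hypothetical helper.

```python
from typing import List


def validity_bitmap(valid: List[bool], padding: int = 1) -> bytes:
    """Pack one validity bit per slot, then round the size up to `padding` bytes."""
    nbytes = (len(valid) + 7) // 8            # whole bytes needed for the bits
    padded = -(-nbytes // padding) * padding  # rounded up to the requested padding
    out = bytearray(padded)
    for i, is_valid in enumerate(valid):
        if is_valid:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)
```

With `padding=1` this yields whole bytes only (2 bytes for 10 slots). With `padding=64`, the same 10 slots occupy a 64-byte buffer.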
>
> 1. Is this fine re. Rust per the spec?
>
> I'm having issues with reading, but only because I'm comparing the array
> data itself and not just the values and nullness of slots. I expect this to
> be more of a problem when writing to files and streams, as we'd need to pad
> null buffers almost every time (for large arrays, IPC could need 2048 while
> we have 2046, so it's not just a small-data issue).
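Here is roughly what that per-buffer padding step looks like when the in-memory buffer is not already padded. This is a Python sketch. `write_padded` is hypothetical and writes to any binary file-like object.

```python
import io


def write_padded(sink, buf: bytes, alignment: int = 64) -> int:
    """Write buf, then zero bytes up to the next multiple of alignment.

    Returns the total number of bytes written.
    """
    sink.write(buf)
    pad = -len(buf) % alignment
    sink.write(bytes(pad))
    return len(buf) + pad


# A 2046-byte buffer needs two bytes of padding to reach 2048.
sink = io.BytesIO()
written = write_padded(sink, b"\xff" * 2046)
```

If buffers were already allocated with 64-byte padding, `pad` would always be zero and this step would reduce to a plain write.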
>
> 2. If implementations are allowed to choose either 8 or 64, are the Rust
> committers happy with us changing to 64-bit padding?
>
> Changing to 64 would remove the need to pad buffers when writing to streams
> and files, and it would make us more compatible with other implementations.
> I suspect this would come up as an issue anyway once we add Rust to the
> interop tests.
>
> I tried changing to 64-bit before writing this mail, but bit-fu is still
> beyond my knowledge, so I'd need help from someone else with implementing
> this, or at least a pointer to which lines to change. I'm happy to then
> make sure all tests still pass.
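The usual "bit-fu" for rounding a length up to a power-of-two multiple such as 64 is a mask. First add `alignment - 1`, then clear the low bits. The Python sketch below checks that the trick agrees with plain ceiling division. `round_up_pow2` is a hypothetical name. For unsigned integers in Rust, the same expression for 64 would be `(n + 63) & !63`.

```python
def round_up_pow2(n: int, alignment: int) -> int:
    """Round n up to a multiple of alignment, which must be a power of two."""
    if alignment <= 0 or alignment & (alignment - 1) != 0:
        raise ValueError("alignment must be a power of two")
    # Adding alignment - 1 pushes any partial block past the next boundary;
    # masking with ~(alignment - 1) then clears the low bits.
    return (n + alignment - 1) & ~(alignment - 1)


# Cross-check against ordinary ceiling division for a range of sizes.
for n in range(300):
    for a in (8, 64):
        assert round_up_pow2(n, a) == -(-n // a) * a
```

Switching from 8- to 64-byte padding then amounts to changing the alignment constant wherever buffer sizes are rounded up.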
>
> My goal is to complete the IPC work by the 0.14 release, so this is a bit
> urgent as I'm stuck right now.
>
> Thanks
> Neville
>
