Hi Neville,

Here is my understanding. Per the spec [1], padding to a multiple of 8 bytes is allowed (and is the minimum required), but 64 bytes is recommended. (Is "bits" in your e-mail a typo?) The main rationale for 64-byte padding is to allow SIMD instructions.
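To make the 8- vs 64-byte difference concrete for a null bitmap, here is a minimal Rust sketch of the arithmetic. It is not code from the Arrow Rust implementation; the helpers `padded_len` and `bitmap_bytes` are hypothetical names, and the sketch assumes the alignment is a power of two.

```rust
/// Hypothetical helper: round `len` up to the next multiple of `alignment`.
/// Assumes `alignment` is a power of two (8 and 64 both are).
fn padded_len(len: usize, alignment: usize) -> usize {
    assert!(alignment.is_power_of_two(), "alignment must be a power of two");
    (len + alignment - 1) & !(alignment - 1)
}

/// Hypothetical helper: bytes needed to hold one validity bit per slot.
fn bitmap_bytes(num_slots: usize) -> usize {
    (num_slots + 7) / 8
}

fn main() {
    let slots = 100;
    let raw = bitmap_bytes(slots); // 13 bytes of actual bits
    println!("raw bitmap bytes: {}", raw);
    println!("padded to 8 bytes: {}", padded_len(raw, 8)); // 16
    println!("padded to 64 bytes: {}", padded_len(raw, 64)); // 64
}
```

So a 100-slot bitmap takes 16 bytes under 8-byte padding but 64 bytes under 64-byte padding; both lengths are multiples of 8, so both satisfy the IPC requirement.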
For actual record batches, only padding to a multiple of 8 bytes is required [2]. Note that slicing buffers may still require memory copies or padding.

Thanks,
Micah

[1] https://arrow.apache.org/docs/format/Layout.html#alignment-and-padding
[2] https://arrow.apache.org/docs/format/IPC.html

On Fri, Apr 26, 2019 at 1:29 PM Neville Dipale <nevilled...@gmail.com> wrote:
> Hi Arrow developers,
>
> I'm currently working on IPC in Rust, specifically reading Arrow files.
> I've noticed that null buffers/bitmaps are always padded to 64 bits (from
> pyarrow, not sure about others), while in Rust we pad to 8 bits.
>
> 1. Is this fine re. Rust per the spec?
>
> I'm having issues with reading, but only because I'm comparing array data
> and not only the values and nullness of slots. I see this being more of a
> problem when writing to files and streams as we'd need to pad null buffers
> almost every time (since for large arrays IPC could need 2048 while we have
> 2046, so it's not a small data issue)
>
> 2. If implementations are allowed to choose either 8 or 64, are the Rust
> committers happy with us changing to 64-bit padding?
>
> The benefits of changing to 64 would be removing the need to then pad the
> buffer when writing to streams and files, and it'll make us more compatible
> with other implementations. I suspect this would still come as an issue
> when we get to add Rust to interop tests.
>
> I tried changing to 64-bit before writing this mail, but bit-fu is still
> beyond my knowledge, so I'd need help from someone else with implementing
> this, or at least letting me know which lines to change. I don't mind then
> making sure all tests still pass.
>
> My goal is to complete IPC work by 0.14 release, so this would be a bit
> urgent as I'm stuck right now.
>
> Thanks
> Neville