Re: [Rust] [Format] Should Null Bitmaps be Padded to 8 or 64 Bits?
The Buffer struct / metadata need not necessarily be a multiple of 8 bytes in memory, but you must write padding bytes when emitting the IPC protocol. So if your validity bitmap is 2 bytes in memory, then you must write at least 6 more bytes of padding on the wire.

On Fri, Apr 26, 2019, 3:48 PM Micah Kornfield wrote:

> For actual record batches only padding to a multiple of 8-bytes are
> required [2].
>
> [2] https://arrow.apache.org/docs/format/IPC.html
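
The write-side padding described in the reply above can be sketched roughly as follows. This is only an illustration under stated assumptions: the padding is zero-filled, the boundary is 8 bytes, and `padding_needed` and `write_padded` are hypothetical helper names, not functions from the Rust Arrow crate.

```rust
use std::io::{self, Write};

/// Boundary, in bytes, that each buffer is padded to on the wire
/// (8 per the reply above; an assumption of this sketch).
const IPC_ALIGNMENT: usize = 8;

/// Number of padding bytes needed to bring `len` up to a multiple of 8.
/// Hypothetical helper, for illustration only.
fn padding_needed(len: usize) -> usize {
    (IPC_ALIGNMENT - len % IPC_ALIGNMENT) % IPC_ALIGNMENT
}

/// Writes `buf` followed by zero padding and returns the total number
/// of bytes emitted. The in-memory buffer itself is left unpadded.
fn write_padded<W: Write>(writer: &mut W, buf: &[u8]) -> io::Result<usize> {
    let pad = padding_needed(buf.len());
    writer.write_all(buf)?;
    // Take the first `pad` bytes of an all-zero array as the padding.
    writer.write_all(&[0u8; IPC_ALIGNMENT][..pad])?;
    Ok(buf.len() + pad)
}
```

For a 2-byte validity bitmap, `padding_needed(2)` is 6 and `write_padded` emits 8 bytes in total, which matches the example in the reply.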
Re: [Rust] [Format] Should Null Bitmaps be Padded to 8 or 64 Bits?
Hi Neville,

Here is my understanding. Per the spec [1], 8 bytes of padding is allowed/required but 64 bytes is recommended (is "bits" in your e-mail a typo?). The main rationale is to allow SIMD instructions.

For actual record batches, only padding to a multiple of 8 bytes is required [2].

Note that slicing of buffers might still require mem-copies/padding.

Thanks,
Micah

[1] https://arrow.apache.org/docs/format/Layout.html#alignment-and-padding
[2] https://arrow.apache.org/docs/format/IPC.html

On Fri, Apr 26, 2019 at 1:29 PM Neville Dipale wrote:

> I've noticed that null buffers/bitmaps are always padded to 64 bits (from
> pyarrow, not sure about others), while in Rust we pad to 8 bits.
>
> 1. Is this fine re. Rust per the spec?
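
To illustrate the remark about slicing, here is a rough sketch of why a slice that starts at a non-byte-aligned bit offset needs a copy before it can be padded. It assumes Arrow's least-significant-bit-first bitmap numbering; `round_up`, `get_bit` and `slice_bitmap` are hypothetical names chosen for this example, and `align` stands for whichever boundary (8 or 64 bytes) is being targeted.

```rust
/// Rounds `len` up to the next multiple of `align`, which must be a
/// power of two (8 and 64 both are).
fn round_up(len: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (len + align - 1) & !(align - 1)
}

/// Reads bit `i` of a bitmap that numbers bits LSB-first within each byte.
fn get_bit(bitmap: &[u8], i: usize) -> bool {
    (bitmap[i / 8] & (1u8 << (i % 8))) != 0
}

/// Copies `len` bits starting at bit `offset` of `src` into a fresh,
/// zero-initialised bitmap whose byte length is a multiple of `align`.
/// The bits are re-based so that bit `offset` of `src` becomes bit 0.
fn slice_bitmap(src: &[u8], offset: usize, len: usize, align: usize) -> Vec<u8> {
    let mut out = vec![0u8; round_up((len + 7) / 8, align)];
    for i in 0..len {
        if get_bit(src, offset + i) {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}
```

The bit-by-bit loop is deliberately simple rather than fast; the point is only that the sliced bits have to be shifted into a new buffer, which can then be sized to 8 or 64 bytes as needed.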
[Rust] [Format] Should Null Bitmaps be Padded to 8 or 64 Bits?
Hi Arrow developers,

I'm currently working on IPC in Rust, specifically reading Arrow files. I've noticed that null buffers/bitmaps are always padded to 64 bits (from pyarrow, not sure about others), while in Rust we pad to 8 bits.

1. Is this fine re. Rust per the spec?

I'm having issues with reading, but only because I'm comparing array data and not only the values and nullness of slots. I see this being more of a problem when writing to files and streams, as we'd need to pad null buffers almost every time (since for large arrays IPC could need 2048 while we have 2046, so it's not a small data issue).

2. If implementations are allowed to choose either 8 or 64, are the Rust committers happy with us changing to 64-bit padding?

The benefits of changing to 64 would be removing the need to then pad the buffer when writing to streams and files, and it'll make us more compatible with other implementations. I suspect this would still come up as an issue when we get to add Rust to interop tests.

I tried changing to 64-bit before writing this mail, but bit-fu is still beyond my knowledge, so I'd need help from someone else with implementing this, or at least letting me know which lines to change. I don't mind then making sure all tests still pass.

My goal is to complete IPC work by the 0.14 release, so this would be a bit urgent as I'm stuck right now.

Thanks,
Neville
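
The reading problem described above, where raw buffer comparison fails only because of differing padding, could be sidestepped by comparing validity bits logically. The following is a minimal sketch of that idea, not the Rust implementation's actual comparison code; `validity_eq` is a hypothetical name, and bits are assumed to be numbered least-significant-bit first.

```rust
/// Returns true if the first `num_slots` validity bits of `a` and `b`
/// agree, ignoring trailing padding bytes and any unused high bits in
/// the last partial byte. Hypothetical helper, for illustration only.
fn validity_eq(a: &[u8], b: &[u8], num_slots: usize) -> bool {
    let needed = (num_slots + 7) / 8;
    // Both bitmaps must at least cover every slot.
    if a.len() < needed || b.len() < needed {
        return false;
    }
    let full_bytes = num_slots / 8;
    if a[..full_bytes] != b[..full_bytes] {
        return false;
    }
    let rem_bits = num_slots % 8;
    if rem_bits == 0 {
        return true;
    }
    // Compare only the low `rem_bits` bits of the last partial byte.
    let mask = (1u8 << rem_bits) - 1;
    (a[full_bytes] & mask) == (b[full_bytes] & mask)
}
```

For a 3-slot array, a 1-byte bitmap and an 8-byte bitmap with the same first three bits compare equal under `validity_eq`, even though the byte slices themselves differ in length.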