Re: [Rust] [Format] Should Null Bitmaps be Padded to 8 or 64 Bits?

2019-04-26 Thread Wes McKinney
The Buffer struct / metadata need not be a multiple of 8 bytes necessarily
but you must write padding bytes when emitting the IPC protocol. So if your
validity bitmap is 2 bytes in-memory then you must write at least 6 more
bytes of padding on the wire.
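
A minimal sketch of that rule in Rust, assuming a writer that appends zero bytes after each buffer. The names `padded_len_8` and `write_padded` are hypothetical and are not taken from the Arrow Rust crate:

```rust
use std::io::{self, Write};

/// Round `len` up to the next multiple of 8 bytes.
/// (Hypothetical helper, not an Arrow API.)
fn padded_len_8(len: usize) -> usize {
    (len + 7) & !7
}

/// Write `buf`, then enough zero bytes that the total written is a
/// multiple of 8. Returns the total number of bytes written.
fn write_padded<W: Write>(out: &mut W, buf: &[u8]) -> io::Result<usize> {
    const ZEROS: [u8; 8] = [0u8; 8];
    let total = padded_len_8(buf.len());
    out.write_all(buf)?;
    // At most 7 padding bytes are ever needed.
    out.write_all(&ZEROS[..total - buf.len()])?;
    Ok(total)
}
```

With a 2-byte validity bitmap, `write_padded` emits the 2 bitmap bytes followed by 6 zero bytes, while the in-memory buffer itself stays 2 bytes long.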

On Fri, Apr 26, 2019, 3:48 PM Micah Kornfield wrote:

> Hi Neville,
> Here is my understanding.  Per the spec [1], 8 bytes of padding is
> allowed/required but 64 bytes is recommended (is "bits" in your e-mail a
> typo?).  The main rationale is to allow SIMD instructions.
>
> For actual record batches, only padding to a multiple of 8 bytes is
> required [2].
>
> Note that slicing of buffers still might require mem-copies/padding.
>
> Thanks,
> Micah
>
> [1] https://arrow.apache.org/docs/format/Layout.html#alignment-and-padding
> [2] https://arrow.apache.org/docs/format/IPC.html
>
> On Fri, Apr 26, 2019 at 1:29 PM Neville Dipale wrote:
>
> > Hi Arrow developers,
> >
> > I'm currently working on IPC in Rust, specifically reading Arrow files.
> > I've noticed that null buffers/bitmaps are always padded to 64 bits (from
> > pyarrow, not sure about others), while in Rust we pad to 8 bits.
> >
> > 1. Is this fine re. Rust per the spec?
> >
> > I'm having issues with reading, but only because I'm comparing array data
> > and not only the values and nullness of slots. I see this being more of a
> > problem when writing to files and streams as we'd need to pad null buffers
> > almost every time (since for large arrays IPC could need 2048 while we have
> > 2046, so it's not a small data issue)
> >
> > 2. If implementations are allowed to choose either 8 or 64, are the Rust
> > committers happy with us changing to 64-bit padding?
> >
> > The benefits of changing to 64 would be removing the need to then pad the
> > buffer when writing to streams and files, and it'll make us more compatible
> > with other implementations. I suspect this would still come as an issue
> > when we get to add Rust to interop tests.
> >
> > I tried changing to 64-bit before writing this mail, but bit-fu is still
> > beyond my knowledge, so I'd need help from someone else with implementing
> > this, or at least letting me know which lines to change. I don't mind then
> > making sure all tests still pass.
> >
> > My goal is to complete IPC work by 0.14 release, so this would be a bit
> > urgent as I'm stuck right now.
> >
> > Thanks
> > Neville
> >
>


Re: [Rust] [Format] Should Null Bitmaps be Padded to 8 or 64 Bits?

2019-04-26 Thread Micah Kornfield
Hi Neville,
Here is my understanding.  Per the spec [1], 8 bytes of padding is
allowed/required but 64 bytes is recommended (is "bits" in your e-mail a
typo?).  The main rationale is to allow SIMD instructions.

For actual record batches, only padding to a multiple of 8 bytes is
required [2].

Note that slicing of buffers still might require mem-copies/padding.

Thanks,
Micah

[1] https://arrow.apache.org/docs/format/Layout.html#alignment-and-padding
[2] https://arrow.apache.org/docs/format/IPC.html
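
To make the byte counts concrete, here is a small sketch of the arithmetic, assuming one validity bit per slot and power-of-two alignments. `bitmap_bytes` and `round_up` are hypothetical helper names, not functions from any Arrow implementation:

```rust
/// Bytes needed to hold one validity bit per slot, before any padding.
/// (Hypothetical helper, not an Arrow API.)
fn bitmap_bytes(num_slots: usize) -> usize {
    (num_slots + 7) / 8
}

/// Round `len` up to a multiple of `alignment`.
/// `alignment` is assumed to be a power of two (8 or 64 in this thread).
fn round_up(len: usize, alignment: usize) -> usize {
    debug_assert!(alignment.is_power_of_two());
    (len + alignment - 1) & !(alignment - 1)
}
```

For example, 10 slots need `bitmap_bytes(10) == 2` bytes; `round_up(2, 8)` gives 8 and `round_up(2, 64)` gives 64. A 2046-byte buffer rounds up to 2048 under either alignment.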

On Fri, Apr 26, 2019 at 1:29 PM Neville Dipale wrote:

> Hi Arrow developers,
>
> I'm currently working on IPC in Rust, specifically reading Arrow files.
> I've noticed that null buffers/bitmaps are always padded to 64 bits (from
> pyarrow, not sure about others), while in Rust we pad to 8 bits.
>
> 1. Is this fine re. Rust per the spec?
>
> I'm having issues with reading, but only because I'm comparing array data
> and not only the values and nullness of slots. I see this being more of a
> problem when writing to files and streams as we'd need to pad null buffers
> almost every time (since for large arrays IPC could need 2048 while we have
> 2046, so it's not a small data issue)
>
> 2. If implementations are allowed to choose either 8 or 64, are the Rust
> committers happy with us changing to 64-bit padding?
>
> The benefits of changing to 64 would be removing the need to then pad the
> buffer when writing to streams and files, and it'll make us more compatible
> with other implementations. I suspect this would still come as an issue
> when we get to add Rust to interop tests.
>
> I tried changing to 64-bit before writing this mail, but bit-fu is still
> beyond my knowledge, so I'd need help from someone else with implementing
> this, or at least letting me know which lines to change. I don't mind then
> making sure all tests still pass.
>
> My goal is to complete IPC work by 0.14 release, so this would be a bit
> urgent as I'm stuck right now.
>
> Thanks
> Neville
>


[Rust] [Format] Should Null Bitmaps be Padded to 8 or 64 Bits?

2019-04-26 Thread Neville Dipale
Hi Arrow developers,

I'm currently working on IPC in Rust, specifically reading Arrow files.
I've noticed that null buffers/bitmaps are always padded to 64 bits (from
pyarrow, not sure about others), while in Rust we pad to 8 bits.

1. Is this fine re. Rust per the spec?

I'm having issues with reading, but only because I'm comparing array data
and not only the values and nullness of slots. I see this being more of a
problem when writing to files and streams as we'd need to pad null buffers
almost every time (since for large arrays IPC could need 2048 while we have
2046, so it's not a small data issue)

2. If implementations are allowed to choose either 8 or 64, are the Rust
committers happy with us changing to 64-bit padding?

The benefits of changing to 64 would be removing the need to then pad the
buffer when writing to streams and files, and it'll make us more compatible
with other implementations. I suspect this would still come as an issue
when we get to add Rust to interop tests.

I tried changing to 64-bit before writing this mail, but bit-fu is still
beyond my knowledge, so I'd need help from someone else with implementing
this, or at least letting me know which lines to change. I don't mind then
making sure all tests still pass.

My goal is to complete IPC work by 0.14 release, so this would be a bit
urgent as I'm stuck right now.

Thanks
Neville
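
One way to sidestep the comparison problem described above is to compare only the logical validity bits and ignore any trailing padding. The following is a rough sketch under that assumption; `get_bit` and `validity_eq` are hypothetical names, not part of the Arrow Rust crate, and bits are read least-significant-bit first within each byte, as Arrow bitmaps are:

```rust
/// Return whether bit `i` is set, reading each byte least-significant-bit first.
/// (Hypothetical helper, not an Arrow API.)
fn get_bit(bitmap: &[u8], i: usize) -> bool {
    (bitmap[i / 8] & (1u8 << (i % 8))) != 0
}

/// Compare the first `num_slots` validity bits of two bitmaps.
/// Bytes and bits past `num_slots` are treated as padding and ignored.
fn validity_eq(a: &[u8], b: &[u8], num_slots: usize) -> bool {
    let needed = (num_slots + 7) / 8;
    if a.len() < needed || b.len() < needed {
        // One of the bitmaps is too short to describe every slot.
        return false;
    }
    (0..num_slots).all(|i| get_bit(a, i) == get_bit(b, i))
}
```

With this approach a 1-byte bitmap produced with 8-bit padding and a longer bitmap read from a file compare equal as long as the bits for the actual slots match.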