Hi Jorge,

On 13/02/2021 at 04:56, Jorge Cardoso Leitão wrote:
> 
> One solution is to assume an offset of zero when reading from IPC. But, as
> far as I understand, in that case producers must themselves only share
> bitmap buffers that are aligned on 8-bit (byte) boundaries. For example, a
> producer holding an array with offset 3, length 12 and a (shared) validity
> buffer with
> 
> 01101010, 01101010
> 
> cannot simply write the above to the message; it must write the "new"
> layout below:
> 
> new: (010){01101}, 0000[1101]
> old: {01101}010, 0[1101](010)  # 12 + 3 = 15, unbracketed bits are ignored
> 
> i.e. skip the first 3 bits of the first byte and shift all the remaining
> bits down accordingly.
> 
> Is this reasoning correct? Is this the intention?

That's right. You can see the implementation in the C++ IPC writer here,
where non-byte-aligned bitmaps are copied to a temporary buffer:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/writer.cc#L84-L99
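To make the copy concrete, here is a minimal Python sketch of the realignment described above, assuming Arrow's LSB-first bit numbering within each byte (bytes are printed MSB-left, as in the diagram in the quoted message). The function name `realign_bitmap` is hypothetical, and it works one bit at a time for clarity, whereas a real writer would shift whole words.

```python
def realign_bitmap(buf: bytes, offset: int, length: int) -> bytes:
    """Copy `length` bits starting at bit `offset` of an LSB-first bitmap
    into a fresh buffer whose first bit is bit 0 (i.e. offset 0).

    Hypothetical helper for illustration only, not an Arrow API.
    """
    out = bytearray((length + 7) // 8)  # zero-initialised, so padding bits stay 0
    for i in range(length):
        src = offset + i
        # Read bit `src` (LSB-first) and set bit `i` in the output if it is 1.
        if (buf[src // 8] >> (src % 8)) & 1:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


# The example from the thread: offset 3, length 12, shared bytes 01101010, 01101010
old = bytes([0b01101010, 0b01101010])
new = realign_bitmap(old, offset=3, length=12)
print([format(b, "08b") for b in new])  # ['01001101', '00001101']
```

The output matches the "new" bytes `(010){01101}` and `0000[1101]` from the quoted diagram.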

(note that the C++ code is a bit suboptimal: it could avoid copying when
the offset is a multiple of 8)
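One way the zero-copy case might look, as a hedged Python sketch: when the offset is a multiple of 8, the bitmap can be sliced at a byte boundary instead of copied. The name `try_zero_copy_bitmap` is hypothetical, and the sketch assumes readers ignore any bits past `length` in the last byte, since they are not cleared here.

```python
from typing import Optional


def try_zero_copy_bitmap(buf: bytes, offset: int, length: int) -> Optional[memoryview]:
    """Return a zero-copy view of the bitmap if `offset` is byte-aligned, else None.

    Hypothetical helper for illustration only, not an Arrow API.
    """
    if offset % 8 != 0:
        return None  # not byte-aligned: the caller must copy and shift the bits
    start = offset // 8             # first byte holding the array's bits
    nbytes = (length + 7) // 8      # bytes needed to cover `length` bits
    # Trailing bits past `length` in the last byte are left untouched.
    return memoryview(buf)[start:start + nbytes]


view = try_zero_copy_bitmap(b'\xaa\x0f\xf0', offset=8, length=12)
print(bytes(view))  # b'\x0f\xf0'
print(try_zero_copy_bitmap(b'\xaa\x0f\xf0', offset=3, length=12))  # None
```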

The same must be done for the data buffer of boolean arrays:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/writer.cc#L301-L307

Regards

Antoine.
