Hi Jorge,

On 13/02/2021 at 04:56, Jorge Cardoso Leitão wrote:
>
> One solution is to assume an offset of zero when reading from IPC. But afai
> understand, in that case, producers must themselves only share bitmap
> buffers that are aligned at "8 bit boundaries". For example, an array with
> offset 3, len 12 and a (shared) validity buffer with
>
> 01101010, 01101010
>
> can't just write the above to the message; it must write the "new" below:
>
> new: (010){01101}, 0000[1101]
> old: {01101}010, 0[1101](010) # 12 + 3 = 15, unbracket bits are ignored
>
> i.e. skip the first 3 bits from the first byte and shift all bits
> accordingly.
>
> Is this reasoning correct? Is this the intention?

This is right. You can see the implementation in the C++ IPC writer, where
bitmaps that are not byte-aligned are copied to a temporary buffer:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/writer.cc#L84-L99

(Note that this code is a bit suboptimal: it could avoid copying when the
offset is a multiple of 8.)

The same must be done for the data of boolean arrays as well:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/writer.cc#L301-L307

Regards

Antoine.
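
P.S. To make the realignment concrete, here is a minimal Python sketch of the bit shift discussed above. It is not the code from the C++ writer: `copy_bitmap` is a hypothetical helper written only for illustration, and it assumes Arrow's least-significant-bit-first numbering within each byte. It copies one bit at a time for clarity rather than speed.

```python
def copy_bitmap(data: bytes, offset: int, length: int) -> bytes:
    """Copy bits [offset, offset + length) of an LSB-first bitmap into a
    new bitmap that starts at bit 0. Hypothetical helper, not an Arrow API."""
    out = bytearray((length + 7) // 8)  # trailing padding bits stay zero
    for i in range(length):
        src = offset + i
        # Read bit `src` of the source, counting from the least significant bit.
        if (data[src // 8] >> (src % 8)) & 1:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


# The example from the quoted message: offset 3, length 12.
shared = bytes([0b01101010, 0b01101010])
realigned = copy_bitmap(shared, offset=3, length=12)
print([format(b, "08b") for b in realigned])  # ['01001101', '00001101']
```

The printed bytes match the "new" line in the quoted example: (010){01101} and 0000[1101].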