Hi all,

As we've been discussing for the last 5 weeks or so [1], there is a
need to introduce 4 bytes of padding into the preamble of the
"encapsulated IPC message" format to ensure that the Flatbuffers
metadata payload begins on an 8-byte aligned memory offset. The
alternative to this would be for Arrow implementations where alignment
is important (e.g. C or C++) to copy the metadata (which is not always
small) into aligned memory whenever it arrives unaligned.
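
To make the alignment arithmetic concrete, here is a small sketch. It
assumes the message itself starts on an 8-byte boundary; the constant and
function names are made up for illustration and are not taken from any
Arrow library.

```python
LENGTH_PREFIX_SIZE = 4  # existing 32-bit metadata length prefix
PADDING_SIZE = 4        # proposed extra 4 bytes in the preamble


def metadata_offset(message_start, preamble_size):
    """Offset at which the Flatbuffers metadata begins."""
    return message_start + preamble_size


def is_aligned(offset, alignment=8):
    """True when offset is a multiple of alignment."""
    return offset % alignment == 0


# Assumption: the message starts at an 8-byte aligned offset (0 here).
old_offset = metadata_offset(0, LENGTH_PREFIX_SIZE)
new_offset = metadata_offset(0, LENGTH_PREFIX_SIZE + PADDING_SIZE)

print(old_offset, is_aligned(old_offset))
print(new_offset, is_aligned(new_offset))
```

With only the 4-byte length prefix the metadata lands on offset 4; with
the extra 4 bytes it lands on offset 8.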

Micah has proposed to address this by adding a 4-byte "continuation"
marker with the value 0xFFFFFFFF at the beginning of the payload. The
reason to do it this way is that old clients will read the marker as an
invalid length of -1 (the first 4 bytes of the message are currently a
32-bit little-endian signed integer indicating the metadata length)
rather than potentially crashing on a valid length.
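
As a rough illustration of the proposed prefix, the sketch below packs the
continuation value followed by the metadata length, then shows what a
pre-change reader would take as the length. The helper names are
hypothetical, and the layout is only what this email describes, not a
reference implementation.

```python
import struct

CONTINUATION = 0xFFFFFFFF  # proposed 4-byte continuation value


def encode_prefix(metadata_length):
    """Pack the continuation value followed by the metadata length."""
    # "<I" is little-endian unsigned 32-bit, "i" is signed 32-bit.
    return struct.pack("<Ii", CONTINUATION, metadata_length)


def old_reader_length(buf):
    """What a pre-change reader would interpret as the metadata length."""
    (length,) = struct.unpack_from("<i", buf, 0)
    return length


prefix = encode_prefix(256)
print(len(prefix))                # size of the new prefix in bytes
print(old_reader_length(prefix))  # the length an old reader would see
```

The prefix is 8 bytes long, and an old reader sees a length of -1, which
it can reject outright.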

This would be a backwards incompatible protocol change, so older Arrow
libraries would not be able to read these new messages. Maintaining
forward compatibility (reading data produced by older libraries) would
be possible as we can reason that a value other than the continuation
value was produced by an older library (and then validate the
Flatbuffer message of course). Arrow implementations could offer a
backward compatibility mode for the sake of old readers if they desire
(this may also assist with testing).
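
A reader that accepts both framings could look roughly like the sketch
below. The function name and return shape are assumptions made for this
example, and the Flatbuffer validation mentioned above is deliberately
left out.

```python
import struct

CONTINUATION = 0xFFFFFFFF  # proposed 4-byte continuation value


def read_metadata_length(buf, offset=0):
    """Return (metadata_length, metadata_offset) for either framing."""
    (first,) = struct.unpack_from("<I", buf, offset)
    if first == CONTINUATION:
        # New framing: the real length follows the continuation value.
        (length,) = struct.unpack_from("<i", buf, offset + 4)
        body = offset + 8
    else:
        # Old framing: the first 4 bytes are the length itself.
        (length,) = struct.unpack_from("<i", buf, offset)
        body = offset + 4
    if length < 0:
        raise ValueError("invalid metadata length")
    # A real reader would validate the Flatbuffer message at this point.
    return length, body


new_message = struct.pack("<Ii", CONTINUATION, 16) + bytes(16)
old_message = struct.pack("<i", 16) + bytes(16)
print(read_metadata_length(new_message))
print(read_metadata_length(old_message))
```

Both calls report a 16-byte metadata payload; only the offset at which it
starts differs (8 for the new framing, 4 for the old one).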

The PR making these changes to the IPC documentation is here:

https://github.com/apache/arrow/pull/4951

Please vote to accept this change. This vote will be open for at least 72 hours.

[ ] +1 Adopt the Arrow protocol change
[ ] +0
[ ] -1 I disagree because...

Here is my vote: +1

Thanks,
Wes

[1]: https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
