I just updated my pull request from May, adding language to clarify what protocol writers are expected to set when producing the Arrow binary protocol:

https://github.com/apache/arrow/pull/4370

Implementations may allocate small buffers, or use memory that does not meet the 8-byte minimum padding requirement of the Arrow protocol. The question, then, is whether to set the in-memory buffer size or the padded size when producing the protocol. This PR states that either is acceptable.

As an example, a 1-byte validity buffer could have Buffer metadata stating that its size is either 1 byte or 8 bytes. Either way, 7 bytes of padding must be written to conform to the protocol. The metadata therefore reflects the "intent" of the protocol writer for the protocol reader: if the writer says the length is 1, the reader understands that the writer does not expect it to concern itself with the 7 bytes of padding. This could have implications for hashing or comparisons, for example, so I think that having the flexibility to do either is a good idea.

For an application that wants to guarantee that AVX512 instructions can be used on all buffers on the receiver side, it would be appropriate to include 512-bit (64-byte) padding in the accounting.

Let me know if others think differently so we can have this properly documented for the 1.0.0 Format release.

Thanks,
Wes
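To make the two options concrete, here is a minimal Python sketch (the names `write_buffer` and `pad_to` are hypothetical helpers, not Arrow APIs) of a writer that pads a 1-byte validity buffer out to the 8-byte boundary and reports either the in-memory length or the padded length in its Buffer metadata:

```python
# Hypothetical sketch, not Arrow's actual implementation: write a
# 1-byte validity buffer into an 8-byte-padded stream and show the
# two acceptable Buffer-metadata length values.

def pad_to(size: int, alignment: int = 8) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment

def write_buffer(stream: bytearray, data: bytes, report_padded: bool):
    """Append data plus zero padding; return (offset, length) metadata.

    If report_padded is True, length includes the padding bytes;
    otherwise it reports only the in-memory (unpadded) size. Both
    are acceptable under the proposed wording.
    """
    offset = len(stream)
    padded = pad_to(len(data))
    stream.extend(data)
    stream.extend(b"\x00" * (padded - len(data)))  # 7 zero bytes for a 1-byte buffer
    return offset, (padded if report_padded else len(data))

stream = bytearray()
validity = b"\x01"  # 1-byte validity bitmap covering up to 8 values
meta_unpadded = write_buffer(stream, validity, report_padded=False)
meta_padded = write_buffer(stream, validity, report_padded=True)
print(meta_unpadded)  # (0, 1): reader need not look at the padding
print(meta_padded)    # (8, 8): padding counted in the accounting
```

Raising the `alignment` argument to 64 would model the AVX512 case, where the writer includes 512-bit padding in the reported size.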