Re: [PR] GH-47918: [Format] Clarify that empty compressed buffers can omit the length header [arrow]

via GitHub Tue, 16 Dec 2025 19:04:17 -0800


paleolimbot commented on code in PR #48541:
URL: https://github.com/apache/arrow/pull/48541#discussion_r2625425830



##########
format/Message.fbs:
##########
@@ -55,14 +55,15 @@ enum CompressionType:byte {
 /// Provided for forward compatibility in case we need to support different
 /// strategies for compressing the IPC message body (like whole-body
 /// compression rather than buffer-level) in the future
-enum BodyCompressionMethod:byte {
+enum BodyCompressionMethod: byte {
   /// Each constituent buffer is first compressed with the indicated
   /// compressor, and then written with the uncompressed length in the first 8
   /// bytes as a 64-bit little-endian signed integer followed by the compressed
   /// buffer bytes (and then padding as required by the protocol). The
   /// uncompressed length may be set to -1 to indicate that the data that
   /// follows is not compressed, which can be useful for cases where
   /// compression does not yield appreciable savings.
+  /// Also, empty buffers can optionally omit the 8-byte length header.

Review Comment:
   I'm also interested in such a file. It might be nice for this comment to take
a stance on whether an implementation *should* do this, for the benefit of new
implementors who don't have an opinion. It seems like the rationale is that you
save 8 bytes per column in most cases (since this would probably get used for
every null buffer in compressed output). Does anybody need that level of
optimization?
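The diff above describes the per-buffer layout: an 8-byte little-endian signed length header, then either compressed bytes or, when the header is -1, raw bytes. The PR's new sentence lets an empty buffer skip that header entirely. Below is a minimal Python sketch of a writer and reader for one body buffer. It makes several assumptions. It uses zlib only as a stand-in codec, because it ships with Python; the format itself specifies LZ4_FRAME and ZSTD. It ignores padding. The helper names `encode_buffer` and `decode_buffer` are hypothetical and not part of any Arrow library.

```python
import struct
import zlib

UNCOMPRESSED = -1  # header sentinel: the bytes after the header are stored raw


# Stand-in codec (assumption): real Arrow IPC uses LZ4_FRAME or ZSTD.
def _compress(data: bytes) -> bytes:
    return zlib.compress(data)


def _decompress(data: bytes) -> bytes:
    return zlib.decompress(data)


def encode_buffer(raw: bytes, omit_empty_header: bool = True) -> bytes:
    """Encode one body buffer in the buffer-level layout (padding omitted)."""
    if not raw and omit_empty_header:
        # The relaxation discussed in this PR: an empty buffer takes zero bytes.
        return b""
    compressed = _compress(raw)
    if len(compressed) >= len(raw):
        # Compression gave no savings, so store raw bytes behind the -1 sentinel.
        return struct.pack("<q", UNCOMPRESSED) + raw
    # Header holds the uncompressed length as a 64-bit little-endian signed int.
    return struct.pack("<q", len(raw)) + compressed


def decode_buffer(encoded: bytes) -> bytes:
    """Decode one body buffer, given exactly the bytes its Buffer entry spans."""
    if len(encoded) == 0:
        # Header omitted: only possible for an empty buffer.
        return b""
    if len(encoded) < 8:
        raise ValueError("buffer shorter than its 8-byte length header")
    (uncompressed_len,) = struct.unpack_from("<q", encoded, 0)
    payload = encoded[8:]
    if uncompressed_len == UNCOMPRESSED:
        return payload
    if uncompressed_len < 0:
        raise ValueError("invalid negative length header")
    out = _decompress(payload)
    if len(out) != uncompressed_len:
        raise ValueError("length header does not match decompressed size")
    return out
```

With `omit_empty_header=False`, an empty buffer still costs an 8-byte header (here the -1 sentinel followed by no payload). With the relaxation it costs nothing, which is the per-column saving the comment asks about. A reader that accepts both forms needs only the zero-length check at the top of `decode_buffer`.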



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.