Re: [PR] GH-37756: [Format][Docs] Document IPC Compression [arrow]

via GitHub Wed, 04 Sep 2024 06:26:12 -0700


mapleFU commented on code in PR #43950:
URL: https://github.com/apache/arrow/pull/43950#discussion_r1743794841



##########
docs/source/format/Columnar.rst:
##########
@@ -1385,6 +1385,36 @@ have two entries in each RecordBatch. For a RecordBatch of this schema with
     buffer 13: col2    data
 
 
+Compression
+-----------
+
+There are three options for compressing record batch body
+buffers: buffers can be left uncompressed, or compressed with
+the ``lz4`` or ``zstd`` codec. All buffers in the flat
+sequence of the message body are compressed separately with the
+same codec.
+
+The difference between compressed and uncompressed buffers in the
+serialized form is as follows:
+
+* If the buffers in the ``RecordBatch`` message are **compressed**
+
+  - the ``data header`` includes the length and memory offset
+    of each **compressed buffer** in the record batch's body
+
+  - the ``body`` includes a flat sequence of **compressed memory
+    buffers** together with the **length of uncompressed buffer**
+    stored in the first 8 bytes for each buffer in the sequence

Review Comment:
   Should we specify the endianness and signedness of this length here?
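
For context, here is a minimal sketch of how a reader might parse the 8-byte prefix. It assumes the encoding described in the `BodyCompression` comments in `Message.fbs`: a 64-bit little-endian signed integer, where `-1` means the buffer body is stored uncompressed. The helper name `split_length_prefix` is hypothetical and is not part of any Arrow API; actual decompression is left out.

```python
import struct


def split_length_prefix(buf: bytes):
    """Hypothetical helper: split one body buffer into
    (uncompressed_length, payload).

    Assumes the first 8 bytes hold the uncompressed length as a
    64-bit little-endian signed integer, and that -1 marks a
    buffer whose payload was left uncompressed.
    """
    if len(buf) < 8:
        raise ValueError("buffer too short for an 8-byte length prefix")
    # "<q" means little-endian, signed 64-bit integer
    (uncompressed_length,) = struct.unpack_from("<q", buf, 0)
    payload = buf[8:]
    return uncompressed_length, payload


# Build two illustrative buffers: one flagged as uncompressed (-1),
# one claiming an uncompressed size of 1024 bytes.
raw_buffer = struct.pack("<q", -1) + b"raw bytes"
compressed_buffer = struct.pack("<q", 1024) + b"<codec output>"

length, payload = split_length_prefix(raw_buffer)
is_compressed = length != -1
```

Stating the signedness matters because the `-1` sentinel cannot be represented if the prefix is read as unsigned.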



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
