AlenkaF commented on code in PR #43950:
URL: https://github.com/apache/arrow/pull/43950#discussion_r1756234760
##########
docs/source/format/Columnar.rst:
##########
@@ -1385,6 +1387,65 @@ have two entries in each RecordBatch. For a RecordBatch of this schema with
buffer 13: col2 data
+Compression
+-----------
+
+There are three options for compressing record batch body
+buffers: buffers can be left uncompressed, compressed with the
+``lz4`` codec, or compressed with the ``zstd`` codec. Each buffer
+in the flat sequence of a message body must be compressed
+separately, and all buffers must use the same codec. An individual
+buffer in the sequence can nevertheless be left uncompressed if
+compression does not yield appreciable savings.
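The per-buffer rule above can be sketched in Python. This is a minimal illustration, not Arrow's implementation: the function names (``frame_buffer``, ``frame_body``, ``unframe_buffer``) are hypothetical, ``zlib`` stands in for the real ``lz4``/``zstd`` codecs so the sketch runs with the standard library only, the "compress only if the result is strictly smaller" threshold is an assumption, and body padding/alignment is omitted. It assumes each buffer carries its uncompressed length as a 64-bit little-endian signed integer in its first 8 bytes, and follows the convention described in Arrow's ``Message.fbs`` where a length of -1 marks a buffer whose bytes follow uncompressed.

```python
import struct
import zlib
from typing import Callable, List

Codec = Callable[[bytes], bytes]  # hypothetical codec callback: bytes in, bytes out

# Length marker for a buffer stored without compression (-1, per Message.fbs).
UNCOMPRESSED_MARKER = -1
PREFIX = struct.Struct("<q")  # 64-bit little-endian signed integer


def frame_buffer(data: bytes, compress: Codec) -> bytes:
    """Prefix one body buffer with its uncompressed length, compressing it if that helps."""
    compressed = compress(data)
    if len(compressed) < len(data):
        return PREFIX.pack(len(data)) + compressed
    # No savings (assumed threshold): keep the raw bytes and mark them with -1.
    return PREFIX.pack(UNCOMPRESSED_MARKER) + data


def frame_body(buffers: List[bytes], compress: Codec) -> List[bytes]:
    """Compress every buffer separately, all with the same codec."""
    return [frame_buffer(buf, compress) for buf in buffers]


def unframe_buffer(framed: bytes, decompress: Codec) -> bytes:
    """Recover the original bytes of one framed buffer."""
    (length,) = PREFIX.unpack_from(framed, 0)
    payload = framed[PREFIX.size:]
    if length == UNCOMPRESSED_MARKER:
        return payload
    data = decompress(payload)
    if len(data) != length:
        raise ValueError(f"expected {length} bytes, got {len(data)}")
    return data


# Usage with zlib as a stand-in codec (Arrow itself uses LZ4 frame or ZSTD).
buffers = [b"\x00" * 1024, b"xy"]
framed = frame_body(buffers, zlib.compress)
restored = [unframe_buffer(f, zlib.decompress) for f in framed]
```

Here the 1024 zero bytes shrink and are stored compressed, while the 2-byte buffer would grow under compression and is therefore kept raw behind the -1 marker; both buffers in the body still share a single codec.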
+
+The codec (compression type) is specified in the optional ``compression``
+field of the ``data header`` of the :ref:`ipc-recordbatch-message`.
+
+.. note::
+
+ The ``lz4`` compression codec refers to the
+ `LZ4 frame format
+ <https://github.com/lz4/lz4/blob/dev/doc/lz4_Frame_format.md>`_
+ and should not be confused with the
+ `"raw" (also called "block") format
+ <https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md>`_.
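One practical consequence of the frame/block distinction: an LZ4 frame starts with the magic number ``0x184D2204`` (stored little-endian), whereas a raw LZ4 block has no header at all. A Zstandard frame likewise starts with ``0xFD2FB528``. The hypothetical helper below (``sniff_frame`` is not an Arrow API) sketches how a reader might sanity-check a decompressed-length-prefixed payload before handing it to a decoder.

```python
import struct

# Frame magic numbers, stored little-endian in the first 4 bytes of a frame.
LZ4_FRAME_MAGIC = 0x184D2204
ZSTD_FRAME_MAGIC = 0xFD2FB528


def sniff_frame(payload: bytes) -> str:
    """Guess the container format of a compressed payload from its magic number."""
    if len(payload) < 4:
        return "unknown"
    (magic,) = struct.unpack_from("<I", payload, 0)
    if magic == LZ4_FRAME_MAGIC:
        return "lz4-frame"
    if magic == ZSTD_FRAME_MAGIC:
        return "zstd-frame"
    # A raw LZ4 block carries no magic number, so it lands here.
    return "unknown"
```

A check like this cannot prove a payload is valid, but it catches the common mistake of writing raw LZ4 blocks where the frame format is expected.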
+
+The difference between compressed and uncompressed buffers in the
+serialized form is as follows:
+
+* If the buffers in the :ref:`ipc-recordbatch-message` are **compressed**
+
+ - the ``data header`` includes the length and memory offset
+ of each **compressed buffer** in the record batch's body together
+ with the compression type
+
+ - the ``body`` includes a flat sequence of **compressed buffers**,
+ each prefixed with the **length of the uncompressed buffer**, stored
+ as a 64-bit little-endian signed integer in the buffer's first 8 bytes.
+ The first 8 bytes can be left empty or equal
Review Comment:
Looking at Message.fbs, I can't find any record of such info either;
somehow it turned up in my notes, sorry about that. And thank you for your sharp
review! Will correct.
--
This is an automated message from the Apache Git Service.