Re: [PR] GH-49946: [Format] Better document equivalence between IPC file and streams [arrow]

via GitHub Wed, 20 May 2026 14:51:24 -0700


alamb commented on code in PR #49947:
URL: https://github.com/apache/arrow/pull/49947#discussion_r3277378917



##########
docs/source/format/Columnar.rst:
##########
@@ -1333,22 +1334,21 @@ The flattened version of this is: ::
 For the buffers produced, we would have the following (refer to the
 table above): ::
 
-    buffer 0: field 0 validity
-    buffer 1: field 1 validity
-    buffer 2: field 1 values
-    buffer 3: field 2 validity
-    buffer 4: field 2 offsets
-    buffer 5: field 3 validity
-    buffer 6: field 3 values
-    buffer 7: field 4 validity
-    buffer 8: field 4 values
-    buffer 9: field 5 validity
-    buffer 10: field 5 offsets
-    buffer 11: field 5 data
-
-The ``Buffer`` Flatbuffers value describes the location and size of a
-piece of memory. Generally these are interpreted relative to the
-**encapsulated message format** defined below.
+    buffer 0: field 0 ('col1') validity

Review Comment:
   this is much nicer



##########
docs/source/format/Columnar.rst:
##########
@@ -1502,9 +1502,14 @@ message flatbuffer is read, you can then read the message body.
 
 The stream writer can signal end-of-stream (EOS) either by writing 8 bytes
 containing the 4-byte continuation indicator (``0xFFFFFFFF``) followed by 0
-metadata length (``0x00000000``) or closing the stream interface. We
-recommend the ".arrows" file extension for the streaming format although
-in many cases these streams will not ever be stored as files.
+metadata length (``0x00000000``) or closing the stream interface.
+
+File extension and MIME type
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+IPC Streams are not always stored as files, but when they are, we recommend

Review Comment:
   nit: I don't think they are commonly stored as files (the file format is more often used for that)
   
   ```suggestion
   IPC Streams are not typically stored as files, but when they are, we recommend
   ```
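
For readers following the hunk above, the 8-byte EOS marker it describes can be sketched in a few lines. This is only an illustration using the Python standard library, not the Arrow implementation; the names `eos_marker` and `is_eos` are hypothetical, and the little-endian encoding of the length prefix is an assumption taken from the encapsulated message format.

```python
import struct

# 4-byte continuation indicator quoted in the diff above.
CONTINUATION = 0xFFFFFFFF


def eos_marker() -> bytes:
    """Build the 8-byte end-of-stream marker.

    Layout: continuation indicator (0xFFFFFFFF) followed by a
    zero metadata length (0x00000000).
    """
    # "<Ii" = little-endian unsigned 32-bit, then signed 32-bit (assumed encoding).
    return struct.pack("<Ii", CONTINUATION, 0)


def is_eos(prefix: bytes) -> bool:
    """Return True if the first 8 bytes of a message prefix signal EOS."""
    if len(prefix) < 8:
        # Too short to hold a full prefix; a real reader would treat a
        # cleanly closed stream as end-of-stream separately.
        return False
    continuation, metadata_length = struct.unpack("<Ii", prefix[:8])
    return continuation == CONTINUATION and metadata_length == 0
```

A reader would still need to handle the other case named in the text, where the writer simply closes the stream without writing the marker.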



##########
docs/source/format/Columnar.rst:
##########
@@ -1333,22 +1334,21 @@ The flattened version of this is: ::
 For the buffers produced, we would have the following (refer to the
 table above): ::
 
-    buffer 0: field 0 validity
-    buffer 1: field 1 validity
-    buffer 2: field 1 values
-    buffer 3: field 2 validity
-    buffer 4: field 2 offsets
-    buffer 5: field 3 validity
-    buffer 6: field 3 values
-    buffer 7: field 4 validity
-    buffer 8: field 4 values
-    buffer 9: field 5 validity
-    buffer 10: field 5 offsets
-    buffer 11: field 5 data
-
-The ``Buffer`` Flatbuffers value describes the location and size of a
-piece of memory. Generally these are interpreted relative to the
-**encapsulated message format** defined below.
+    buffer 0: field 0 ('col1') validity
+    buffer 1: field 1 ('col1.a') validity
+    buffer 2: field 1 ('col1.a') values
+    buffer 3: field 2 ('col1.b') validity
+    buffer 4: field 2 ('col1.b') offsets
+    buffer 5: field 3 ('col1.b.item') validity
+    buffer 6: field 3 ('col1.b.item') values
+    buffer 7: field 4 ('col1.c') validity
+    buffer 8: field 4 ('col1.c') values
+    buffer 9: field 5 ('col2') validity
+    buffer 10: field 5 ('col2') offsets
+    buffer 11: field 5 ('col2') data
+
+The ``Buffer`` Flatbuffers value describes the location and size of a buffer's
+data, relatively to the start of the RecordBatch message's body.

Review Comment:
   ```suggestion
   The ``Buffer`` Flatbuffers value describes the location and size of a buffer's
   data, relative to the start of the RecordBatch message's body.
   ```
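
To make "relative to the start of the RecordBatch message's body" concrete, here is a minimal sketch of how a reader might resolve buffer locations. It assumes the body has already been read into memory and that each ``Buffer`` entry has been decoded into an offset and a length; `BufferRef` and `slice_buffers` are hypothetical names for illustration, not Arrow APIs.

```python
from typing import List, NamedTuple


class BufferRef(NamedTuple):
    """Hypothetical stand-in for a decoded ``Buffer`` Flatbuffers value."""
    offset: int  # byte offset from the start of the message body
    length: int  # size of the buffer's data in bytes


def slice_buffers(body: bytes, refs: List[BufferRef]) -> List[bytes]:
    """Resolve each (offset, length) pair against the message body."""
    view = memoryview(body)
    buffers = []
    for index, ref in enumerate(refs):
        end = ref.offset + ref.length
        # Reject entries that would read outside the body.
        if ref.offset < 0 or ref.length < 0 or end > len(body):
            raise ValueError(f"buffer {index} lies outside the message body")
        buffers.append(bytes(view[ref.offset:end]))
    return buffers
```

For the flattened example in the diff, `refs[0]` would describe the validity bitmap of field 0 ('col1'), `refs[1]` the validity bitmap of field 1 ('col1.a'), and so on in the listed order.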



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
