[GitHub] [arrow] wjones127 commented on a diff in pull request #14008: ARROW-13454: [C++][Docs] Tables vs Record Batches

GitBox Wed, 31 Aug 2022 10:46:37 -0700


wjones127 commented on code in PR #14008:
URL: https://github.com/apache/arrow/pull/14008#discussion_r959860181



##########
docs/source/cpp/tables.rst:
##########
@@ -77,6 +77,17 @@ has a schema which must match its arrays' datatypes.
 Record batches are a convenient unit of work for various serialization
 and computation functions, possibly incremental.
 
+.. image:: tables-versus-record-batches.svg
+   :alt: A graphical representation of an Arrow Table and a Record Batch, with
+         structure as described in text above.
+
+Because record batches can be represented as a struct array, they can be 
+exported through the C data interface between implementations. Tables and 
+chunked arrays, on the other hand, are concepts in the C++ implementation, not 
+in the Arrow format itself, so they aren't directly portable.
+
+However, a table can be converted to and built from a sequence of record 
+batches easily without needing to copy the underlying array buffers.

Review Comment:
   > Does each RecordBatch just store pointers to start and end, and the breaking each boundary ensures contiguity?
   
   Arrays generally hold a `shared_ptr<Buffer>`, an offset, and a length. You 
can `Slice()` any array: it copies only the shared pointer to the buffer, not 
the data itself, and adjusts the offset and length accordingly.
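
   To make the mechanism concrete, here is a minimal sketch of that idea using 
only the standard library. `ToyBuffer`, `ToyArray`, and `MakeToyArray` are 
hypothetical names invented for illustration; they are not Arrow classes, and 
the real `arrow::Array` tracks more state (several buffers, a validity bitmap, 
a type) than this toy does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffer: one contiguous block of values.
using ToyBuffer = std::vector<int32_t>;

// Hypothetical stand-in for an array: a shared buffer plus a window onto it.
struct ToyArray {
  std::shared_ptr<const ToyBuffer> buffer;
  std::size_t offset;
  std::size_t length;

  // Return a view of `count` elements starting at `start`, relative to this
  // view. Only the shared pointer is copied; the element data is not.
  ToyArray Slice(std::size_t start, std::size_t count) const {
    assert(start <= length && count <= length - start);
    return ToyArray{buffer, offset + start, count};
  }

  // Read element `i` of this view by adding the view's offset.
  int32_t Value(std::size_t i) const {
    assert(i < length);
    return (*buffer)[offset + i];
  }
};

// Build a ToyArray that owns `values` and spans the whole buffer.
ToyArray MakeToyArray(std::vector<int32_t> values) {
  std::shared_ptr<const ToyBuffer> buf =
      std::make_shared<ToyBuffer>(std::move(values));
  const std::size_t n = buf->size();
  return ToyArray{std::move(buf), 0, n};
}
```

   Slicing a slice simply adds offsets, so every view still points at the same 
underlying allocation. In Arrow itself the corresponding call is 
`array->Slice(offset, length)`, which returns a new `std::shared_ptr<Array>`.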



