wjones127 commented on code in PR #14008:
URL: https://github.com/apache/arrow/pull/14008#discussion_r959860181

##########
docs/source/cpp/tables.rst:
##########
@@ -77,6 +77,17 @@ has a schema which must match its arrays' datatypes.
 Record batches are a convenient unit of work for various serialization
 and computation functions, possibly incremental.
 
+.. image:: tables-versus-record-batches.svg
+   :alt: A graphical representation of an Arrow Table and a Record Batch, with
+         structure as described in text above.
+
+Because record batches can be represented as a struct array, they can be
+exported through the C data interface between implementations. Tables and
+chunked arrays, on the other hand, are concepts in the C++ implementation, not
+in the Arrow format itself, so they aren't directly portable.
+
+However, a table can be converted to and built from a sequence of record
+batches easily without needing to copy the underlying array buffers.

Review Comment:
   > Does each RecordBatch just store pointers to start and end, and the breaking each boundary ensures contiguity?

   Arrays generally have a `shared_ptr<Buffer>`, an offset, and a length. You can `Slice()` any array, and it just copies the shared pointer to the buffer, and adjusts the offset and length accordingly.
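
To make the slicing behaviour concrete, here is a minimal sketch of the idea using only the standard library. It is a toy model, not Arrow's actual classes: `ToyBuffer`, `ToyArray`, and `MakeToyArray` are hypothetical names invented for this illustration, and real Arrow arrays carry more state (type, validity bitmap, child data) than shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a buffer: a block of values that is never modified
// once it is shared. (Hypothetical; not Arrow's Buffer class.)
using ToyBuffer = std::vector<int32_t>;

// Toy stand-in for an array: a shared buffer plus a window into it.
// (Hypothetical; not Arrow's Array class.)
struct ToyArray {
  std::shared_ptr<const ToyBuffer> buffer;
  std::size_t offset;
  std::size_t length;

  // Value at logical index i, relative to this array's window.
  int32_t Value(std::size_t i) const { return (*buffer)[offset + i]; }

  // Zero-copy slice: copies the shared pointer (bumping the reference
  // count) and adjusts offset and length. No element data is copied.
  ToyArray Slice(std::size_t slice_offset, std::size_t slice_length) const {
    assert(slice_offset + slice_length <= length);
    return ToyArray{buffer, offset + slice_offset, slice_length};
  }
};

// Builds an array that views the whole of a newly allocated buffer.
inline ToyArray MakeToyArray(std::vector<int32_t> values) {
  std::shared_ptr<const ToyBuffer> buffer =
      std::make_shared<ToyBuffer>(std::move(values));
  const std::size_t length = buffer->size();
  return ToyArray{std::move(buffer), 0, length};
}
```

In this model, `MakeToyArray({10, 20, 30, 40, 50}).Slice(1, 3)` yields an array whose `buffer` points at the same allocation as the original, with `offset == 1` and `length == 3`. Slicing a slice simply accumulates offsets. Under that assumption, splitting a table's columns at chunk boundaries into record batches only creates new (pointer, offset, length) triples.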