Light-City commented on code in PR #37896:
URL: https://github.com/apache/arrow/pull/37896#discussion_r1339407325
##########
cpp/src/arrow/record_batch.cc:
##########
@@ -432,4 +433,43 @@ RecordBatchReader::~RecordBatchReader() {
ARROW_WARN_NOT_OK(this->Close(), "Implicitly called RecordBatchReader::Close failed");
}
+Result<std::shared_ptr<RecordBatch>> ConcatenateRecordBatches(
+ const RecordBatchVector& batches, MemoryPool* pool) {
+ int64_t length = 0;
+ size_t n = batches.size();
+ if (n == 0) {
+ return Status::Invalid("Must pass at least one recordbatch");
+ }
+ if (n == 1) {
+ return batches[0];
+ }
+ int cols = batches[0]->num_columns();
+ auto schema = batches[0]->schema();
+ std::vector<std::shared_ptr<Array>> columns;
+ if (cols == 0) {
+ // special case: null batch, no data, just length
+ for (size_t i = 0; i < batches.size(); ++i) {
+ length += batches[i]->num_rows();
+ }
+ } else {
Review Comment:
The zero-column case is handled separately here because an operation such as
count(*) produces RecordBatches that store no data, only a length. With no
arrays to concatenate, the lengths have to be summed explicitly. This path is
hit when such small batches need to be merged.
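
To illustrate the point, here is a minimal standalone sketch. It does not use
Arrow: ToyBatch and ConcatenateToyBatches are hypothetical stand-ins invented
for this example, and the schema checks a real implementation would need are
left out. It only shows why the zero-column branch has to sum the row counts
itself, while the general branch can read the length off a concatenated column.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a record batch (NOT arrow::RecordBatch).
// A batch produced for count(*) has num_rows > 0 and no columns at all.
struct ToyBatch {
  int64_t num_rows;
  std::vector<std::vector<int>> columns;  // each column holds num_rows values
};

// Hypothetical helper mirroring the shape of the function under review.
// Assumes every input batch has the same number of columns.
ToyBatch ConcatenateToyBatches(const std::vector<ToyBatch>& batches) {
  if (batches.empty()) {
    throw std::invalid_argument("Must pass at least one batch");
  }
  ToyBatch out{0, {}};
  const std::size_t cols = batches[0].columns.size();
  if (cols == 0) {
    // Special case: no data to concatenate, so the only thing that can be
    // carried over is the total length.
    for (const ToyBatch& b : batches) {
      out.num_rows += b.num_rows;
    }
  } else {
    out.columns.resize(cols);
    for (const ToyBatch& b : batches) {
      for (std::size_t c = 0; c < cols; ++c) {
        out.columns[c].insert(out.columns[c].end(), b.columns[c].begin(),
                              b.columns[c].end());
      }
    }
    // General case: the length falls out of the concatenated data.
    out.num_rows = static_cast<int64_t>(out.columns[0].size());
  }
  return out;
}
```

Without the zero-column branch, the general path would index columns[0] on an
empty vector, and the row count of the merged batch would be lost.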