andishgar commented on code in PR #46229: URL: https://github.com/apache/arrow/pull/46229#discussion_r2207923920
##########
cpp/src/arrow/array/array_binary.cc:
##########
@@ -105,6 +111,392 @@ BinaryViewArray::BinaryViewArray(std::shared_ptr<DataType> type, int64_t length,
       ArrayData::Make(std::move(type), length, std::move(buffers), null_count, offset));
 }

+namespace {
+
+// TODO Should We move this to bitmap_ops.h and Remove from compute/kernels/util.s
+Result<std::shared_ptr<Buffer>> GetOrCopyNullBitmapBuffer(const ArrayData& in_array,
+                                                          MemoryPool* pool) {
+  if (in_array.buffers[0]->data() == nullptr) {
+    return nullptr;
+  } else if (in_array.offset == 0) {
+    return in_array.buffers[0];
+  } else if (in_array.offset % 8 == 0) {
+    return SliceBuffer(in_array.buffers[0], /*offset=*/in_array.offset / 8);
+  } else {
+    // If a non-zero offset, we need to shift the bitmap
+    return internal::CopyBitmap(pool, in_array.buffers[0]->data(), in_array.offset,
+                                in_array.length);
+  }
+}
+
+struct Interval {
+  int64_t start;
+  int64_t end;
+  int32_t offset = -1;

Review Comment:
   Maybe I didn't choose the best name, so let me explain what `offset` means here.

   Imagine the following intervals:
   ```css
   [0,15] [10,30] [80,100] [90,120] [1000,1100] [1090,1200]
   ```
   After adding and merging them, they become:
   ```css
   [0,30] [80,120] [1000,1200]
   ```
   The offsets for these merged intervals would then be:
   ```css
   0,30,70
   ```
   In other words, the offset is the position at which each merged interval would start if all merged intervals were compacted back to back into a new buffer. The merged lengths here are 30, 40, and 200, so the running sums are 0, 30, and 70.

   Do you think I should rename `offset` to something clearer, or is it understandable as is?
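
   To make the definition concrete, here is a minimal standalone sketch of the merge-then-assign-offsets idea. It is not the code in this PR: `MergeAndAssignOffsets` is a hypothetical name, the `Interval` below is a simplified stand-in that uses `int64_t` for `offset`, and it assumes `end` is exclusive and that touching intervals are merged.

   ```cpp
   #include <algorithm>
   #include <cstdint>
   #include <vector>

   // Simplified stand-in for the PR's Interval (assumption: `end` is exclusive).
   struct Interval {
     int64_t start;
     int64_t end;
     int64_t offset = -1;  // start position in the compacted buffer, assigned below
   };

   // Hypothetical helper: sorts by start, merges overlapping or touching intervals,
   // then gives each merged interval the total length of the merged intervals
   // that precede it.
   std::vector<Interval> MergeAndAssignOffsets(std::vector<Interval> intervals) {
     std::sort(intervals.begin(), intervals.end(),
               [](const Interval& a, const Interval& b) { return a.start < b.start; });

     std::vector<Interval> merged;
     for (const Interval& iv : intervals) {
       if (!merged.empty() && iv.start <= merged.back().end) {
         // Overlaps (or touches) the previous merged interval: extend it.
         merged.back().end = std::max(merged.back().end, iv.end);
       } else {
         merged.push_back(iv);
       }
     }

     int64_t next_offset = 0;
     for (Interval& iv : merged) {
       iv.offset = next_offset;           // where this interval starts after compaction
       next_offset += iv.end - iv.start;  // advance by the merged interval's length
     }
     return merged;
   }
   ```

   Called with `[0,15] [10,30] [80,100] [90,120] [1000,1100] [1090,1200]`, this sketch returns `[0,30] [80,120] [1000,1200]` with offsets `0`, `30`, and `70`.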