andishgar commented on code in PR #46229:
URL: https://github.com/apache/arrow/pull/46229#discussion_r2208052873

##########
cpp/src/arrow/array/array_binary.cc:
##########
@@ -105,6 +111,392 @@ BinaryViewArray::BinaryViewArray(std::shared_ptr<DataType> type, int64_t length,
       ArrayData::Make(std::move(type), length, std::move(buffers), null_count, offset));
 }
 
+namespace {
+
+// TODO Should We move this to bitmap_ops.h and Remove from compute/kernels/util.s
+Result<std::shared_ptr<Buffer>> GetOrCopyNullBitmapBuffer(const ArrayData& in_array,
+                                                          MemoryPool* pool) {
+  if (in_array.buffers[0]->data() == nullptr) {
+    return nullptr;
+  } else if (in_array.offset == 0) {
+    return in_array.buffers[0];
+  } else if (in_array.offset % 8 == 0) {
+    return SliceBuffer(in_array.buffers[0], /*offset=*/in_array.offset / 8);
+  } else {
+    // If a non-zero offset, we need to shift the bitmap
+    return internal::CopyBitmap(pool, in_array.buffers[0]->data(), in_array.offset,
+                                in_array.length);
+  }
+}
+
+struct Interval {

Review Comment:
   Regarding a solution for this, I ended up with something a bit unusual. I tested every idea that came to mind with a kind of "glass-box" approach, writing tests with 100% path coverage, and they all passed. Eventually, though, I found a single interval that broke everything.

   So my final approach was to generate two CSV files: one containing the original intervals and one containing the merged intervals, computed with a reliable library. I then fed the original intervals to my code and compared its output with the expected merged intervals. (I tested this on 100 million intervals, about 1.6 GB in total.)

   Would it make sense to import smaller CSV files and write tests based on those? Or should I just stick to writing unit tests that cover the different paths?
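
   For illustration, the "compare against a trusted result" idea can also be expressed without CSV fixtures. The following is only a rough sketch under several assumptions, not the code in this PR: the `Interval` struct here is a hypothetical half-open `[start, end)` stand-in (the PR's definition is not shown in this hunk), `MergeIntervals` is a placeholder for the code under test, and `MergeIntervalsReference` is a deliberately naive oracle that marks covered positions on a small domain. Inputs come from a seeded generator, so any failure is reproducible from the seed alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the PR's Interval: a half-open range [start, end).
struct Interval {
  int64_t start;
  int64_t end;
  bool operator==(const Interval& other) const {
    return start == other.start && end == other.end;
  }
};

// Placeholder for the code under test: sort by start, then fold every
// overlapping or touching interval into the last merged one.
std::vector<Interval> MergeIntervals(std::vector<Interval> in) {
  std::sort(in.begin(), in.end(),
            [](const Interval& a, const Interval& b) { return a.start < b.start; });
  std::vector<Interval> out;
  for (const Interval& iv : in) {
    if (!out.empty() && iv.start <= out.back().end) {
      out.back().end = std::max(out.back().end, iv.end);
    } else {
      out.push_back(iv);
    }
  }
  return out;
}

// Naive oracle: mark each covered position in [0, domain), then read the
// maximal runs of covered positions back as intervals.
std::vector<Interval> MergeIntervalsReference(const std::vector<Interval>& in,
                                              int64_t domain) {
  std::vector<bool> covered(static_cast<size_t>(domain), false);
  for (const Interval& iv : in) {
    assert(iv.start >= 0 && iv.start < iv.end && iv.end <= domain);
    for (int64_t p = iv.start; p < iv.end; ++p) covered[static_cast<size_t>(p)] = true;
  }
  std::vector<Interval> out;
  int64_t pos = 0;
  while (pos < domain) {
    if (!covered[static_cast<size_t>(pos)]) {
      ++pos;
      continue;
    }
    const int64_t run_start = pos;
    while (pos < domain && covered[static_cast<size_t>(pos)]) ++pos;
    out.push_back({run_start, pos});
  }
  return out;
}

// Differential test: random non-empty intervals on a small domain, compared
// against the oracle. Returns false on the first mismatch.
bool DifferentialTestPasses(uint32_t seed, int rounds) {
  constexpr int64_t kDomain = 64;
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int64_t> start_dist(0, kDomain - 1);
  std::uniform_int_distribution<int> count_dist(0, 12);
  for (int r = 0; r < rounds; ++r) {
    std::vector<Interval> input;
    const int count = count_dist(gen);
    for (int i = 0; i < count; ++i) {
      const int64_t start = start_dist(gen);
      std::uniform_int_distribution<int64_t> end_dist(start + 1, kDomain);
      input.push_back({start, end_dist(gen)});
    }
    if (!(MergeIntervals(input) == MergeIntervalsReference(input, kDomain))) {
      return false;
    }
  }
  return true;
}
```

   A test case could call `DifferentialTestPasses` with a fixed seed and a few thousand rounds. The small domain is a deliberate choice in this sketch: it keeps the oracle trivial to check by eye and makes overlapping and touching intervals frequent. Any interval found to break the merge logic would still be worth pinning as its own explicit unit test.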