andishgar commented on code in PR #46229:
URL: https://github.com/apache/arrow/pull/46229#discussion_r2208052873


##########
cpp/src/arrow/array/array_binary.cc:
##########
@@ -105,6 +111,392 @@ 
BinaryViewArray::BinaryViewArray(std::shared_ptr<DataType> type, int64_t length,
       ArrayData::Make(std::move(type), length, std::move(buffers), null_count, offset));
 }
 
+namespace {
+
+// TODO Should We move this to bitmap_ops.h and Remove from compute/kernels/util.s
+Result<std::shared_ptr<Buffer>> GetOrCopyNullBitmapBuffer(const ArrayData& in_array,
+                                                          MemoryPool* pool) {
+  if (in_array.buffers[0]->data() == nullptr) {
+    return nullptr;
+  } else if (in_array.offset == 0) {
+    return in_array.buffers[0];
+  } else if (in_array.offset % 8 == 0) {
+    return SliceBuffer(in_array.buffers[0], /*offset=*/in_array.offset / 8);
+  } else {
+    // If a non-zero offset, we need to shift the bitmap
+    return internal::CopyBitmap(pool, in_array.buffers[0]->data(), in_array.offset,
+                                in_array.length);
+  }
+}
+
+struct Interval {

Review Comment:
   Regarding a solution for this, I ended up with a somewhat unusual approach.
   
   I tested every idea that came to mind with a "glass-box" approach, writing
tests with 100% path coverage, and they all passed. Eventually, though, I
discovered a single interval that broke everything.
   
   So in the end, my solution was to first generate two CSV files: one
containing the original intervals and the other containing the merged
intervals, calculated using a reliable library. I then ran my code on the
original intervals and compared its output with the expected merged intervals.
(I tested this on 100 million intervals, with a total size of about 1.6 GB.)
   
   Would it make sense to import smaller CSV files and write tests based on 
those? Or should I just stick to writing unit tests to cover different paths?
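   
   One possible middle ground, sketched below, is a seeded randomized test
that compares the merge routine against a brute-force oracle, so no CSV files
need to be stored. This is only a sketch under assumptions: the `Interval`
here is a hypothetical half-open `[start, end)` stand-in rather than the
struct from this PR, `MergeIntervals` is a placeholder for the real routine,
adjacent intervals are assumed to merge, and all helper names are made up for
illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the PR's Interval: half-open range [start, end).
struct Interval {
  int64_t start;
  int64_t end;
  bool operator==(const Interval& other) const {
    return start == other.start && end == other.end;
  }
};

// Placeholder for the routine under test (a plain sort-and-sweep merge).
// Assumption: touching intervals such as [0, 2) and [2, 4) are merged.
std::vector<Interval> MergeIntervals(std::vector<Interval> in) {
  std::sort(in.begin(), in.end(),
            [](const Interval& a, const Interval& b) { return a.start < b.start; });
  std::vector<Interval> out;
  for (const Interval& iv : in) {
    if (iv.start >= iv.end) continue;  // ignore empty intervals
    if (!out.empty() && iv.start <= out.back().end) {
      out.back().end = std::max(out.back().end, iv.end);
    } else {
      out.push_back(iv);
    }
  }
  return out;
}

// Brute-force oracle: mark every covered position in [0, domain), then read
// back the maximal covered runs. Slow, but simple enough to trust.
std::vector<Interval> MergeIntervalsReference(const std::vector<Interval>& in,
                                              int64_t domain) {
  std::vector<bool> covered(domain, false);
  for (const Interval& iv : in) {
    for (int64_t i = iv.start; i < iv.end; ++i) covered[i] = true;
  }
  std::vector<Interval> out;
  for (int64_t i = 0; i < domain; ++i) {
    if (!covered[i]) continue;
    int64_t j = i;
    while (j < domain && covered[j]) ++j;
    out.push_back({i, j});
    i = j;  // position j is uncovered (or past the end), so skipping it is safe
  }
  return out;
}

// Returns how many random cases the two implementations disagree on.
// A fixed seed keeps any failure reproducible.
int CountMismatches(uint32_t seed, int num_cases) {
  constexpr int64_t kDomain = 64;
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int64_t> pos(0, kDomain);
  std::uniform_int_distribution<int> count(0, 12);
  int mismatches = 0;
  for (int c = 0; c < num_cases; ++c) {
    std::vector<Interval> input;
    const int n = count(rng);
    for (int k = 0; k < n; ++k) {
      const int64_t a = pos(rng);
      const int64_t b = pos(rng);
      input.push_back({std::min(a, b), std::max(a, b)});
    }
    if (!(MergeIntervals(input) == MergeIntervalsReference(input, kDomain))) {
      ++mismatches;
    }
  }
  return mismatches;
}
```

   A unit test could then assert that `CountMismatches(seed, n)` is zero for a
few fixed seeds, and the single interval that broke the earlier attempts could
be added as an explicit regression case next to it.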



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

