andishgar commented on code in PR #46229: URL: https://github.com/apache/arrow/pull/46229#discussion_r2177301787
########## cpp/src/arrow/array/array_binary.cc: ########## @@ -104,7 +105,14 @@ BinaryViewArray::BinaryViewArray(std::shared_ptr<DataType> type, int64_t length, SetData( ArrayData::Make(std::move(type), length, std::move(buffers), null_count, offset)); } - +Result<std::shared_ptr<Array>> BinaryViewArray::CompactArray(MemoryPool* pool) const { + std::unique_ptr<ArrayBuilder> builder; + ARROW_RETURN_NOT_OK(MakeBuilder(pool, type(), &builder)); + ArraySpan span(*data()); + // ArraySpan handles offset + ARROW_RETURN_NOT_OK(builder->AppendArraySlice(span, 0, span.length)); Review Comment: > We could certainly expose a parameter to choose the ratio of unused buffer space above which to compact a buffer. > > > Additionally, there's another concern: it might be useful to expose a public API method to indicate whether calling `CompactArray` would be beneficial. > > Well, the decision is buffer-grained, not array-grained: some buffers might be worth compacting and some others not. > > > What do you think? Should I integrate this heuristic into `CompactArray`, or would it be better to split it into a separate method or even into a different pull request? > > As said above, I think it would make sense to integrate the heuristic and apply it at the buffer level. Pardon me, just to be sure — does your overall suggestion correspond to pseudocode like this? ```c++ Result<std::shared_ptr<Array>> BinaryViewArray::CompatArray( MemoryPool* pool, double occupancy_threshold) const { .. .. .. for (int32_t i = 0; i < number_of_data_buffer; ++i) { if (BufferOccupancy(i) <= occupancy_threshold) { // Do something } } return result; } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org