andishgar commented on code in PR #46229:
URL: https://github.com/apache/arrow/pull/46229#discussion_r2177301787


##########
cpp/src/arrow/array/array_binary.cc:
##########
@@ -104,7 +105,14 @@ BinaryViewArray::BinaryViewArray(std::shared_ptr<DataType> 
type, int64_t length,
   SetData(
       ArrayData::Make(std::move(type), length, std::move(buffers), null_count, 
offset));
 }
-
+Result<std::shared_ptr<Array>> BinaryViewArray::CompactArray(MemoryPool* pool) 
const {
+  std::unique_ptr<ArrayBuilder> builder;
+  ARROW_RETURN_NOT_OK(MakeBuilder(pool, type(), &builder));
+  ArraySpan span(*data());
+  // ArraySpan handles offset
+  ARROW_RETURN_NOT_OK(builder->AppendArraySlice(span, 0, span.length));

Review Comment:
   
   > We could certainly expose a parameter to choose the ratio of unused buffer 
space above which to compact a buffer.
   > 
   > > Additionally, there's another concern: it might be useful to expose a 
public API method to indicate whether calling `CompactArray` would be 
beneficial.
   > 
   > Well, the decision is buffer-grained, not array-grained: some buffers 
might be worth compacting and some others not.
   > 
   > > What do you think? Should I integrate this heuristic into 
`CompactArray`, or would it be better to split it into a separate method or 
even into a different pull request?
   > 
   > As said above, I think it would make sense to integrate the heuristic and 
apply it at the buffer level.
   
   Pardon me, just to be sure — does your overall suggestion correspond to 
pseudocode like this?
   
   ```c++
   Result<std::shared_ptr<Array>> BinaryViewArray::CompatArray(
       MemoryPool* pool, double occupancy_threshold) const {
     ..
     ..
     ..
     for (int32_t i = 0; i < number_of_data_buffer; ++i) {
       if (BufferOccupancy(i) <= occupancy_threshold) {
         // Do something
       }
     }
     return result;
   }
   
   ``` 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to