alamb opened a new issue, #7899:
URL: https://github.com/apache/arrow-rs/issues/7899

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   This came up in conversations with @friendlymatthew and @zeroshade today.
   
   Given this example
   ```rust
   let mut builder = VariantBuilder::new();
   // the sub-builder allocates a new buffer
   let mut obj = builder.new_object();
   obj.insert("a", 1);
   // finishes the sub-builder, copying its data into the parent's builder
   obj.finish()?;
   ```
   
   Here is the buffer used by the ObjectBuilder:
   
https://github.com/apache/arrow-rs/blob/34bb605a0ca5ce7f03de0116023fb2cac6b669b3/parquet-variant/src/builder.rs#L817-L816
   
   Here is where it is copied to the parent builder: 
https://github.com/apache/arrow-rs/blob/34bb605a0ca5ce7f03de0116023fb2cac6b669b3/parquet-variant/src/builder.rs#L936-L935
   
   
   **Describe the solution you'd like**
   I would like to avoid the extra allocation to improve performance.
   
   **Describe alternatives you've considered**
   Here is an approach that must still copy the child object bytes but does not 
need its own allocation. It is modeled after @zeroshade's description of how 
the Go implementation works.
   
   1. Change the ObjectBuilder so it remembers where the object should start in 
the parent's buffer
   2. Remove `ObjectBuffer::buffer` field
   3. On append, the ObjectBuilder writes directly into the parent's buffer 
   4. On `ObjectBuilder::finish` compute how much space is needed for the 
offsets, and shift (by copy) the child object bytes down by that amount in the 
parent's buffer
   5. Fill in the object header and offsets for the child object
   6. Return
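
   The steps above can be sketched roughly as follows. This is a toy model, not 
the arrow-rs code: `InPlaceObjectBuilder`, `append_value`, and `build_example` 
are hypothetical names, and the header layout (one count byte followed by 
4-byte little-endian offsets) is a simplification rather than the real Variant 
encoding. It only illustrates writing into the parent's buffer and shifting the 
child bytes once on finish.

```rust
/// Hypothetical stand-in for an object builder that borrows the parent's buffer
/// instead of allocating its own.
struct InPlaceObjectBuilder<'a> {
    parent: &'a mut Vec<u8>,
    /// Step 1: where this object starts in the parent's buffer.
    start: usize,
    /// Offset of each value, relative to the first value byte.
    offsets: Vec<u32>,
}

impl<'a> InPlaceObjectBuilder<'a> {
    fn new(parent: &'a mut Vec<u8>) -> Self {
        let start = parent.len();
        Self { parent, start, offsets: Vec::new() }
    }

    /// Step 3: values are written directly into the parent's buffer.
    fn append_value(&mut self, bytes: &[u8]) {
        let offset = (self.parent.len() - self.start) as u32;
        self.offsets.push(offset);
        self.parent.extend_from_slice(bytes);
    }

    /// Steps 4-6: make room for the header, shift the values, fill in the header.
    fn finish(self) {
        let Self { parent, start, offsets } = self;
        // Toy header (an assumption): 1 count byte + 4 bytes per offset.
        let header_len = 1 + 4 * offsets.len();
        let old_len = parent.len();
        parent.resize(old_len + header_len, 0);
        // Shift the child bytes by `header_len`; copy_within handles overlap.
        parent.copy_within(start..old_len, start + header_len);
        parent[start] = offsets.len() as u8;
        let mut pos = start + 1;
        for off in &offsets {
            parent[pos..pos + 4].copy_from_slice(&off.to_le_bytes());
            pos += 4;
        }
    }
}

/// Builds a two-field toy object after one byte already written by the parent.
fn build_example() -> Vec<u8> {
    let mut parent: Vec<u8> = vec![0xAA];
    let mut obj = InPlaceObjectBuilder::new(&mut parent);
    obj.append_value(&[1]);
    obj.append_value(&[2, 3]);
    obj.finish();
    parent
}

fn main() {
    println!("{:?}", build_example());
}
```

   The single `copy_within` is the "shift by copy" from step 4: the child bytes 
are still moved once, but within the parent's allocation, so no separate 
per-object buffer is needed.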
   
   Ideally we would see some performance improvement in the benchmarks.
   
   **Additional context**
   
   If this works out, I think we can do a similar optimization for `ListBuilder`.

