liukun4515 commented on PR #2968:
URL: 
https://github.com/apache/arrow-datafusion/pull/2968#issuecomment-1197648845

   > Thanks @comphead!
   > 
   > I propose we fix the followings:
   > 
   > * The `new_width > to.data.len()` change makes sense to me, but on the 
resize part:
   > 
   > ```rust
   >     if new_width > to.data.len() {
   >         to.data.resize(max(to.data.capacity(), new_width), 0);
   >     }
   > ```
   > 
   > I think we could just resize to capacity to avoid reallocating, instead of 
the previous double capacity way.
   > 
   > * `write_field_binary` should be adapted accordingly as well.
   > * The capacity() bug also exists in the `end_padding` logic, it should be:
   > 
   > ```rust
   >     /// End each row at 8-byte word boundary.
   >     pub(crate) fn end_padding(&mut self) {
   >         let payload_width = self.current_width();
   >         self.row_width = round_upto_power_of_2(payload_width, 8);
   >         if self.data.len() < self.row_width {
   >             self.data.resize(self.row_width, 0);
   >         }
   >     }
   > ```
   > 
   > @liukun4515 I cannot quite get your `+8` logic while calculating size in 
the comments above, I didn't get panic with your 110 empty string case either. 
Please correct me if I have missed something important.
   
   Sorry for the error comments with my wrong understanding for row layout.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to