jorgecarleitao commented on a change in pull request #9016:
URL: https://github.com/apache/arrow/pull/9016#discussion_r550416288
##########
File path: rust/arrow/src/array/array_string.rs
##########
@@ -126,25 +126,28 @@ impl<OffsetSize: StringOffsetSizeTrait>
GenericStringArray<OffsetSize> {
}
pub(crate) fn from_vec(v: Vec<&str>) -> Self {
- let mut offsets = Vec::with_capacity(v.len() + 1);
- let mut values = Vec::new();
+ let mut offsets =
+ MutableBuffer::new((v.len() + 1) *
std::mem::size_of::<OffsetSize>());
+ let mut values = MutableBuffer::new(0);
Review comment:
I.e. the general direction that I am pushing towards is to stop using
`Vec` and replace it by `MutableBuffer` to avoid extra allocations. Once that
is in place, we can replace these by `iter_mut`.
> also current conversion to a byte slice may add some overhead?
I believe so, as `to_byte_slice` returns a slice of unknown size, both for
`&T` and `&[T]`. Either the compiler optimizes it, or there is an extra cost. I
benched a 10% diff in building buffers when introduced a method that was not
doing bound checks on these. IMO we should be using `to_le_bytes`, and have a
method `MutableBuffer::push<ToByteSlice>`. I think we need the crate
`byteorder` because AFAIK std's `to_le_bytes` does not have a trait (which we
need).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]