thinkharderdev opened a new issue, #15967:
URL: https://github.com/apache/datafusion/issues/15967
### Describe the bug
The implementation of `ByteGroupValueBuilder` is unsound as cast back and
forth between signed and unsigned integer types can lead to a out-of-bounds
memory access since `ByteGroupValueBuilder::value` uses
`core::slice::get_unchecked`
### To Reproduce
This test will segfault if run in release mode
```
#[test]
fn test_byte_group_value_builder_overflow() {
let mut builder =
ByteGroupValueBuilder::<i32>::new(OutputType::Utf8);
let large_string = std::iter::repeat('a').take(1024 *
1024).collect::<String>();
let array =
Arc::new(StringArray::from(vec![Some(large_string.as_str())])) as ArrayRef;
// Append items until our buffer length is 1 + i32::MAX as usize
for _ in 0..2048 {
builder.append_val(&array, 0);
}
assert_eq!(builder.value(2047), large_string.as_bytes());
}
```
### Expected behavior
Either `ByteGroupValueBuilder::do_append_val_inner` needs to validate that
`self.buffer.len() <= i32::MAX as usize` or `ByteGroupValueBuilder::value`
needs to use safe slice access. The former seems like a better option since it
would panic with a more useful message
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]