thinkharderdev opened a new issue, #15967:
URL: https://github.com/apache/datafusion/issues/15967

   ### Describe the bug
   
   The implementation of `ByteGroupValueBuilder` is unsound: casting back and forth between signed and unsigned integer types can lead to an out-of-bounds memory access, since `ByteGroupValueBuilder::value` uses `core::slice::get_unchecked`.
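
To make the failure mode concrete, here is a minimal sketch of the round trip described above. It assumes the builder stores each buffer length as an `i32` offset via a wrapping cast and later casts it back to `usize` for indexing; `store_offset` and `load_offset` are hypothetical stand-ins, not the actual DataFusion functions, and the final comparison assumes a 64-bit target.

```rust
/// Hypothetical stand-in for storing a buffer length as an `i32` offset.
/// `as` wraps silently for any value above `i32::MAX`.
fn store_offset(buffer_len: usize) -> i32 {
    buffer_len as i32
}

/// Hypothetical stand-in for reading an offset back as a slice index.
/// A negative `i32` sign-extends to a very large `usize` on 64-bit targets.
fn load_offset(offset: i32) -> usize {
    offset as usize
}

fn main() {
    // 2048 appends of a 1 MiB string give a buffer of exactly i32::MAX + 1 bytes.
    let buffer_len = 2048 * 1024 * 1024_usize;
    assert_eq!(buffer_len, i32::MAX as usize + 1);

    // The stored offset wraps around to a negative value.
    let stored = store_offset(buffer_len);
    assert_eq!(stored, i32::MIN);

    // Cast back, the index lands far past the end of the buffer (64-bit assumed).
    let index = load_offset(stored);
    assert!(index > buffer_len);
    println!("len = {buffer_len}, stored = {stored}, index = {index}");
}
```

With an index like this, a checked slice access would panic, whereas `get_unchecked` reads whatever memory lies at that address.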
   
   ### To Reproduce
   
   This test will segfault if run in release mode:
   ```
       #[test]
       fn test_byte_group_value_builder_overflow() {
           let mut builder = ByteGroupValueBuilder::<i32>::new(OutputType::Utf8);

           let large_string = std::iter::repeat('a').take(1024 * 1024).collect::<String>();

           let array = Arc::new(StringArray::from(vec![Some(large_string.as_str())])) as ArrayRef;

           // Append items until our buffer length is 1 + i32::MAX as usize
           for _ in 0..2048 {
               builder.append_val(&array, 0);
           }

           assert_eq!(builder.value(2047), large_string.as_bytes());
       }
   ```
   
   ### Expected behavior
   
   Either `ByteGroupValueBuilder::do_append_val_inner` needs to validate that `self.buffer.len() <= i32::MAX as usize`, or `ByteGroupValueBuilder::value` needs to use safe slice access. The former seems like the better option, since it would panic with a more useful message.
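
The two options could look roughly like the sketch below. Both helpers are hypothetical illustrations rather than the actual DataFusion code: `checked_offset` shows the validation approach using `i32::try_from`, and `value_checked` shows the safe slice access approach using `slice::get`.

```rust
/// Hypothetical helper for option 1: convert a buffer length to an `i32`
/// offset, panicking with a descriptive message instead of wrapping.
fn checked_offset(buffer_len: usize) -> i32 {
    i32::try_from(buffer_len).unwrap_or_else(|_| {
        panic!(
            "byte buffer length {} exceeds i32::MAX, offset would overflow",
            buffer_len
        )
    })
}

/// Hypothetical helper for option 2: bounds-checked slice access that
/// returns `None` rather than reading out of bounds.
fn value_checked(buffer: &[u8], start: usize, end: usize) -> Option<&[u8]> {
    buffer.get(start..end)
}

fn main() {
    // In-range lengths convert without loss.
    assert_eq!(checked_offset(1024), 1024);

    // In-range slices are returned; out-of-range ones yield None.
    let buffer = b"abc";
    assert_eq!(value_checked(buffer, 1, 3), Some(&b"bc"[..]));
    assert_eq!(value_checked(buffer, 2, 10), None);
}
```

Validating at append time reports the problem where the overflow happens, whereas the checked access only surfaces it later when the value is read.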
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

