jorgecarleitao opened a new pull request #9492:
URL: https://github.com/apache/arrow/pull/9492
This PR is derived from the observation that creating arrays from
`ArrayData` can result in undefined behavior. This is because we:
1. transmute buffers to whatever type the concrete arrays are derived from;
2. rely on `ArrayData::len` to create slices (in primitive arrays) and check
bounds (in arrays with offset buffers).
Example:
```rust
#[test]
fn test_bla() {
let data = ArrayData::new(DataType::Int32, 1000, None, None, 0,
vec![Buffer::from(&[1i32])], vec![]);
let array = unsafe { make_array(Arc::new(data)) };
let array = array.as_any().downcast_ref::<Int32Array>().unwrap();
let slice = array.values();
println!("{}", slice[150]); // we never allocated this...
}
```
Therefore, the function `make_array` must be marked as `unsafe`.
Note that all `from(ArrayDataRef)` in concrete implementations are also
`unsafe` and not marked as such (e.g. Int32Array::from(ArrayDataRef)`. Since
they are implemented via the trait `From`, which does not support the `unsafe`
keyword on it, we will need to re-name all those functions accordingly (as they
collide with another function `from`). That is a larger change.
The correct way of addressing this atm is to perform all the validations
necessary on `From<ArrayData>`.
Anyone relying on `make_array` and `From<>` can unknowingly cause out of
bound reads, which is a typical source of an escalation of privileges via reads
to the applications' memory, as the example above shows.
@alamb , my suggestion here is that we start by marking any method currently
`unsafe` as such, so that we fix problem
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]