Hi,

Over the past 4 months, I have grown increasingly frustrated by the amount of undefined behaviour that I am finding and fixing in the Rust implementation. I would like to open a discussion that takes a broader view of the problem, in light of our current knowledge and of what Rust enables, and to offer a solution to the bigger problem.
Just to give you a sense of the seriousness of the issue, the following currently compiles, runs, and is undefined behaviour in Rust:

    // two i32 values, i.e. 8 bytes of data
    let buffer = Buffer::from(&[0i32, 2i32]);
    // the same buffer, declared as 10 values of type Int64
    let data = ArrayData::new(DataType::Int64, 10, 0, None, 0, vec![buffer], vec![]);
    // the Int64 data is accepted as a Float64Array
    let array = Float64Array::from(Arc::new(data));
    println!("{:?}", array.value(1));

I would like to propose a major refactor of the crate around physical traits, `Buffer`, `MutableBuffer` and `ArrayData` to make our code type-safe at compile time, thereby preventing things like the example above from happening again.

So far, I was able to reproduce all core features of the arrow crate (nested types, dynamic typing, FFI, memory alignment, performance) by using `Buffer<T: NativeType>` instead of `Buffer` and removing `ArrayData` and `RawPointer` altogether.

Safety-wise, it significantly limits the usage of `unsafe` in higher-level APIs, it has a single transmute (the bit chunk iterator one), and a guaranteed safe public API (which is not the case in our master, as shown above).

Performance-wise, it yields a 1.3x improvement over the current master for the `take` kernel for primitives (1.3x after this fix <https://github.com/apache/arrow/pull/9301> of UB in the take kernel, 1.7x prior to it). I expect there are other major performance improvements to be had.

API-wise, it simplifies the traits that we have for memory layout as well as the handling of bitmaps, offsets, etc.

The proposal is drafted as a README <https://github.com/jorgecarleitao/arrow2/blob/proposal/README.md> on a repo that I created from the ground up specifically for this, and the full set of changes is in a PR <https://github.com/jorgecarleitao/arrow2/pull/1> so that anyone can view and comment on it. I haven't made any PR to master because this is too large to track as a diff against master, and that is beside the point anyway.

I haven't ported most of the crate, as I only tried the non-trivial features (memory layout, bitmaps, FFI, dynamic typing, nested types).
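To make the idea of a typed buffer concrete, here is a minimal sketch. It is not the code in the repo: the `NativeType` trait is reduced to a marker, `Buffer<T>` is backed by a plain `Arc<[T]>` with none of the alignment or FFI handling, and `PrimitiveArray<T>` is a hypothetical stand-in for an array built on top of it. It only illustrates how carrying the element type in the buffer turns the mismatch above into a compile error.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Simplified marker trait (an assumption for this sketch) for types
/// that are allowed to live in a buffer.
pub trait NativeType: Copy + Debug + 'static {}
impl NativeType for i32 {}
impl NativeType for i64 {}
impl NativeType for f64 {}

/// A buffer that knows its element type, so its length is counted
/// in elements of `T` rather than in bytes.
#[derive(Debug, Clone)]
pub struct Buffer<T: NativeType> {
    data: Arc<[T]>,
}

impl<T: NativeType> From<&[T]> for Buffer<T> {
    fn from(values: &[T]) -> Self {
        Self { data: Arc::from(values) }
    }
}

impl<T: NativeType> Buffer<T> {
    pub fn len(&self) -> usize {
        self.data.len()
    }

    pub fn as_slice(&self) -> &[T] {
        &self.data
    }
}

/// Hypothetical array type: its element type is tied to the buffer's,
/// and its length is derived from the buffer instead of being declared.
pub struct PrimitiveArray<T: NativeType> {
    values: Buffer<T>,
}

impl<T: NativeType> PrimitiveArray<T> {
    pub fn new(values: Buffer<T>) -> Self {
        Self { values }
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }

    /// Bounds-checked access: panics on an invalid index instead of
    /// reading outside the buffer.
    pub fn value(&self, i: usize) -> T {
        self.values.as_slice()[i]
    }

    /// Non-panicking variant of `value`.
    pub fn get(&self, i: usize) -> Option<T> {
        self.values.as_slice().get(i).copied()
    }
}

fn main() {
    let buffer = Buffer::from(&[0i32, 2i32][..]);
    let array = PrimitiveArray::new(buffer);
    println!("{:?}", array.value(1)); // prints 2

    // The mismatch from the example above no longer type-checks:
    // let bad: PrimitiveArray<f64> = PrimitiveArray::new(Buffer::from(&[0i32, 2i32][..]));
}
```

In this sketch there is no separately declared data type or length that can disagree with the bytes, which is the property the proposal is after; the real design also has to cover validity bitmaps, offsets, nested types and FFI.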
I would highly appreciate your thoughts about it.

Best,
Jorge