While investigating an issue with offset handling in the Rust
arithmetic kernels (https://issues.apache.org/jira/browse/ARROW-9583),
I started to wonder how the other implementations handle compute on
buffer slices.

The Rust implementation currently allows creating slices of arrays
starting at arbitrarily aligned offsets, including offsets that do not
fall on a byte boundary of a bitmap. This becomes a problem with
boolean arrays and with the null bitmaps, since operations on those
currently work with whole bytes as the smallest unit. There are
several options to solve this, all of which add complexity or have
other downsides:

- calculate null bitmaps bit by bit if they are not properly aligned,
leading to a big performance drop
- calculate null bitmaps on whole bytes and then try to rotate the
resulting buffer by a certain number of bits, which requires quite
complex code and also adds some performance overhead
- disallow compute kernels on non-aligned buffers, at least if null
bitmaps are involved
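
To make the trade-off concrete, here is a minimal sketch of the first
and the last option for combining two null bitmaps with AND. It is not
the code from the Arrow crate: the function names are made up for
illustration, and the bitmaps are plain byte slices with offsets and
lengths given in bits, using the LSB-first bit order of the Arrow
format.

```rust
/// Read bit `i` from a packed, LSB-first bitmap.
fn get_bit(bits: &[u8], i: usize) -> bool {
    (bits[i / 8] >> (i % 8)) & 1 == 1
}

/// Set bit `i` in a packed, LSB-first bitmap.
fn set_bit(bits: &mut [u8], i: usize) {
    bits[i / 8] |= 1 << (i % 8);
}

/// First option (hypothetical helper): combine the bitmaps bit by bit.
/// Works for any bit offset, but touches every bit individually.
fn and_bit_by_bit(a: &[u8], a_off: usize, b: &[u8], b_off: usize, len: usize) -> Vec<u8> {
    let mut out = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        if get_bit(a, a_off + i) && get_bit(b, b_off + i) {
            set_bit(&mut out, i);
        }
    }
    out
}

/// Last option (hypothetical helper): work on whole bytes and refuse
/// offsets that are not a multiple of 8 bits.
/// Bits past `len` in the last output byte are left unspecified.
fn and_whole_bytes(a: &[u8], a_off: usize, b: &[u8], b_off: usize, len: usize) -> Option<Vec<u8>> {
    if a_off % 8 != 0 || b_off % 8 != 0 {
        return None;
    }
    let n = (len + 7) / 8;
    let a = &a[a_off / 8..a_off / 8 + n];
    let b = &b[b_off / 8..b_off / 8 + n];
    Some(a.iter().zip(b).map(|(x, y)| x & y).collect())
}
```

The second option would sit between the two: run the byte-wise loop
first and then shift the result by the remaining bit offset.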

I'm leaning towards the last option; a draft PR is at
https://github.com/apache/arrow/pull/7854

Another issue with offsets is that, at least in the Rust
implementation, some SIMD kernels currently assume the whole buffer to
be aligned to 64 bytes. As soon as there is an offset that is not a
multiple of 64, this could lead to unsafe out-of-bounds reads and
writes of memory.
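
One way a kernel could avoid relying on that assumption is sketched
below. This is an illustration only, not what the Arrow kernels do:
`Chunk` is a hypothetical 64-byte block standing in for a SIMD
register, and the function names are invented. The standard library's
`slice::align_to` splits a slice into an unaligned head, a 64-byte
aligned middle, and a tail, so only the middle is processed in blocks.

```rust
/// Hypothetical 64-byte block of 16 f32 lanes, standing in for a SIMD register.
#[repr(C, align(64))]
struct Chunk([f32; 16]);

/// Check whether the start of the slice is aligned to 64 bytes.
fn is_aligned_64(values: &[f32]) -> bool {
    (values.as_ptr() as usize) % 64 == 0
}

/// Sum a slice without assuming anything about its alignment.
fn sum_with_alignment_handling(values: &[f32]) -> f32 {
    // SAFETY: every bit pattern is a valid f32, so viewing an aligned run
    // of 16 f32 values as one `Chunk` is sound.
    let (head, body, tail) = unsafe { values.align_to::<Chunk>() };
    // Elements before and after the aligned middle are handled one by one.
    let scalar: f32 = head.iter().chain(tail.iter()).sum();
    // The aligned middle is handled in 64-byte blocks.
    let blocks: f32 = body.iter().map(|c| c.0.iter().sum::<f32>()).sum();
    scalar + blocks
}
```

A sliced array such as `&values[3..]` then goes through the same code
path as an unsliced one, at the cost of the scalar head and tail loops.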

I'm very interested in how the C++ and Java implementations handle those issues.

-- 
Jörn Horstmann | Senior Backend Engineer

www.signavio.com
Kurfürstenstraße 111, 10787 Berlin, Germany
