Jorge Cardoso Leitão <jorgecarlei...@gmail.com> writes:

> Yes, I expect aligned SIMD loads to be faster.
>
> My understanding is that we do not need an alignment requirement for this,
> though: split the buffer in 3, [unaligned][aligned][unaligned], use aligned
> loads for the middle and un-aligned (or not even SIMD) for the prefix and
> suffix. This is generic over the size of the SIMD and buffer slicing, where
> alignment can be lost. Or am I missing something?

If you add two arrays with different alignment, the [aligned] portions don't
"line up", so you're always pulling unaligned from one of the arrays. This
interaction between arrays is usually the rationale when HPC software decides 
to specify alignment. It may not be "worth it" to Arrow. If you have a high 
arithmetic intensity operation, you can afford to pack into aligned tiles (all 
GEMM-type implementations do this).
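
To make the "line up" point concrete, here is a minimal sketch of the address arithmetic, assuming 4-byte elements and 32-byte SIMD vectors. The function names, the base address 0x1000, and the 3-element slice offset are hypothetical and chosen only for illustration; nothing here is Arrow API.

```python
def split_aligned(addr, n_elems, elem_size=4, simd_bytes=32):
    """Split a buffer into (prefix, middle, suffix) element counts.

    The middle starts at the first simd_bytes-aligned address and holds a
    whole number of SIMD vectors. Assumes addr is a multiple of elem_size.
    """
    lanes = simd_bytes // elem_size
    # Bytes from addr up to the next aligned boundary (0 if already aligned).
    pad = (-addr) % simd_bytes
    prefix = min(pad // elem_size, n_elems)
    middle = (n_elems - prefix) // lanes * lanes
    suffix = n_elems - prefix - middle
    return prefix, middle, suffix


def can_coalign(addr_a, addr_b, simd_bytes=32):
    """True if some element index is vector-aligned in both buffers.

    Assumes both addresses are multiples of the element size.
    """
    return (addr_a - addr_b) % simd_bytes == 0


base = 0x1000                                  # hypothetical aligned allocation
a = split_aligned(base, 100)                   # (0, 96, 4)
b = split_aligned(base + 3 * 4, 100)           # sliced by 3 elements: (5, 88, 7)
aligned_together = can_coalign(base, base + 3 * 4)  # False
```

With these numbers, the aligned middle of `a` starts at element 0 and that of `b` at element 5, so no element index is aligned in both, and an element-wise add has to use unaligned loads for one operand throughout.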
