I'm finding HEAPS of SIMD functions want to return pairs (unpacks in particular):
  int4 (low, high) = unpack(someShort8);
Currently I have to duplicate everything:
  int4 low = unpackLow(someShort8);
  int4 high = unpackHigh(someShort8);
I'm getting really sick of that, it feels so... last millennium.
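For comparison, here is a rough sketch of the paired form in C++17, using std::pair and structured bindings. The short8 and int4 types below are hypothetical scalar stand-ins built on std::array, not real SIMD types, and unpack is modelled as a plain sign-extension of the low and high halves, which is only my assumption about what the unpack in question does.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Hypothetical stand-ins for the SIMD types in the post (not a real library).
using short8 = std::array<std::int16_t, 8>;
using int4   = std::array<std::int32_t, 4>;

// Scalar model of a paired unpack: widen the low four and high four
// 16-bit lanes to 32 bits and return both halves from a single call.
std::pair<int4, int4> unpack(const short8& s) {
    int4 low{}, high{};
    for (int i = 0; i < 4; ++i) {
        low[i]  = s[i];      // lanes 0..3
        high[i] = s[i + 4];  // lanes 4..7
    }
    return {low, high};
}

// Usage: one call, two results, no unpackLow/unpackHigh duplication.
int sumBothHalves(const short8& s) {
    auto [low, high] = unpack(s);
    int total = 0;
    for (int i = 0; i < 4; ++i) total += low[i] + high[i];
    return total;
}
```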
It can also be really inefficient. For example, ARM NEON has a vzip instruction that is used like this:
  vzip.32 q0, q1
This will interleave the elements of the vectors in q0 and q1 in one instruction, leaving the two result halves in q0 and q1.
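To make the two-outputs-from-one-instruction point concrete, here is a scalar C++ model of what vzip.32 q0, q1 leaves in the two registers. The name zip32 and the std::array-based int4 are hypothetical, used only for illustration. On a real target the NEON intrinsic vzipq_s32 already takes this shape: it returns an int32x4x2_t holding both result vectors.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a 128-bit q register holding four 32-bit lanes.
using int4 = std::array<std::int32_t, 4>;

// Scalar model of vzip.32 q0, q1: interleave the lanes of a and b.
// The first result is what ends up in q0, the second what ends up in q1.
std::pair<int4, int4> zip32(const int4& a, const int4& b) {
    int4 lo{a[0], b[0], a[1], b[1]};  // interleaved low halves
    int4 hi{a[2], b[2], a[3], b[3]};  // interleaved high halves
    return {lo, hi};
}
```

With separate zipLow/zipHigh style functions, each one would have to do the whole interleave and throw half of it away, unless the compiler manages to merge them again.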