On 2/7/24 06:48, Alexander Monakov wrote:
> Increase unroll factor in SIMD loops from 4x to 8x in order to move
> their bottlenecks from ALU port contention to load issue rate (two loads
> per cycle on popular x86 implementations).

Ah, that answers my question re 128 vs 256 byte minimum.
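The unrolling described in the quoted change can be illustrated with a minimal, hypothetical sketch. It is not the patch's code. It assumes a zero-checking loop over a buffer whose length is a multiple of 128 bytes, and it uses GCC/Clang vector extensions to stand in for 16-byte SIMD registers. The function name `sketch_buffer_is_zero_8x` is made up for illustration. Each iteration issues eight 16-byte loads and folds them with seven ORs, then runs a single test-and-branch. A 4x version would run that test-and-branch, plus the pointer and loop bookkeeping, once per 64 bytes instead of once per 128.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16-byte vector of two 64-bit lanes (GCC/Clang vector extension). */
typedef uint64_t v2u64 __attribute__((vector_size(16)));

/* Hypothetical helper: returns true if all len bytes of buf are zero.
 * Assumption: len is a multiple of 128 (8 vectors x 16 bytes); no tail handling. */
static bool sketch_buffer_is_zero_8x(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    const unsigned char *end = p + len;

    assert(len % 128 == 0);

    while (p < end) {
        v2u64 v[8];
        /* memcpy sidesteps alignment/aliasing issues; compilers lower it to vector loads. */
        memcpy(v, p, sizeof(v));
        /* Tree reduction: 8 loads, 7 ORs, then one test+branch per 128 bytes. */
        v2u64 t = (v[0] | v[1]) | (v[2] | v[3]);
        t |= (v[4] | v[5]) | (v[6] | v[7]);
        if ((t[0] | t[1]) != 0) {
            return false;
        }
        p += sizeof(v);
    }
    return true;
}
```

By simple arithmetic, an 8x unroll consumes 8 × 16 = 128 bytes per iteration with 16-byte vectors and 8 × 32 = 256 bytes with 32-byte (AVX2-width) vectors. A loop of this shape therefore needs at least that many bytes before it can run a full iteration.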

So as far as this patch goes,
Reviewed-by: Richard Henderson <richard.hender...@linaro.org>


r~
