On 2/7/24 06:48, Alexander Monakov wrote:
> Increase unroll factor in SIMD loops from 4x to 8x in order to move their bottlenecks from ALU port contention to load issue rate (two loads per cycle on popular x86 implementations).
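Below is a minimal, portable sketch of an 8x-unrolled zero check. It is not the patch's code. The function name buffer_is_zero_unroll8 and the vec_t type are hypothetical, and plain 64-bit words stand in for SIMD vectors so the example runs on any target. Each iteration issues eight independent loads and folds them with a shallow OR tree. The per-iteration test, branch and pointer update are therefore shared across twice as many loads as in a 4x loop, which matches the stated intent of making load issue, not ALU ports, the limit. With real 16-byte or 32-byte vectors (for example SSE2 or AVX2 on x86), one 8x iteration covers 128 or 256 bytes. That is plausibly where the 128 vs 256 byte minimum in the reply comes from, though this is an inference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a SIMD vector: one 64-bit word. */
typedef uint64_t vec_t;

#define UNROLL 8
/* Bytes consumed per unrolled iteration: 8 * 8 = 64 here;
 * 8 * 16 = 128 with 16-byte vectors, 8 * 32 = 256 with 32-byte ones. */
#define BYTES_PER_ITER (UNROLL * sizeof(vec_t))

/* Returns true if all len bytes starting at buf are zero. */
static bool buffer_is_zero_unroll8(const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    const unsigned char *p = buf;
    size_t i = 0;

    /* Main loop: eight independent word loads per iteration. */
    for (; i + BYTES_PER_ITER <= len; i += BYTES_PER_ITER) {
        vec_t v[UNROLL];
        memcpy(v, p + i, sizeof(v));    /* alignment-safe loads */

        /* Balanced OR tree (depth 3) rather than a serial 8-step chain. */
        vec_t t0 = v[0] | v[1], t1 = v[2] | v[3];
        vec_t t2 = v[4] | v[5], t3 = v[6] | v[7];
        if ((t0 | t1) | (t2 | t3)) {
            return false;               /* one test+branch per 8 loads */
        }
    }

    /* Tail: fewer than BYTES_PER_ITER bytes remain; check bytewise. */
    for (; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}
```

For example, a 300-byte buffer runs four unrolled iterations (256 bytes) followed by a 44-byte bytewise tail.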
Ah, that answers my question re 128 vs 256 byte minimum.

So as far as this patch goes,

Reviewed-by: Richard Henderson <richard.hender...@linaro.org>


r~