https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122746

--- Comment #3 from vekumar at gcc dot gnu.org ---
I used -O3 -march=znver4 and saw extracts.
 https://godbolt.org/z/5ajq8rjTr

With -O3 -march=znver5:
 https://godbolt.org/z/Exd3bT3T7

The code generated by current trunk is still bad:

GCC 15 
Chain 1 (xmm1 - array a):  Load → Add → Load → Add → Load → Add ...
Chain 2 (xmm0 - array b):  Load → Add → Load → Add → Load → Add ...

GCC 16
Single Chain (xmm0): Build → Add → Build → Add → Build → Add ...

Remainder loop is masked in GCC 16 and has costly permutes.
