https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117088
Bug ID: 117088
Summary: [15 regression] 548.exchange2_r regressed by 10% with
-O2 -march=x86-64-v3 after enhance O2 vectorization
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: liuhongt at gcc dot gnu.org
Target Milestone: ---
548.exchange2_r -11.54%
The only regression is in 548.exchange2_r: vectorizing the inner loop in
each layer of the 9-layer loop nest increases register pressure and causes
more spills.
- block(rnext:9, 1, i1) = block(rnext:9, 1, i1) + 10
- block(rnext:9, 2, i2) = block(rnext:9, 2, i2) + 10
.....
- block(rnext:9, 9, i9) = block(rnext:9, 9, i9) + 10
...
- block(rnext:9, 2, i2) = block(rnext:9, 2, i2) + 10
- block(rnext:9, 1, i1) = block(rnext:9, 1, i1) + 10
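The loop shape above can be sketched in C as a rough illustration. This is not the benchmark source: the names `block`, `bump_slice` and `walk` are hypothetical, the nest is cut to three levels instead of nine, and the sketch assumes each update is undone with -10 on the way out (the listing above shows + 10 on both sides). Each slice update is a short contiguous loop of at most 9 iterations, which is the kind of inner loop the vectorizer may pick up at every level of the nest.

```c
#define N 9

/* Hypothetical stand-in for the Fortran array block(9, 9, 9). C is row-major,
   so the contiguous (first Fortran) index is placed last. Indices are 0-based. */
static int block[N][N][N];

/* Adds delta to the slice block(rnext:9, level, i): at most 9 contiguous
   elements. This short loop is the vectorization candidate. */
static void bump_slice(int rnext, int level, int i, int delta)
{
    for (int r = rnext; r < N; r++)
        block[i][level][r] += delta;
}

/* Three nesting levels instead of nine, to keep the sketch short.
   Every level updates a slice on the way in and (assumption) undoes
   the update on the way out. Returns the number of innermost visits. */
static long walk(int rnext)
{
    long visited = 0;
    for (int i1 = 0; i1 < N; i1++) {
        bump_slice(rnext, 0, i1, 10);
        for (int i2 = 0; i2 < N; i2++) {
            bump_slice(rnext, 1, i2, 10);
            for (int i3 = 0; i3 < N; i3++) {
                bump_slice(rnext, 2, i3, 10);
                visited++;
                bump_slice(rnext, 2, i3, -10);
            }
            bump_slice(rnext, 1, i2, -10);
        }
        bump_slice(rnext, 0, i1, -10);
    }
    return visited;
}
```

With nine such levels in one function, the loop indices, the slice bound and the array addresses are all live at the same time, so any extra vector temporaries compete with them for registers.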
It looks like aarch64 doesn't have the issue because it has 32 GPRs, while
x86 has only 16.
I have an extra patch that prevents loop vectorization in deeply nested loops
for the x86 backend, which brings the performance back.