https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124377
Bug ID: 124377
Summary: 100% slowdown of the s481 benchmark from TSVC on aarch64
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization, needs-bisection
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pheeck at gcc dot gnu.org
CC: tnfchris at gcc dot gnu.org
Target Milestone: ---
Host: aarch64-gnu-linux
Target: aarch64-gnu-linux
Created attachment 63838
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=63838&action=edit
Just the s481 benchmark with 10x the number of iterations
Between r16-7750-g83ef3db4b388e7 and r16-7810-gd03af25c8978c1, the s481
benchmark from TSVC (https://github.com/UoB-HPC/TSVC_2) became 2x slower in
execution time.
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=356.933.0
I've attached the s481 benchmark, extracted from the TSVC suite, with 10x the
number of iterations so that the slowdown is more visible.
To reproduce the slowdown on an aarch64 machine, run

  gcc -O2 testcase.c dummy.c
  time ./a.out

I've seen the slowdown on an Ampere Altra (Neoverse N1) machine and on an
Nvidia Grace (Neoverse V2) machine.
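For readers without the attachment, here is a rough sketch of the shape of the
s481 loop, modelled on s481 in TSVC_2 (the attachment itself is not reproduced
here). LEN, the initial array values, init_arrays() and s481_kernel() are
hypothetical; in TSVC_2 the sizes come from LEN_1D and the iteration count,
and dummy() lives in dummy.c so that the compiler cannot see its body. The
notable feature is the early exit (exit(0) when d[i] < 0) inside an otherwise
vectorizable loop.

```c
#include <stdlib.h>

typedef float real_t;

/* Hypothetical length; TSVC_2 uses LEN_1D and a separate iteration count. */
#define LEN 32000

real_t a[LEN], b[LEN], c[LEN], d[LEN];

/* Stand-in for TSVC_2's dummy(): there it sits in dummy.c so its body is
   invisible to the compiler. noinline approximates that only loosely. */
__attribute__((noinline)) int dummy(real_t *x)
{
    (void)x;
    return 0;
}

/* Hypothetical initial values: d[] is non-negative, so exit() is never taken. */
void init_arrays(void)
{
    for (int i = 0; i < LEN; i++) {
        a[i] = 0.0f;
        b[i] = 1.0f;
        c[i] = 2.0f;
        d[i] = 1.0f;
    }
}

/* Shape of the s481 loop: an early exit inside a vectorizable loop. */
void s481_kernel(int iterations)
{
    for (int nl = 0; nl < iterations; nl++) {
        for (int i = 0; i < LEN; i++) {
            if (d[i] < (real_t)0.)
                exit(0);
            a[i] += b[i] * c[i];
        }
        dummy(a);
    }
}
```

A driver would call init_arrays() and then s481_kernel() under `time`.
Compiling the same source with `gcc -O2 -S` (optionally adding
`-fopt-info-vec`) at both revisions is one way to compare the inner loop
each revision generates.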
Since TSVC is a benchmark suite for vectorizing compilers, I expect the problem
is in the vectorizer. I plan to compare the assembly output of r16-7750 and
r16-7810, but didn't have time today and wanted to file the bug report sooner
rather than later.
There are also slowdowns of s482 and s332, which probably have the same cause.
I haven't looked into them yet.
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=356.912.0
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=356.934.0
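For comparison, s482 and s332 also leave their inner loop early, which would
fit the guess that the three slowdowns share a cause. The functions below are
hypothetical sketches of those loop shapes, not the TSVC_2 code itself: in
TSVC_2 the kernels work on global arrays and s332 uses a goto, whereas here the
arrays are parameters and the search simply returns.

```c
typedef float real_t;

/* s482-like: accumulate, then stop at the first i where c[i] > b[i]. */
void s482_like(real_t *a, const real_t *b, const real_t *c, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] += b[i] * c[i];
        if (c[i] > b[i])
            break;
    }
}

/* s332-like search: index and value of the first element above t.
   Returns -2 and sets *value to -1 when nothing exceeds t. */
int s332_like(const real_t *a, int n, real_t t, real_t *value)
{
    *value = -1.0f;
    for (int i = 0; i < n; i++) {
        if (a[i] > t) {
            *value = a[i];
            return i;
        }
    }
    return -2;
}
```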