https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124377

            Bug ID: 124377
           Summary: 100% slowdown of the s481 benchmark from TSVC on
                    aarch64
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, needs-bisection
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pheeck at gcc dot gnu.org
                CC: tnfchris at gcc dot gnu.org
  Target Milestone: ---
              Host: aarch64-gnu-linux
            Target: aarch64-gnu-linux

Created attachment 63838
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=63838&action=edit
Just the s481 benchmark with 10x the number of iterations

Between

r16-7750-g83ef3db4b388e7
r16-7810-gd03af25c8978c1

the benchmark s481 from TSVC (https://github.com/UoB-HPC/TSVC_2) got 2x slower
(execution time).

https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=356.933.0

I attach the benchmark extracted from the TSVC suite with 10x the number of
iterations so that the slowdown is more visible.

To reproduce the slowdown, do
gcc -O2 testcase.c dummy.c
time ./a.out
on an aarch64 machine.  I've seen the slowdown on an Ampere Altra Neoverse N1
machine and on an Nvidia Grace Neoverse V2 machine.

Since TSVC is a vectorizing compiler benchmark suite, I expect the problem is
in the vectorizer.  I plan to take a look at the assembly output for r16-7750
and r16-7810, but didn't find the time today and wanted to file the bug report
sooner rather than later.
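
For reference, a minimal sketch of the loop shape that s481 exercises, assuming
the attached testcase matches upstream TSVC_2: a multiply-add loop guarded by
an early exit.  The name s481_like, the function parameters, the restrict
qualifiers and the return value are illustrative assumptions; upstream works on
global arrays and calls exit(0) at the guard.

```c
#include <stddef.h>

/* Hypothetical stand-in for the s481 kernel, not the attached testcase.
   Assumption: upstream TSVC_2 uses global arrays and calls exit(0) where
   this sketch returns early; restrict approximates the no-alias property
   of distinct globals.  */
size_t
s481_like (float *restrict a, const float *restrict b,
           const float *restrict c, const float *restrict d, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      /* Early exit: stop at the first negative element of d.  */
      if (d[i] < 0.0f)
        return i;
      a[i] += b[i] * c[i];
    }
  /* No negative element found; the whole array was updated.  */
  return n;
}
```

Compiling such a function with -O2 -S under both revisions and diffing the
output would be one way to see whether the loop is vectorized differently.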

There are also slowdowns of s482 and s332.  Those probably have the same
cause.  I've not looked into them yet.
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=356.912.0
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=356.934.0
