https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115002
Bug ID: 115002
Summary: wide integer vector performance regression, x86,
between gcc-14 and gcc-13 using target clones on
skylake platform
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: colin.king at intel dot com
Target Milestone: ---
Created attachment 58138
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58138&action=edit
reproducer source code
I'm seeing a ~1.5% performance regression with gcc-14 compared to gcc-13,
using the gcc packages on Ubuntu 24.04:
Versions:
gcc version 13.2.0 (Ubuntu 13.2.0-23ubuntu4)
gcc version 14.0.1 20240412 (experimental) [master r14-9935-g67e1433a94f]
(Ubuntu 14-20240412-0ubuntu1)
cking@skylake:~$ CFLAGS="" gcc-13 reproducer-vecwide.c -O2 -Wall
cking@skylake:~$ ./a.out
7615.58 vint8w2048_t ops per sec, duration = 13.13 secs
cking@skylake:~$ CFLAGS="" gcc-14 reproducer-vecwide.c -O2 -Wall
cking@skylake:~$ ./a.out
7489.42 vint8w2048_t ops per sec, duration = 13.35 secs
The original issue appeared when regression testing the stress-ng vecwide
stressor [1]. I've extracted the attached reproducer from the original code.
Salient point to focus on:
1. The issue depends on the TARGET_CLONES macro being defined as
__attribute__((target_clones("avx,default"))) - the avx target clone seems to
be necessary to reproduce this problem.
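For readers without the attachment to hand, the pattern in question looks roughly like the sketch below. This is not the attached reproducer: only TARGET_CLONES, its definition, and the vint8w2048_t type name come from this report. The 2048-bit vector typedef, the function names vecwide_add_loop and vec_fill, and the loop body are assumptions made purely for illustration.

```c
#include <stdint.h>

/* Assumed layout: 2048 bits of 8-bit lanes using the GCC vector extension. */
typedef int8_t vint8w2048_t __attribute__((vector_size(2048 / 8)));

/*
 * TARGET_CLONES as defined in this report. The guard is an assumption:
 * target_clones needs ifunc support, so it is only enabled here for
 * GCC on x86-64 with glibc and compiles away elsewhere.
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    defined(__x86_64__) && defined(__GLIBC__)
#define TARGET_CLONES __attribute__((target_clones("avx,default")))
#else
#define TARGET_CLONES
#endif

/* Hypothetical helper: set every lane of *v to x. */
void vec_fill(vint8w2048_t *v, int8_t x)
{
    for (unsigned i = 0; i < sizeof(*v); i++)
        (*v)[i] = x;
}

/*
 * Hypothetical kernel: add *a to *res, iters times. With TARGET_CLONES
 * the compiler emits an avx clone and a default clone and selects one
 * at load time. Callers should keep values small so lanes do not overflow.
 */
TARGET_CLONES
void vecwide_add_loop(vint8w2048_t *res, const vint8w2048_t *a, int iters)
{
    vint8w2048_t r = *res;

    for (int i = 0; i < iters; i++)
        r += *a;
    *res = r;
}
```

Building this with gcc-13 and gcc-14 at -O2 and comparing the disassembly of the avx clone (typically emitted as a symbol with an .avx suffix) against the default clone is one way to see where the generated code differs.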
Attached are the reproducer C source and disassembled object code.
References: [1]
https://github.com/ColinIanKing/stress-ng/blob/master/stress-vecwide.c