https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105363
Bug ID: 105363
Summary: -ftree-slp-vectorize decreases performance significantly (x64)
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: mtzguido at gmail dot com
Target Milestone: ---

Created attachment 52857
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52857&action=edit
Source file and outputs

Hello,

I found this example where using -O2 (which implies -ftree-slp-vectorize)
decreases performance by about 4x compared to -O1. I've pinned it down to
-ftree-slp-vectorize: -O3 -fno-tree-slp-vectorize works very well.

$ gcc bug_opt.c -O3 -o bug_opt-O3
$ time ./bug_opt-O3

real    0m6.627s
user    0m6.619s
sys     0m0.005s

$ gcc bug_opt.c -O3 -fno-tree-slp-vectorize -o bug_opt-O3-novec
$ time ./bug_opt-O3-novec

real    0m1.703s
user    0m1.701s
sys     0m0.000s

I've verified this with the current HEAD (1ceddd7497) and with 11.2 (though
in that version -O2 does not imply -ftree-slp-vectorize, so the problem
only starts to appear at -O3).

I've minimized the example into a pretty basic insertion sort. I have not
checked the generated assembly. I'm attaching the .c source, which has some
more comments with timings. I'm also attaching my /proc/cpuinfo and the temp
files generated with -O3. I imagine the .o and the binary are not too
helpful, but I can send them if needed.

Thanks,
Guido