https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105363

            Bug ID: 105363
           Summary: -ftree-slp-vectorize decreases performance
                    significantly (x64)
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mtzguido at gmail dot com
  Target Milestone: ---

Created attachment 52857
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52857&action=edit
Source file and outputs

Hello,

I found this example where using -O2 (which implies -ftree-slp-vectorize)
decreases performance by about 4x compared to -O1. I've pinned it down to
-ftree-slp-vectorize: -O3 -fno-tree-slp-vectorize works very well.

   $ gcc bug_opt.c -O3 -o bug_opt-O3
   $ time ./bug_opt-O3

   real 0m6.627s
   user 0m6.619s
   sys  0m0.005s

   $ gcc bug_opt.c -O3 -fno-tree-slp-vectorize -o bug_opt-O3-novec
   $ time ./bug_opt-O3-novec

   real 0m1.703s
   user 0m1.701s
   sys  0m0.000s

I've verified this with the current HEAD (1ceddd7497) and with 11.2 (though in
that version -O2 does not imply -ftree-slp-vectorize, so the problem starts to
appear at -O3).

I've minimized the example into a pretty basic insertion sort.
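
For readers without the attachment, here is a minimal sketch of the kind of
insertion sort involved. It is not the attached bug_opt.c: the function name,
element type, and signature are assumptions made for illustration, and it is
not claimed to reproduce the timings in this report.

```c
#include <stddef.h>

/* Hypothetical stand-in for the attached source, which is not shown here.
 * Sorts a[0..n-1] in ascending order, in place. */
void insertion_sort(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        /* Shift elements larger than key one slot to the right. */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        /* Drop key into the gap left by the shifts. */
        a[j] = key;
    }
}
```

A driver that calls such a function repeatedly on a small array could then be
built with and without -fno-tree-slp-vectorize to compare timings.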

I have not checked the generated assembly.

I'm attaching the .c source, which has some more comments with timings. I'm
also attaching my /proc/cpuinfo and the temp files generated with -O3. I
imagine the .o and binary are not too helpful, but I can send them if needed.

Thanks,
Guido
