https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86625

--- Comment #2 from Chris Elrod <elrodc at gmail dot com> ---
Created attachment 44418
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44418&action=edit
Code to reproduce slow vectorization pattern and unnecessary loads & stores

(Sorry if this ends up at the bottom instead of the top; I'm trying to attach a
file in place of a link, but I can't edit the old comment.)

Attached is sample code to reproduce the problem in gcc 8.1.1.
As observed by amonakov, compiling with -O3/-Ofast reproduces the full problem,
e.g.:

gfortran -Ofast -march=skylake-avx512 -mprefer-vector-width=512 -funroll-loops \
    -S kernels.f90 -o kernels.s

Compiling with -O3 -fdisable-tree-cunrolli or -O2 -ftree-vectorize fixes the
incorrect vectorization pattern, but leaves a lot of unnecessary broadcast loads
and stores.
