https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66740

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Can you still reproduce it?  I don't see anything wrong in the dumps I've
looked at.  Without -Ofast, of course, the order of the floating point
arithmetic is significantly different between -fopenmp and -fno-openmp, but
that is to be expected, you've asked for that.  For the iterations that are
vectorized, each SIMD lane has its own sum variable (with the given options
the loop seems to be vectorized with vectorization factor 8, so there are 8
SIMD lanes and thus 8 separate sum vars).  The vector version sums into those
(initialized with 0), any scalar iterations sum into the first SIMD lane's sum
var, and finally at the end the 8 partial sums are summed together (one by
one, rather than what the vectorizer normally does for -Ofast, which is to
reduce them by summing up 4 x 2 numbers, then 2 x 2 numbers, then 2 numbers).
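
To make the summation order described above concrete, here is a scalar C sketch that emulates it next to a plain left-to-right sum.  It is only an illustration, not what GCC actually emits: the float element type, the names sum_sequential/sum_simd_like and the VF macro are assumptions, and the real type in the reporter's loop is not stated here.

```c
#include <stddef.h>

#define VF 8  /* vectorization factor from the comment; everything else is assumed */

/* Plain left-to-right sum, the order a non-vectorized loop would use. */
float sum_sequential(const float *a, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Scalar emulation of the order described above (hypothetical helper). */
float sum_simd_like(const float *a, size_t n)
{
    float lane[VF] = { 0.0f };  /* one partial sum per SIMD lane, initialized with 0 */
    size_t i = 0;

    /* "Vector" iterations: element i + l is accumulated into lane l. */
    for (; i + VF <= n; i += VF)
        for (size_t l = 0; l < VF; l++)
            lane[l] += a[i + l];

    /* Remaining scalar iterations go into the first lane's sum. */
    for (; i < n; i++)
        lane[0] += a[i];

    /* Final reduction: the partial sums are added one by one. */
    float sum = 0.0f;
    for (size_t l = 0; l < VF; l++)
        sum += lane[l];
    return sum;
}
```

Both functions add the same numbers, only in a different order, so they agree whenever every intermediate result is exactly representable and can differ in the low bits otherwise.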
If this is still a problem, can you cook up a small self-contained testcase out
of it (small function with just the #pragma omp simd loop in it, taking the
args as parameters, with noinline/noclone attribute on it ideally, and then
main that fills up an array with the problematic input values and then checks
what the function returned (sum))?
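
A possible skeleton for such a testcase, as a rough sketch only: the reduction(+:sum) clause, the float type, the names simd_sum/run_testcase, the length N and the fill pattern are all placeholders, since the actual loop and the problematic input values are known only to the reporter.

```c
#include <stddef.h>

#define N 1024  /* hypothetical array length */

/* Kernel kept out of line so it is optimized on its own, as requested. */
__attribute__((noinline, noclone))
float simd_sum(const float *a, size_t n)
{
    float sum = 0.0f;
    /* Assumed form of the loop; the pragma is ignored without -fopenmp or -fopenmp-simd. */
#pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

static float input[N];

/* Fills the array and checks the result; returns 0 on success, 1 on mismatch. */
int run_testcase(void)
{
    long expected = 0;

    /* Placeholder data: small integers, so the sum is exact in any order.
       The real testcase would store the problematic input values here. */
    for (size_t i = 0; i < N; i++) {
        input[i] = (float)(i % 7);
        expected += (long)(i % 7);
    }

    return simd_sum(input, N) == (float)expected ? 0 : 1;
}
```

A main that simply returns run_testcase() (or calls abort on a nonzero result) completes the testcase.  With the real input values the expected sum would have to come from a reference computation instead of the integer counter used here.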
