https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101207
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, so what happens is that we elide the load permutation that feeds the
plus reduction originally, but then we vectorize the live lanes of the
minus reduction as BIT_FIELD_REFs, ending up extracting the wrong lanes.

Testcase for x86_64:

/* { dg-additional-options "-ftree-slp-vectorize -ffast-math" } */

double a[2];
double x, y;

void __attribute__((noipa)) foo ()
{
  x = a[1] - a[0];
  y = a[0] + a[1];
}

int main()
{
  a[0] = 0.;
  a[1] = 1.;
  foo ();
  if (x != 1. || y != 1.)
    __builtin_abort ();
  return 0;
}