https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99912
--- Comment #5 from Erik Schnetter <schnetter at gmail dot com> ---

As you suggested, the problem is probably not caused by register spills, but by stores into a struct that are not optimized away. In this case, the respective struct elements are unused in the code.

I traced the results of the first __builtin_ia32_maskloadpd256:

  _63940 = __builtin_ia32_maskloadpd256 (_63955, prephitmp_86203);
  MEM <const vector(4) double> [(struct mat3 *)&vars + 992B] = _63940;
  _178613 = .FMA (_63940, _64752, _178609);
  MEM <const vector(4) double> [(struct mat3 *)&vars + 1312B] = _63940;

The respective struct locations (+ 992B, + 1312B) are indeed not used anywhere else.

The struct is of type z4c_vars. It (and its parent) are defined in lines 279837 to 280818. It is large.

Is there, e.g., a parameter I could set to make GCC try harder to avoid unnecessary stores?
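A reduced sketch of the pattern traced above might help when cutting the testcase down: one loaded vector is stored into members of a local struct that are never read again, while only one computation actually uses it. The names `vec4`, `mat3_t`, `vars_t` and `kernel` are hypothetical stand-ins for `mat3`/`z4c_vars`, and GCC's generic vector extension replaces `__builtin_ia32_maskloadpd256` so the sketch builds without `-mavx`. In a function this small, GCC would normally be expected to drop the dead stores; the open question is whether they survive once the struct grows to the size of z4c_vars.

```cpp
// Hypothetical reduction of the dead-store pattern; not the original code.
// GCC generic vector extension: four doubles, the same width as __m256d.
typedef double vec4 __attribute__((vector_size(32)));

// Hypothetical stand-in for struct mat3: six symmetric-tensor components.
struct mat3_t {
  vec4 xx, xy, xz, yy, yz, zz;
};

// Hypothetical stand-in for z4c_vars: a struct holding several mat3_t members.
struct vars_t {
  mat3_t gamma;
  mat3_t dgamma;
};

// Loads four doubles, stores them into struct members that are never read
// again (the dead stores), and returns a sum built from the one live use.
double kernel(const double *p) {
  vars_t vars;
  vec4 v = {p[0], p[1], p[2], p[3]};
  vars.gamma.xx = v;   // dead store: this member is never read
  vars.dgamma.yy = v;  // dead store: this member is never read
  vec4 acc = v * v + v;  // the only live use of v
  return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Compiling this at -O2 with -fdump-tree-optimized and checking whether the two member stores remain is one way to compare against the full testcase.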
