https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107892
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
> In the bad version, I noticed that the RTL initially has two separate insns
> for 'a += *p': one to do the addition and write the result to a new pseudo
> register, and one to convert the value from mode V8SI to V4DI and assign it

Because we define __m256i as __v4di, RTL uses a subreg to bitcast the __v8si reg to a __v4di one.

> to the original pseudo register. These two separate insns never get
> combined. (That sort of explains why the bug isn't seen with the __v8si and
> += method; gcc doesn't do a type conversion with that method.) So, I'm

Combine failed to combine them because the __v8si reg is also used outside of the loop.

> wondering if the bug is in the instruction combining pass. Or perhaps the
> RTL should never have had two separate insns in the first place?
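The reporter's actual test case is not quoted in this comment. As a rough, hypothetical illustration of the two source forms being contrasted, the sketch below uses GCC's generic vector extension with stand-in typedefs `v4di` and `v8si` (mirroring GCC's internal `__v4di`/`__v8si`; `__m256i` is a 4 x 64-bit vector type). In the first function the accumulator has the 4 x 64-bit type, so every 32-bit-lane add needs a bitcast to `v8si` and back, which is roughly what an intrinsic like `_mm256_add_epi32` does on an `__m256i`. In the second, the accumulator stays `v8si` and the loop body is a plain `+=`. The function names are made up, and the sketch only shows the C-level difference; whether the extra subreg move survives would have to be checked in the RTL dumps (e.g. `-fdump-rtl-combine`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's internal x86 vector types. */
typedef long long v4di __attribute__((vector_size(32))); /* like __v4di / __m256i */
typedef int v8si __attribute__((vector_size(32)));       /* like __v8si */

/* "__m256i-style" loop: the accumulator is 4 x 64-bit, so each
   32-bit-lane add is bitcast to v8si and back (a subreg in RTL). */
static void sum_via_v4di(const v4di *p, size_t n, v4di *out)
{
    assert(n == 0 || p != NULL);
    v4di a = {0, 0, 0, 0};
    for (size_t i = 0; i < n; i++)
        a = (v4di)((v8si)a + (v8si)p[i]);
    *out = a;
}

/* "__v8si and +=" loop: the accumulator already has the 8 x 32-bit
   type, so only the loaded value is reinterpreted inside the loop. */
static void sum_via_v8si(const v4di *p, size_t n, v4di *out)
{
    assert(n == 0 || p != NULL);
    v8si a = {0};
    for (size_t i = 0; i < n; i++)
        a += (v8si)p[i];
    *out = (v4di)a;
}
```

Both functions compute the same lane-wise 32-bit sums; they differ only in which type the loop-carried accumulator has, which is what determines whether a V8SI-to-V4DI conversion appears inside the loop.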