[Bug rtl-optimization/107892] Unnecessary move between ymm registers in loop using AVX2 intrinsic

crazylht at gmail dot com via Gcc-bugs Mon, 28 Nov 2022 00:43:26 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107892


--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---

> In the bad version, I noticed that the RTL initially has two separate insns
> for 'a += *p': one to do the addition and write the result to a new pseudo
> register, and one to convert the value from mode V8SI to V4DI and assign it
> to the original pseudo register.

That is because __m256i is defined as __v4di, and the RTL uses a subreg to bitcast the __v8si register to a __v4di one.

> These two separate insns never get combined.  (That sort of explains why
> the bug isn't seen with the __v8si and += method; gcc doesn't do a type
> conversion with that method.)  So, I'm wondering if the bug is in the
> instruction combining pass.  Or perhaps the RTL should never have had two
> separate insns in the first place?

Combine fails to merge them because the __v8si register is also used outside the loop.
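The sketch below illustrates the type mismatch described above. It uses GCC vector extensions instead of real AVX2 intrinsics so that it builds on any target.

Assumptions:
- The typedefs v4di, v8si and v8su are hypothetical stand-ins for GCC's internal __v4di, __v8si and __v8su types.
- The cast, add and cast-back body only approximates the general shape of an epi32 add on a __m256i value. It is not copied from GCC's headers.
- The sketch is not claimed to reproduce the extra ymm move itself. Per the comment above, that also depends on the V8SI value being live outside the loop.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for GCC's internal __v4di, __v8si and __v8su
   vector types (GCC/Clang vector extension; 32 bytes = one ymm register). */
typedef long long v4di __attribute__((vector_size(32)));
typedef int v8si __attribute__((vector_size(32)));
typedef unsigned int v8su __attribute__((vector_size(32)));

/* Accumulator typed like __m256i (64-bit lanes). Each step reinterprets
   the operands as 32-bit lanes, adds them, and casts the result back to
   v4di: the V8SI -> V4DI conversion that RTL expresses as a subreg. */
v4di sum_as_v4di(const v4di *p, int n)
{
    assert(n >= 0);
    v4di a = {0, 0, 0, 0};
    for (int i = 0; i < n; i++)
        a = (v4di)((v8su)a + (v8su)p[i]);
    return a;
}

/* Accumulator already typed with 32-bit lanes, so += needs no conversion.
   The caller must keep every lane sum within int range. */
v8si sum_as_v8si(const v8si *p, int n)
{
    assert(n >= 0);
    v8si a = {0};
    for (int i = 0; i < n; i++)
        a += p[i];
    return a;
}

/* Returns 1 if both accumulators hold the same bytes, i.e. identical
   32-bit lane sums. */
int same_lanes(const v4di *a, const v8si *b)
{
    return memcmp(a, b, sizeof *b) == 0;
}
```

Both functions compute the same 32-bit lane sums. They differ only in how the accumulator is typed. Only the first function needs a V8SI to V4DI conversion after every addition, and that conversion is what combine must fold away. Comparing the two loops' assembly under -O2 -mavx2 is one way to look at this, but the exact output depends on the GCC version.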
