https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125876
--- Comment #2 from Sarvesh Chandra <Sarvesh.Chandra at amd dot com> ---
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604649.html

We saw the following comment from Uros:

>> -(define_expand "avx512f_movddup512<mask_name>"
>> -  [(set (match_operand:V8DF 0 "register_operand")
>> +(define_insn "avx512f_movddup512<mask_name>"
>> +  [(set (match_operand:V8DF 0 "register_operand" "=v")
>>         (vec_select:V8DF
>>           (vec_concat:V16DF
>> -            (match_operand:V8DF 1 "nonimmediate_operand")
>> +            (match_operand:V8DF 1 "memory_operand" "m")
>
> I think you should leave nonimmediate_operand here with "m" predicate.
> Reload is able to move the register to the memory, and it is
> beneficial to allow registers for possible combine opportunities.

Essentially, avx512f_movddup512 and avx512f_unpcklpd512 are duplicate patterns for even-lane interleaving when both operands match (both are the same register). We could keep the constraint on avx512f_movddup512 as it is now, and let avx512f_unpcklpd512 match the case of matching register operands. Matching memory operands will fall through to avx512f_movddup512; one of the operands will be spilled.
