https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458
--- Comment #13 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So, just recording some thoughts as I initially look at this. This is the key insn (from the combine dump):

    (insn 35 34 37 2 (set (reg:RVVM8QI 112 v16)
            (if_then_else:RVVM8QI (unspec:RVVM1BI [
                        (reg:RVVM1BI 145 [ _17 ])
                        (const_int 0 [0])
                        (const_int 2 [0x2])
                        (const_int 0 [0]) repeated x2
                        (reg:SI 66 vl)
                        (reg:SI 67 vtype)
                    ] UNSPEC_VPREDICATE)
                (minus:RVVM8QI (reg:RVVM8QI 135 [ _5 ])
                    (reg:RVVM8QI 143 [ _13 ]))
                (reg:RVVM8QI 143 [ _13 ]))) "j.C":72:17 discrim 1 5590 {pred_subrvvm8qi}
         (expr_list:REG_DEAD (reg:SI 66 vl)
            (expr_list:REG_DEAD (reg:SI 67 vtype)
                (nil))))

The first thing that popped out at me was the hard register usage for the output. That's because the output is used as an argument. That led me to realize that v8 is also holding live data. v0, of course, will be needed for the mask. So to get a successful allocation with no reloading, we'd have to tie one of the inputs to the output.

What we get instead from allocation is r135 in memory and r159 in v24. All downhill from there.
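The register-pressure argument above can be checked with a small counting sketch. It is a simplified model, not GCC's allocator. It assumes 32 vector registers, that an RVVM8QI (LMUL=8) value must live in an 8-aligned group of eight registers (so the only groups are v0, v8, v16 and v24), and that a group cannot hold two different live values at once. The names `lmul8_groups`, `free_groups` and `occupied` are hypothetical helpers for illustration only.

```python
# Simplified model of the LMUL=8 allocation problem for insn 35.
# Assumptions (illustrative, not taken from GCC): 32 vector registers,
# an LMUL=8 value needs a group starting at a multiple of 8, and one
# group cannot hold two different live values at the same time.

NUM_VREGS = 32
LMUL = 8


def lmul8_groups():
    """Return the base register name of every aligned LMUL=8 group."""
    return [f"v{base}" for base in range(0, NUM_VREGS, LMUL)]


def free_groups(occupied):
    """Return the groups that are not already holding some other value."""
    return [g for g in lmul8_groups() if g not in occupied]


# Groups that are unavailable to the two RVVM8QI inputs.
occupied = {
    "v0": "mask r145 (v0 lies inside the v0-v7 group)",
    "v8": "other live data",
    "v16": "output r112, a hard register used as an argument",
}

inputs = ["r135", "r143"]
available = free_groups(occupied)
print(available)

# Two LMUL=8 inputs but only one free group: without reloading, one
# input has to share (be tied to) the output's group, v16.
needs_tie = len(inputs) > len(available)
print(needs_tie)
```

Running it prints `['v24']` and then `True`. Only one free LMUL=8 group remains for two LMUL=8 inputs, so an allocation without reloads needs one input to share v16 with the output.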