https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63277
cbaylis at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cbaylis at gcc dot gnu.org

--- Comment #4 from cbaylis at gcc dot gnu.org ---
A much simplified test case based on arm_neon_excessive_vmov_wo_vcombine.c

$ arm-unknown-linux-gnueabihf-gcc -O2 -S -o - -mfpu=neon mini.c

#include <arm_neon.h>

void f(int8_t *p)
{
    int8x16_t v;
    int8x8_t v2;
    int8x8x2_t vx;

    v = vld1q_s8(p);
    v2 = vld1_s8(p);
    vx.val[0] = vget_low_s8(v);
    vx.val[1] = vget_high_s8(v);
    v2 = vtbl2_s8(vx, v2);
    vst1_s8(p, v2);
}

With -dp, the generated code is:

f:
        vld1.8  {d18-d19}, [r0]         @ 6     neon_vld1v16qi  [length = 4]
        vmov    d16, d18  @ v8qi        @ 10    *neon_movv8qi/1 [length = 4]
        vld1.8  {d20}, [r0]             @ 7     neon_vld1v8qi   [length = 4]
        vmov    d17, d19  @ v8qi        @ 11    *neon_movv8qi/1 [length = 4]
        vtbl.8  d16, {d16, d17}, d20    @ 12    neon_vtbl2v8qi  [length = 4]
        vst1.8  {d16}, [r0]             @ 13    neon_vst1v8qi   [length = 4]
        bx      lr                      @ 24    *thumb2_return  [length = 4]

By the time IRA runs, the insns which result in the moves look like this:

(insn 9 18 11 2 (set (subreg:V8QI (reg/v:TI 116 [ vx ]) 0)
        (subreg:V8QI (reg:V16QI 114 [ D.14019 ]) 0)) /tmp/mini.c:11 827 {*neon_movv8qi}

The registers 116 and 114 are allocated to different hard registers, as they
conflict. Presumably, the register allocator could be taught to treat this
subreg->subreg move as a copy and allow the same hard register to be allocated.
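
To make the suggestion about subreg->subreg moves concrete, here is a toy sketch in C of the kind of check the allocator would need. It is an illustration only: struct subreg_ref and is_copy_candidate() are hypothetical names and are not GCC's RTL or IRA data structures. It also ignores the conflict between the two pseudos, which is what actually blocks the shared allocation today.

```c
#include <stdbool.h>

/* Toy model of one side of a subreg move. Hypothetical type, not GCC code. */
struct subreg_ref {
    int pseudo;       /* pseudo register number, e.g. 114 or 116 */
    int byte_offset;  /* byte offset of the subreg within the pseudo */
    int size;         /* size in bytes of the accessed mode (V8QI is 8) */
};

/* A move between two different pseudos is a copy candidate when both sides
   access the same number of bytes at the same offset. If both pseudos were
   then assigned the same hard register, the move would become a no-op. */
static bool is_copy_candidate(struct subreg_ref dst, struct subreg_ref src)
{
    return dst.pseudo != src.pseudo
        && dst.size == src.size
        && dst.byte_offset == src.byte_offset;
}
```

For insn 9 above, the destination is the 8-byte subreg of pseudo 116 at offset 0 and the source is the 8-byte subreg of pseudo 114 at offset 0, so this check would accept it. A move whose two sides sit at different offsets would be rejected, since a shared hard register would not line the halves up.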