http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60086
--- Comment #5 from Andrey Belevantsev <abel at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #1)
> ...
> doesn't reorder those is that RA allocates the same register.  With -O3
> -mavx -fselective-scheduling2 the stores are also changed, but we end up
> with a weird:
> .L9:
>         movq    -136(%rbp), %rdx
>         vmovapd (%r9,%rax), %ymm0
>         addq    $1, %rdi
>         vmovapd (%r10,%rax), %ymm8
>         vaddpd  (%rdx,%rax), %ymm0, %ymm0
>         movq    -144(%rbp), %rdx
>         vaddpd  (%rdx,%rax), %ymm8, %ymm9
>         vmovapd %ymm0, (%r9,%rax)
>         vmovapd %ymm8, %ymm0
>         vmovapd %ymm9, %ymm0
>         vmovapd %ymm0, (%r10,%rax)
>         addq    $32, %rax
>         cmpq    %rdi, -152(%rbp)
>         ja      .L9
> Why there is the vmovapd %ymm8, %ymm0 is a mystery, and vmovapd %ymm9, %ymm0
> could be very well merged with the store into vmovapd %ymm9, (%r10,%rax).

That's because we do a renaming and a substitution.  We have (in the middle of
scheduling, just scheduled insn 78):

  262: dx:DI=[bp:DI-0x88]
   72: xmm0:V4DF=[r9:DI+ax:DI]
   78: {di:DI=di:DI+0x1;clobber flags:CC;}   <--- we are here
   73: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   74: [r9:DI+ax:DI]=xmm0:V4DF
   75: xmm0:V4DF=[r10:DI+ax:DI]
  263: dx:DI=[bp:DI-0x90]
   76: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   77: [r10:DI+ax:DI]=xmm0:V4DF

Now we want to schedule insn 75 but xmm0 is busy in 74 and 73, so we rename it
to xmm8 and have:

  262: dx:DI=[bp:DI-0x88]
   72: xmm0:V4DF=[r9:DI+ax:DI]
   78: {di:DI=di:DI+0x1;clobber flags:CC;}
  459: xmm8:V4DF=[r10:DI+ax:DI]              <--- we are here
   73: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   74: [r9:DI+ax:DI]=xmm0:V4DF
  461: xmm0:V4DF=xmm8:V4DF                   <--- copy after renaming
  263: dx:DI=[bp:DI-0x90]
   76: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   77: [r10:DI+ax:DI]=xmm0:V4DF

Then after scheduling insns 73 and 263 we have:

  262: dx:DI=[bp:DI-0x88]
   72: xmm0:V4DF=[r9:DI+ax:DI]
   78: {di:DI=di:DI+0x1;clobber flags:CC;}
  459: xmm8:V4DF=[r10:DI+ax:DI]
   73: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
  263: dx:DI=[bp:DI-0x90]                    <--- we are here
   74: [r9:DI+ax:DI]=xmm0:V4DF
  461: xmm0:V4DF=xmm8:V4DF
   76: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   77: [r10:DI+ax:DI]=xmm0:V4DF

and now we want to schedule insn 76.  We substitute its rhs through a copy 461
but then xmm0 is again busy so we rename the target register to xmm9 and get:

  262: dx:DI=[bp:DI-0x88]
   72: xmm0:V4DF=[r9:DI+ax:DI]
   78: {di:DI=di:DI+0x1;clobber flags:CC;}
  459: xmm8:V4DF=[r10:DI+ax:DI]
   73: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
  263: dx:DI=[bp:DI-0x90]
  464: xmm9:V4DF=xmm8:V4DF+[dx:DI+ax:DI]     <--- new renamed insn
   74: [r9:DI+ax:DI]=xmm0:V4DF
  461: xmm0:V4DF=xmm8:V4DF
  466: xmm0:V4DF=xmm9:V4DF                   <--- copy after renaming
   77: [r10:DI+ax:DI]=xmm0:V4DF

At this point insn 461 is dead but we do not notice, and it doesn't look easy. 
I think there was some suggestion in the original research for killing dead
insn copies left after renaming but I don't remember offhand.
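
To illustrate why insn 461 is dead in the final sequence, here is a minimal sketch of a backward liveness scan over that sequence, treated as one straight-line block. This is not the selective scheduler's code, which works on regions while scheduling is still in progress and is where the difficulty mentioned above lies. The `Insn` model, the `find_dead_insns` helper, and the `LIVE_OUT` set are all assumptions made up for this illustration; only the insn numbers and register names come from the dump.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    uid: int
    defs: frozenset          # registers written
    uses: frozenset          # registers read (including address registers)
    has_side_effect: bool    # True for stores to memory


def insn(uid, defs=(), uses=(), store=False):
    return Insn(uid, frozenset(defs), frozenset(uses), store)


def find_dead_insns(block, live_out):
    """Return uids of insns whose results are never read (program order)."""
    live = set(live_out)
    dead = []
    for i in reversed(block):
        if not i.has_side_effect and not (i.defs & live):
            # Nothing this insn writes is read later: it is dead, and its
            # own uses must not keep other registers alive.
            dead.append(i.uid)
            continue
        live -= i.defs
        live |= i.uses
    return dead[::-1]


# Final insn sequence from the dump, modelled by hand.
BLOCK = [
    insn(262, defs={"dx"}, uses={"bp"}),
    insn(72, defs={"xmm0"}, uses={"r9", "ax"}),
    insn(78, defs={"di", "flags"}, uses={"di"}),
    insn(459, defs={"xmm8"}, uses={"r10", "ax"}),
    insn(73, defs={"xmm0"}, uses={"xmm0", "dx", "ax"}),
    insn(263, defs={"dx"}, uses={"bp"}),
    insn(464, defs={"xmm9"}, uses={"xmm8", "dx", "ax"}),
    insn(74, uses={"r9", "ax", "xmm0"}, store=True),
    insn(461, defs={"xmm0"}, uses={"xmm8"}),
    insn(466, defs={"xmm0"}, uses={"xmm9"}),
    insn(77, uses={"r10", "ax", "xmm0"}, store=True),
]

# Assumption: registers still needed after this sequence (the loop counter,
# index and base pointers). The vector registers are reloaded each iteration.
LIVE_OUT = {"ax", "di", "r9", "r10", "bp"}

if __name__ == "__main__":
    print(find_dead_insns(BLOCK, LIVE_OUT))
```

Under these assumptions the scan flags only insn 461: insn 466 overwrites xmm0 before anything reads the value copied from xmm8. The remaining copy 466 is not dead, since the store in insn 77 reads xmm0; folding it into the store would be a separate copy-propagation step.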