http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60086

--- Comment #5 from Andrey Belevantsev <abel at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #1)
> ...
> doesn't reorder those is that RA allocates the same register.  With -O3
> -mavx -fselective-scheduling2 the stores are also changed, but we end up
> with a weird:
> .L9:
>         movq    -136(%rbp), %rdx
>         vmovapd (%r9,%rax), %ymm0
>         addq    $1, %rdi
>         vmovapd (%r10,%rax), %ymm8
>         vaddpd  (%rdx,%rax), %ymm0, %ymm0
>         movq    -144(%rbp), %rdx
>         vaddpd  (%rdx,%rax), %ymm8, %ymm9
>         vmovapd %ymm0, (%r9,%rax)
>         vmovapd %ymm8, %ymm0
>         vmovapd %ymm9, %ymm0
>         vmovapd %ymm0, (%r10,%rax)
>         addq    $32, %rax
>         cmpq    %rdi, -152(%rbp)
>         ja      .L9
> Why there is the vmovapd %ymm8, %ymm0 is a mystery, and vmovapd %ymm9, %ymm0
> could be very well merged with the store into vmovapd %ymm9, (%r10,%rax).

That's because we do both a renaming and a substitution.  In the middle of
scheduling, having just scheduled insn 78, we have:

  262: dx:DI=[bp:DI-0x88]
   72: xmm0:V4DF=[r9:DI+ax:DI]
   78: {di:DI=di:DI+0x1;clobber flags:CC;}   <--- we are here
   73: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   74: [r9:DI+ax:DI]=xmm0:V4DF
   75: xmm0:V4DF=[r10:DI+ax:DI]
  263: dx:DI=[bp:DI-0x90]
   76: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   77: [r10:DI+ax:DI]=xmm0:V4DF

Now we want to schedule insn 75, but xmm0 is busy in insns 73 and 74, so we
rename its target register to xmm8 and get:

  262: dx:DI=[bp:DI-0x88]
   72: xmm0:V4DF=[r9:DI+ax:DI]
   78: {di:DI=di:DI+0x1;clobber flags:CC;}
  459: xmm8:V4DF=[r10:DI+ax:DI]              <--- we are here
   73: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   74: [r9:DI+ax:DI]=xmm0:V4DF
  461: xmm0:V4DF=xmm8:V4DF                   <--- copy after renaming 
  263: dx:DI=[bp:DI-0x90]
   76: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   77: [r10:DI+ax:DI]=xmm0:V4DF
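
The renaming step above can be sketched on a toy model.  This is only an
illustration, not the selective scheduler's code: the (uid, dest, srcs) tuples
and the function rename_and_hoist are hypothetical, stores are modelled with
dest = None, and the sketch assumes the new register is free along the path
(it does not check this).

```python
# Toy model of an insn: (uid, dest_reg_or_None, [source regs]).
# Hypothetical illustration only; this is not GCC's RTL API.

def rename_and_hoist(insns, uid, new_reg, after_uid, new_uid, copy_uid):
    """Hoist insn `uid` to just after `after_uid`, writing `new_reg` instead
    of its original target, and leave a copy `old_reg = new_reg` at the
    original position.  Assumes the insn is moved upwards and that
    `new_reg` is not otherwise used in between (not verified here)."""
    idx = next(i for i, insn in enumerate(insns) if insn[0] == uid)
    _, old_reg, srcs = insns[idx]
    out = list(insns)
    # The original position now holds the copy left after renaming.
    out[idx] = (copy_uid, old_reg, [new_reg])
    # The renamed insn goes right after the last scheduled insn.
    pos = next(i for i, insn in enumerate(out) if insn[0] == after_uid) + 1
    out.insert(pos, (new_uid, new_reg, list(srcs)))
    return out

# The sequence just after scheduling insn 78.
before = [
    (262, "dx", ["bp"]),
    (72, "xmm0", ["r9", "ax"]),
    (78, "di", ["di"]),
    (73, "xmm0", ["xmm0", "dx", "ax"]),
    (74, None, ["r9", "ax", "xmm0"]),
    (75, "xmm0", ["r10", "ax"]),
    (263, "dx", ["bp"]),
    (76, "xmm0", ["xmm0", "dx", "ax"]),
    (77, None, ["r10", "ax", "xmm0"]),
]

after = rename_and_hoist(before, uid=75, new_reg="xmm8", after_uid=78,
                         new_uid=459, copy_uid=461)
for insn in after:
    print(insn)
```

The resulting order of uids matches the dump above: insn 459 sits right after
78, and copy 461 takes the place of the original insn 75.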

Then, after scheduling insns 73 and 263, we have:

  262: dx:DI=[bp:DI-0x88]
   72: xmm0:V4DF=[r9:DI+ax:DI]
   78: {di:DI=di:DI+0x1;clobber flags:CC;}
  459: xmm8:V4DF=[r10:DI+ax:DI]
   73: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
  263: dx:DI=[bp:DI-0x90]                   <--- we are here
   74: [r9:DI+ax:DI]=xmm0:V4DF
  461: xmm0:V4DF=xmm8:V4DF
   76: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   77: [r10:DI+ax:DI]=xmm0:V4DF

Now we want to schedule insn 76.  We substitute its rhs through copy 461
(so xmm0 in the rhs becomes xmm8), but then the target xmm0 is again busy
(insn 74 still reads it), so we rename the target register to xmm9 and get:

  262: dx:DI=[bp:DI-0x88]
   72: xmm0:V4DF=[r9:DI+ax:DI]
   78: {di:DI=di:DI+0x1;clobber flags:CC;}
  459: xmm8:V4DF=[r10:DI+ax:DI]
   73: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
  263: dx:DI=[bp:DI-0x90]
  464: xmm9:V4DF=xmm8:V4DF+[dx:DI+ax:DI]    <--- new renamed insn
   74: [r9:DI+ax:DI]=xmm0:V4DF
  461: xmm0:V4DF=xmm8:V4DF
  466: xmm0:V4DF=xmm9:V4DF                  <--- copy after renaming
   77: [r10:DI+ax:DI]=xmm0:V4DF
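
The substitution half of this step can be sketched on the same kind of toy
model.  Again this is a hypothetical illustration, not the scheduler's
implementation: the tuple layout (uid, dest, srcs) and the function
substitute_through_copy are made up for the example.

```python
# Toy model of an insn: (uid, dest_reg, [source regs]).
# Hypothetical illustration only; this is not GCC's RTL API.

def substitute_through_copy(insn, copy):
    """Return `insn` as it would read when moved above `copy`
    (a plain register copy dest = src): every use of the copy's
    destination is replaced by the copy's source."""
    uid, dest, srcs = insn
    _, copy_dest, (copy_src,) = copy
    new_srcs = [copy_src if s == copy_dest else s for s in srcs]
    return (uid, dest, new_srcs)

insn_76 = (76, "xmm0", ["xmm0", "dx", "ax"])   # xmm0 = xmm0 + [dx+ax]
copy_461 = (461, "xmm0", ["xmm8"])             # xmm0 = xmm8

substituted = substitute_through_copy(insn_76, copy_461)
# Renaming the target afterwards gives the shape of insn 464.
renamed = (464, "xmm9", substituted[2])
print(substituted)
print(renamed)
```

After substitution the insn no longer reads xmm0, so only its target has to be
renamed to move it above insn 74.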


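At this point insn 461 is dead (insn 466 overwrites xmm0 before anything
reads it), which is where the stray vmovapd %ymm8, %ymm0 comes from.  We do
not notice this, and detecting it doesn't look easy.  I think there was some
suggestion in the original research for killing dead copies left after
renaming, but I don't remember it offhand.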
