On Sun, Nov 4, 2018 at 11:45 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Sun, Nov 4, 2018 at 8:17 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Sun, Nov 4, 2018 at 8:41 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> > >
> > > On Fri, Nov 2, 2018 at 6:25 PM H.J. Lu <hongjiu...@intel.com> wrote:
> > > >
> > > > Remove duplicated AVX2/AVX512 vec_dup patterns and replace them with
> > > > subreg.  gcc.target/i386/avx2-vbroadcastss_ps256-1.c is changed by
> > > >
> > > >  avx2_test:
> > > >         .cfi_startproc
> > > > -       vmovaps x(%rip), %xmm1
> > > > -       vbroadcastss    %xmm1, %ymm0
> > > > +       vbroadcastss    x(%rip), %ymm0
> > > >         vmovaps %ymm0, y(%rip)
> > > >         vzeroupper
> > > >         ret
> > > >         .cfi_endproc
> > > >
> > > > gcc.target/i386/avx512vl-vbroadcast-3.c is changed by
> > > >
> > > > @@ -113,7 +113,7 @@ f10:
> > > >         .cfi_startproc
> > > >         vmovaps %ymm0, %ymm16
> > > >         vpermilps       $85, %ymm16, %ymm16
> > > > -       vbroadcastss    %xmm16, %ymm16
> > > > +       vshuff32x4      $0x0, %ymm16, %ymm16, %ymm16
> > > >         vzeroupper
> > > >         ret
> > > >         .cfi_endproc
> > > > @@ -153,8 +153,7 @@ f12:
> > > >  f13:
> > > >  .LFB12:
> > > >         .cfi_startproc
> > > > -       vmovaps (%rdi), %ymm16
> > > > -       vbroadcastss    %xmm16, %ymm16
> > > > +       vbroadcastss    (%rdi), %ymm16
> > > >         vzeroupper
> > > >         ret
> > > >         .cfi_endproc
> > >
> > > Actually, we can achieve the same with pre-reload splitters. Please
> > > see the attached patch for a couple of examples and a fix for
> > > vbroadcastss that accesses the memory in wrong mode.
> > >
> >
> > My patch removes a bunch of duplicated patterns from sse.md.  But
> > yours adds a couple more patterns.   Isn't fewer patterns preferred?
>
> Playing SUBREG games before reload does not look safe to me. We would

There is plenty of SUBREG usage in the i386 backend before reload.  It is
perfectly safe as long as we don't create a SUBREG with a different
register class from the base.  Do you have a testcase showing that my
SUBREG usage is unsafe?

> like to create a simpler instruction out of the combination of vector
> load and broadcast, so I think that combine+split is the right tool
> for this simplification.

Adding new patterns doesn't simplify the issue.

> BTW: Half of my proposed patch is a fix to an avx2_pbroadcast<mode>{_1}
> pattern, which models wrong access to memory.
>

I will take a look at avx2_pbroadcast<mode>{_1}.
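
For reference, here is a rough sketch in plain C of why the memory mode matters
for the broadcast. It is only a model, not code from either patch: the
names v8sf_model, broadcast_via_full_load, broadcast_via_scalar_load and
models_agree are made up for illustration. It assumes, as I read the
"wrong mode" comment, that the concern is the width of the memory access:
the memory form of vbroadcastss reads a single 32-bit element, while the
vmovaps + vbroadcastss sequence reads the whole 128-bit vector first.

```c
#include <string.h>

/* Hypothetical model of a 256-bit float vector as a plain array. */
typedef struct { float e[8]; } v8sf_model;

/* Models "vmovaps mem, %xmm1; vbroadcastss %xmm1, %ymm0":
   all 16 bytes are loaded, but only element 0 is used. */
static void broadcast_via_full_load(const float *mem, v8sf_model *out)
{
    float tmp[4];
    memcpy(tmp, mem, sizeof tmp);       /* 16-byte access */
    for (int i = 0; i < 8; i++)
        out->e[i] = tmp[0];
}

/* Models "vbroadcastss mem, %ymm0": only 4 bytes are read,
   so the memory operand is a scalar float, not a vector. */
static void broadcast_via_scalar_load(const float *mem, v8sf_model *out)
{
    float s = mem[0];                   /* 4-byte access */
    for (int i = 0; i < 8; i++)
        out->e[i] = s;
}

/* Returns 1 when both forms produce the same 256-bit value. */
static int models_agree(const float *mem)
{
    v8sf_model a, b;
    broadcast_via_full_load(mem, &a);
    broadcast_via_scalar_load(mem, &b);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Both forms give the same result whenever the full 16 bytes are readable.
The difference is only in how much memory the pattern claims to access.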


-- 
H.J.
