"Bingfeng Mei" <b...@broadcom.com> wrote on 01/11/2011 01:25:14 PM:

> Ira,
> Thank you very much for the quick answer. I will check 4.7 on x86-64
> to see how it differs from our port. Is there a significant change
> between 4.5 and 4.7 regarding SLP?

Yes, I think so. 4.5 can't SLP-vectorize data accesses with unknown
alignment, like the ones you have here.

Ira

>
> Cheers,
> Bingfeng
>
> > -----Original Message-----
> > From: Ira Rosen [mailto:i...@il.ibm.com]
> > Sent: 01 November 2011 11:13
> > To: Bingfeng Mei
> > Cc: gcc@gcc.gnu.org
> > Subject: Re: SLP vectorizer on non-loop?
> >
> >
> >
> > gcc-ow...@gcc.gnu.org wrote on 01/11/2011 12:41:32 PM:
> >
> > > Hello,
> > > I have one example with two very similar loops. The cunrolli pass
> > > unrolls one loop completely but not the other, based on slightly
> > > different cost estimates. The loop that is not unrolled gets
> > > SLP-vectorized and then unrolled by the "cunroll" pass, whereas the
> > > unrolled loop cannot be vectorized since it is not a loop any more.
> > > In the end, there is a big difference in performance between the
> > > two loops.
> > >
> >
> > Here is what I see with the current trunk on x86_64 with -O3 (with the
> > two loops split into different functions):
> >
> > The first loop, the one that doesn't get unrolled by cunrolli, gets
> > loop-vectorized with -fno-vect-cost-model. With the cost model, loop
> > vectorization fails because the number of iterations is not sufficient
> > (the vectorizer tries to apply loop peeling in order to align the
> > accesses); the loop is later unrolled by cunroll, and the resulting
> > basic block gets vectorized by SLP.
> >
> > The second loop, unrolled by cunrolli, also gets vectorized by SLP.
> >
> > The *.optimized dumps look similar:
> >
> >
> > <bb 2>:
> >   vect_var_.14_48 = MEM[(int *)p_hist_buff_9(D)];
> >   MEM[(int *)temp_hist_buffer_5(D)] = vect_var_.14_48;
> >   return;
> >
> >
> > <bb 2>:
> >   vect_var_.7_57 = MEM[(int *)p_input_10(D)];
> >   MEM[(int *)temp_hist_buffer_6(D) + 16B] = vect_var_.7_57;
> >   return;
> >
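
As a reading aid for the second dump above, here is a hand-written sketch of what it appears to express: one 16-byte vector load from p_input and one 16-byte store at temp_hist_buffer + 16B. The function name copy_input_by_hand and the v4si typedef are hypothetical, the vector type relies on the GCC vector extension, and the sketch assumes a 4-byte int. It is only an illustration, not what the compiler literally emits.

```c
#include <string.h>

/* Four ints packed into one 16-byte vector (GCC vector extension).
   Assumes sizeof (int) == 4. */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical hand-written equivalent of the second dump:
   load 16 bytes from p_input, store them 16 bytes into the buffer. */
void copy_input_by_hand (int *temp_hist_buffer, const int *p_input)
{
  v4si v;

  /* memcpy makes no alignment assumption about either pointer. */
  memcpy (&v, p_input, sizeof v);               /* one vector load  */
  memcpy (temp_hist_buffer + 4, &v, sizeof v);  /* one vector store */
}
```

The first dump has the same shape, with p_hist_buff as the source and no offset on the store.
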
> >
> > > My question is why SLP vectorization has to be performed on a loop
> > > (it is a sub-pass under pass_tree_loop). Conceptually, can't it be
> > > done on any basic block? Our port is still stuck at 4.5, but I
> > > checked 4.7 and it still seems to be the same. I also checked the
> > > functions in tree-vect-slp.c. They use loop_vinfo structures a lot,
> > > but in some places the code checks whether loop_vinfo exists and
> > > otherwise uses an alternative. I tried to add an extra SLP pass
> > > after pass_tree_loop, but it didn't work. I wonder how easy it
> > > would be to make SLP work for non-loop code.
> >
> > SLP vectorization works both on loops (in the vectorize pass) and on
> > basic blocks (in the slp-vectorize pass).
> >
> > Ira
> >
> > >
> > > Thanks,
> > > Bingfeng Mei
> > >
> > > Broadcom UK
> > >
> > > void foo (int *__restrict__ temp_hist_buffer,
> > >           int * __restrict__ p_hist_buff,
> > >           int *__restrict__ p_input)
> > > {
> > >   int i;
> > >   for(i=0;i<4;i++)
> > >      temp_hist_buffer[i]=p_hist_buff[i];
> > >
> > >   for(i=0;i<4;i++)
> > >      temp_hist_buffer[i+4]=p_input[i];
> > >
> > > }
> > >
> > >
> >
>
>
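
For anyone reproducing the comparison discussed in this thread, the following is a minimal sketch of the test case with the two loops split into different functions, as described above. The function names copy_hist, copy_input and check_copies are hypothetical; only the loop bodies come from the original foo(). The check helper is an addition that confirms both copies produce the expected buffer.

```c
/* First loop of the original foo(), in its own function. */
void copy_hist (int *__restrict__ temp_hist_buffer,
                int *__restrict__ p_hist_buff)
{
  int i;
  for (i = 0; i < 4; i++)
    temp_hist_buffer[i] = p_hist_buff[i];
}

/* Second loop of the original foo(), in its own function. */
void copy_input (int *__restrict__ temp_hist_buffer,
                 int *__restrict__ p_input)
{
  int i;
  for (i = 0; i < 4; i++)
    temp_hist_buffer[i + 4] = p_input[i];
}

/* Hypothetical helper: returns 1 if both copies filled the
   8-element buffer with the values 1..8, otherwise 0. */
int check_copies (void)
{
  int hist[4] = { 1, 2, 3, 4 };
  int input[4] = { 5, 6, 7, 8 };
  int out[8] = { 0 };
  int i;

  copy_hist (out, hist);
  copy_input (out, input);

  for (i = 0; i < 8; i++)
    if (out[i] != i + 1)
      return 0;
  return 1;
}
```

Compiling this with -O3 and -fdump-tree-optimized, with and without -fno-vect-cost-model, should make it possible to compare the two functions in the *.optimized dump. The exact dump contents depend on the GCC version and target, so they may not match the output quoted in this thread.
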
