On Tue, 12 Jul 2016, Richard Biener wrote:

> On Tue, 12 Jul 2016, Uros Bizjak wrote:
> 
> > On Tue, Jul 12, 2016 at 10:58 AM, Richard Biener <rguent...@suse.de> wrote:
> > > On Sun, 10 Jul 2016, Uros Bizjak wrote:
> > >
> > >> On Wed, Jul 6, 2016 at 3:18 PM, Richard Biener <rguent...@suse.de> wrote:
> > >>
> > >> >> > 2016-07-04  Richard Biener  <rguent...@suse.de>
> > >> >> >
> > >> >> >     PR rtl-optimization/68961
> > >> >> >     * fwprop.c (propagate_rtx): Allow SUBREGs of VEC_CONCAT and CONCAT
> > >> >> >     to simplify to a non-constant.
> > >> >> >
> > >> >> >     * gcc.target/i386/pr68961.c: New testcase.
> > >> >>
> > >> >> Thanks, LGTM.
> > >> >
> > >> > Bootstrapped and tested on x86_64-unknown-linux-gnu, it causes
> > >> >
> > >> > FAIL: gcc.target/i386/sse2-load-multi.c scan-assembler-times movup 2
> > >> >
> > >> > because the peephole created for that testcase no longer applies once
> > >> > fwprop does the following:
> > >> >
> > >> > In insn 10, replacing
> > >> >  (vec_concat:V2DF (vec_select:DF (reg:V2DF 91)
> > >> >             (parallel [
> > >> >                     (const_int 0 [0])
> > >> >                 ]))
> > >> >         (mem:DF (reg/f:DI 95) [0  S8 A128]))
> > >> >  with (vec_concat:V2DF (reg:DF 93 [ MEM[(const double *)&a + 8B] ])
> > >> >         (mem:DF (reg/f:DI 95) [0  S8 A128]))
> > >> > Changed insn 10
> > >> >
> > >> > resulting in
> > >> >
> > >> >         movsd   a+8(%rip), %xmm0
> > >> >         movhpd  a+16(%rip), %xmm0
> > >> >
> > >> > again rather than movupd.
> > >> >
> > >> > Uros, there is probably a missing peephole for the new form - can you
> > >> > fix this as a followup or should I hold on this patch for a bit longer?
> > >>
> > >> No, please proceed with the patch, I'll fix this fallout with a
> > >> followup patch in a couple of days.
> > >
> > > Applied as r238238.  Is the following x86 change OK, then?  It
> > > adjusts the vectorizer's vector construction cost to something more
> > > sensible.  I already adjusted the generic implementation in
> > > targhooks.c this way a few weeks ago.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2016-07-12  Richard Biener  <rguent...@suse.de>
> > >
> > >         * targhooks.c (default_builtin_vectorization_cost): Adjust
> > >         vec_construct cost.
> > >         * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
> > 
> > Looks OK to me, but let's also give Intel chance to comment.
> 
> Btw, the motivation is that the cost of large initializers, such as for
> v16qi or v32qi, is currently underestimated.  You end up with
> 15 or 31 vinsert instructions (or similar with other ISAs), and you
> can't do better than elements - 1 operations.  It doesn't really matter
> for smaller vectors, of course (seen for CPU v6 x264).

I've applied the patch now given no further comments from Intel.

Richard.
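
For readers following the cost argument above, here is a minimal sketch in C
of the "elements - 1" lower bound. It is not the code from the patch:
vec_construct_cost and its per_insert_cost parameter are hypothetical names
chosen for illustration, and the real hooks in targhooks.c and i386.c may
scale the result differently.

```c
/* Hypothetical helper, not GCC code: estimate the cost of building a
   vector from scalars.  Filling an N-element vector needs at least
   N - 1 insert-like operations, each assumed to cost per_insert_cost.  */
unsigned
vec_construct_cost (unsigned elements, unsigned per_insert_cost)
{
  /* An empty vector needs no operations; this also avoids unsigned
     wrap-around in elements - 1.  */
  if (elements == 0)
    return 0;

  return (elements - 1) * per_insert_cost;
}
```

With a unit cost per insert, vec_construct_cost (16, 1) gives 15 and
vec_construct_cost (32, 1) gives 31, matching the counts quoted for v16qi
and v32qi, while a two-element vector costs a single operation.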
