On Tue, Jul 12, 2016 at 10:58 AM, Richard Biener <rguent...@suse.de> wrote: > On Sun, 10 Jul 2016, Uros Bizjak wrote: > >> On Wed, Jul 6, 2016 at 3:18 PM, Richard Biener <rguent...@suse.de> wrote: >> >> >> > 2016-07-04 Richard Biener <rguent...@suse.de> >> >> > >> >> > PR rtl-optimization/68961 >> >> > * fwprop.c (propagate_rtx): Allow SUBREGs of VEC_CONCAT and CONCAT >> >> > to simplify to a non-constant. >> >> > >> >> > * gcc.target/i386/pr68961.c: New testcase. >> >> >> >> Thanks, LGTM. >> > >> > Bootstrapped and tested on x86_64-unknown-linux-gnu, it causes >> > >> > FAIL: gcc.target/i386/sse2-load-multi.c scan-assembler-times movup 2 >> > >> > as the peephole created for that testcase no longer applies as fwprop >> > does >> > >> > In insn 10, replacing >> > (vec_concat:V2DF (vec_select:DF (reg:V2DF 91) >> > (parallel [ >> > (const_int 0 [0]) >> > ])) >> > (mem:DF (reg/f:DI 95) [0 S8 A128])) >> > with (vec_concat:V2DF (reg:DF 93 [ MEM[(const double *)&a + 8B] ]) >> > (mem:DF (reg/f:DI 95) [0 S8 A128])) >> > Changed insn 10 >> > >> > resulting in >> > >> > movsd a+8(%rip), %xmm0 >> > movhpd a+16(%rip), %xmm0 >> > >> > again rather than movupd. >> > >> > Uros, there is probably a missing peephole for the new form - can you >> > fix this as a followup or should I hold on this patch for a bit longer? >> >> No, please proceed with the patch, I'll fix this fallout with a >> followup patch in a couple of days. > > Applied as r238238. Is the following x86 change ok then which > adjusts the vectorizer vector construction cost to sth more sensible? > I have adjusted the generic implementation in targhooks.c this way > a few weeks ago already. > > Thanks, > Richard. > > 2016-07-12 Richard Biener <rguent...@suse.de> > > * targhooks.c (default_builtin_vectorization_cost): Adjust > vec_construct cost. > * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
Looks OK to me, but let's also give Intel chance to comment. Uros. > Index: gcc/config/i386/i386.c > =================================================================== > --- gcc/config/i386/i386.c (revision 237196) > +++ gcc/config/i386/i386.c (working copy) > @@ -49503,8 +49520,6 @@ static int > ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost, > tree vectype, int) > { > - unsigned elements; > - > switch (type_of_cost) > { > case scalar_stmt: > @@ -49546,8 +49561,7 @@ ix86_builtin_vectorization_cost (enum ve > return ix86_cost->vec_stmt_cost; > > case vec_construct: > - elements = TYPE_VECTOR_SUBPARTS (vectype); > - return ix86_cost->vec_stmt_cost * (elements / 2 + 1); > + return ix86_cost->vec_stmt_cost * (TYPE_VECTOR_SUBPARTS (vectype) - > 1); > > default: > gcc_unreachable ();