Ping.
On Fri, Apr 18, 2014 at 2:05 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> Hi,
>
> Merged with current master, the patch passes bootstrap and gives the
> expected gains.
> Patch and new tests are attached.
>
> ChangeLog:
>
> 2014-04-18 Evgeny Stupachenko <evstu...@gmail.com>
>
> * tree-vect-data-refs.c (vect_grouped_store_supported): New
> check for stores group of length 3.
> (vect_permute_store_chain): New permutations for stores group of
> length 3.
> (vect_grouped_load_supported): New check for loads group of length 3.
> (vect_permute_load_chain): New permutations for loads group of
> length 3.
> * tree-vect-stmts.c (vect_model_store_cost): Change cost
> of vec_perm_shuffle for the new permutations.
> (vect_model_load_cost): Ditto.
>
> ChangeLog for testsuite:
>
> 2014-04-18 Evgeny Stupachenko <evstu...@gmail.com>
>
> PR tree-optimization/52252
> * gcc.dg/vect/pr52252-ld.c: Test on loads group of size 3.
> * gcc.dg/vect/pr52252-st.c: Test on stores group of size 3.
>
> Evgeny
>
> On Thu, Mar 6, 2014 at 6:44 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>> Missed attachment.
>>
>> On Thu, Mar 6, 2014 at 6:42 PM, Evgeny Stupachenko <evstu...@gmail.com>
>> wrote:
>>> I've separated the patch into 2: cost model tuning and load/store
>>> groups parallelism.
>>> SLM tuning was partially introduced in the patch:
>>> http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00226.html
>>> The patch introducing vectorization for load/store groups of size 3
>>> is attached.
>>>
>>> Is it ok for stage1?
>>>
>>> ChangeLog:
>>>
>>> 2014-03-06 Evgeny Stupachenko <evstu...@gmail.com>
>>>
>>> * tree-vect-data-refs.c (vect_grouped_store_supported): New
>>> check for stores group of length 3.
>>> (vect_permute_store_chain): New permutations for stores group of
>>> length 3.
>>> (vect_grouped_load_supported): New check for loads group of length 3.
>>> (vect_permute_load_chain): New permutations for loads group of
>>> length 3.
>>> * tree-vect-stmts.c (vect_model_store_cost): Change cost
>>> of vec_perm_shuffle for the new permutations.
>>> (vect_model_load_cost): Ditto.
>>>
>>> On Tue, Feb 11, 2014 at 7:19 PM, Richard Biener <rguent...@suse.de> wrote:
>>>> On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>>>
>>>>> Missed patch attached in plain-text.
>>>>>
>>>>> I have copyright assignment on file with the FSF covering work on GCC.
>>>>>
>>>>> Load/store groups of length 3 are the most frequent non-power-of-2
>>>>> case. They are used in RGB image processing (like the test case in
>>>>> PR52252). For sure we can extend the patch to length 5 and more.
>>>>> However, this potentially affects performance on some other
>>>>> architectures and requires larger testing. So length 3 is just a
>>>>> first step. The algorithm in the patch could be modified for a
>>>>> general case in several steps.
>>>>>
>>>>> I understand that the patch should wait for stage 1; however, since
>>>>> it's ready we can discuss it right now and make some changes (like
>>>>> general size of group).
>>>>
>>>> Other than that I'd like to see a vectorizer hook querying the cost of a
>>>> vec_perm_const expansion instead of adding vec_perm_shuffle
>>>> (thus requires the constant shuffle mask to be passed as well
>>>> as the vector type). That's more useful for other uses that
>>>> would require (arbitrary) shuffles.
>>>>
>>>> Didn't look at the rest of the patch yet - queued in my review
>>>> pipeline.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> Evgeny
>>>>>
>>>>> On Tue, Feb 11, 2014 at 5:00 PM, Richard Biener <rguent...@suse.de> wrote:
>>>>> >
>>>>> > On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>>>> >
>>>>> > > Hi,
>>>>> > >
>>>>> > > The patch gives an expected 3 times gain for the test case in
>>>>> > > PR52252 (and even 6 times for AVX2).
>>>>> > > It passes make check and bootstrap on x86.
>>>>> > > spec2000/spec2006 got no regressions/gains on x86.
>>>>> > >
>>>>> > > Is this patch ok?
>>>>> >
>>>>> > I've worked on generalizing the permutation support in the light
>>>>> > of the availability of the generic shuffle support in the IL
>>>>> > but hit some road-blocks in the way code-generation works for
>>>>> > group loads with permutations (I don't remember if I posted all
>>>>> > patches).
>>>>> >
>>>>> > This patch seems to be to a slightly different place but it again
>>>>> > special-cases a specific permutation. Why's that? Why can't we
>>>>> > support groups of size 7 for example? So - can this be generalized
>>>>> > to support arbitrary non-power-of-two load/store groups?
>>>>> >
>>>>> > Other than that the patch has to wait for stage1 to open again,
>>>>> > of course. And it misses a testcase.
>>>>> >
>>>>> > Btw, do you have a copyright assignment on file with the FSF covering
>>>>> > work on GCC?
>>>>> >
>>>>> > Thanks,
>>>>> > Richard.
>>>>> >
>>>>> > > ChangeLog:
>>>>> > >
>>>>> > > 2014-02-11 Evgeny Stupachenko <evstu...@gmail.com>
>>>>> > >
>>>>> > > * target.h (vect_cost_for_stmt): Defining new cost
>>>>> > > vec_perm_shuffle.
>>>>> > > * tree-vect-data-refs.c (vect_grouped_store_supported): New
>>>>> > > check for stores group of length 3.
>>>>> > > (vect_permute_store_chain): New permutations for stores group
>>>>> > > of length 3.
>>>>> > > (vect_grouped_load_supported): New check for loads group of
>>>>> > > length 3.
>>>>> > > (vect_permute_load_chain): New permutations for loads group of
>>>>> > > length 3.
>>>>> > > * tree-vect-stmts.c (vect_model_store_cost): New cost
>>>>> > > vec_perm_shuffle for the new permutations.
>>>>> > > (vect_model_load_cost): Ditto.
>>>>> > > * config/aarch64/aarch64.c (builtin_vectorization_cost):
>>>>> > > Adding vec_perm_shuffle cost as equivalent of vec_perm cost.
>>>>> > > * config/arm/arm.c: Ditto.
>>>>> > > * config/rs6000/rs6000.c: Ditto.
>>>>> > > * config/spu/spu.c: Ditto.
>>>>> > > * config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for
>>>>> > > slow byte shuffle on some x86 architectures.
>>>>> > > * config/i386/i386.h (processor_costs): Defining pshuffb cost.
>>>>> > > * config/i386/i386.c (processor_costs): Adding pshuffb cost.
>>>>> > > (ix86_builtin_vectorization_cost): Adding cost for the new
>>>>> > > permutations.
>>>>> > > Fixing cost for other permutations.
>>>>> > > (expand_vec_perm_even_odd_1): Avoid byte shuffles when they
>>>>> > > are slow (TARGET_SLOW_PHUFFB).
>>>>> > > (ix86_add_stmt_cost): Adding cost when STMT is WIDEN_MULTIPLY.
>>>>> > > Adding new shuffle cost only when byte shuffle is expected.
>>>>> > > Fixing cost model for Silvermont.
>>>>> > >
>>>>> > > Thanks,
>>>>> > > Evgeny
>>>>> > >
>>>>> >
>>>>> > --
>>>>> > Richard Biener <rguent...@suse.de>
>>>>> > SUSE / SUSE Labs
>>>>> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>>>>> > GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
>>>>>
>>>>
>>>> --
>>>> Richard Biener <rguent...@suse.de>
>>>> SUSE / SUSE Labs
>>>> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>>>> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
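
For readers less familiar with the terminology in this thread, here is a minimal sketch of a loop containing a load group and a store group of size 3. It is not the PR52252 test case and is not taken from the patch; the function name `adjust_rgb` and the per-channel offsets are invented purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not taken from the patch or from PR52252.
   Each iteration reads three adjacent bytes (a load group of size 3)
   and writes three adjacent bytes (a store group of size 3).  */
void
adjust_rgb (const uint8_t *restrict in, uint8_t *restrict out,
            size_t n_pixels)
{
  for (size_t i = 0; i < n_pixels; i++)
    {
      /* Interleaved loads: each channel is accessed with stride 3.  */
      uint8_t r = in[3 * i + 0];
      uint8_t g = in[3 * i + 1];
      uint8_t b = in[3 * i + 2];

      /* A different operation per channel, so the three lanes are not
         one uniform stream; the results wrap modulo 256.  */
      out[3 * i + 0] = (uint8_t) (r + 1);
      out[3 * i + 1] = (uint8_t) (g + 2);
      out[3 * i + 2] = (uint8_t) (b + 3);
    }
}
```

To vectorize a loop of this shape, the loaded vectors have to be de-interleaved into per-channel vectors and the results re-interleaved before the stores, which is the role of the load and store chain permutations named in the ChangeLog entries above.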