Hi,

Merged with current master, the patch passes bootstrap and gives the expected gains. The patch and new tests are attached.
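For context on the new tests: a load or store group of length 3 appears whenever each loop iteration touches three consecutive elements, as in RGB pixel processing. GCC's vectorizer treats those three accesses as one interleaved group. The loops below are a hypothetical illustration only. The function names and grayscale weights are my own, not the PR52252 test case or the attached tests.

```c
#include <stddef.h>

/* Hypothetical RGB -> grayscale loop.  Each iteration reads in[3*i],
   in[3*i+1] and in[3*i+2]: a load group of length 3.  */
void
rgb_to_gray (const unsigned char *restrict in, unsigned char *restrict out,
             size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned int r = in[3 * i];
      unsigned int g = in[3 * i + 1];
      unsigned int b = in[3 * i + 2];
      /* Illustrative weights that sum to 256, so the result is 0..255.  */
      out[i] = (unsigned char) ((77 * r + 150 * g + 29 * b) >> 8);
    }
}

/* Hypothetical reverse direction: each iteration writes three
   consecutive bytes, i.e. a store group of length 3.  */
void
gray_to_rgb (const unsigned char *restrict in, unsigned char *restrict out,
             size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      out[3 * i] = in[i];
      out[3 * i + 1] = in[i];
      out[3 * i + 2] = in[i];
    }
}
```

Without group-of-3 support, loops like these need a power-of-2 group size to be vectorized with the existing even/odd permutations.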
ChangeLog:

2014-04-18  Evgeny Stupachenko  <evstu...@gmail.com>

	* tree-vect-data-refs.c (vect_grouped_store_supported): New
	check for stores group of length 3.
	(vect_permute_store_chain): New permutations for stores group of
	length 3.
	(vect_grouped_load_supported): New check for loads group of length 3.
	(vect_permute_load_chain): New permutations for loads group of
	length 3.
	* tree-vect-stmts.c (vect_model_store_cost): Change cost
	of vec_perm_shuffle for the new permutations.
	(vect_model_load_cost): Ditto.

ChangeLog for testsuite:

2014-04-18  Evgeny Stupachenko  <evstu...@gmail.com>

	PR tree-optimization/52252
	* gcc.dg/vect/pr52252-ld.c: Test on loads group of size 3.
	* gcc.dg/vect/pr52252-st.c: Test on stores group of size 3.

Evgeny

On Thu, Mar 6, 2014 at 6:44 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> Missed attachment.
>
> On Thu, Mar 6, 2014 at 6:42 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
>> I've separated the patch into two: cost model tuning and load/store
>> groups parallelism.
>> SLM tuning was partially introduced in the patch:
>> http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00226.html
>> The patch introducing vectorization for load/store groups of size 3 is attached.
>>
>> Is it ok for stage1?
>>
>> ChangeLog:
>>
>> 2014-03-06  Evgeny Stupachenko  <evstu...@gmail.com>
>>
>>	* tree-vect-data-refs.c (vect_grouped_store_supported): New
>>	check for stores group of length 3.
>>	(vect_permute_store_chain): New permutations for stores group of
>>	length 3.
>>	(vect_grouped_load_supported): New check for loads group of length 3.
>>	(vect_permute_load_chain): New permutations for loads group of
>>	length 3.
>>	* tree-vect-stmts.c (vect_model_store_cost): Change cost
>>	of vec_perm_shuffle for the new permutations.
>>	(vect_model_load_cost): Ditto.
>>
>>
>>
>> On Tue, Feb 11, 2014 at 7:19 PM, Richard Biener <rguent...@suse.de> wrote:
>>> On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>>
>>>> Missed patch attached in plain-text.
>>>>
>>>> I have a copyright assignment on file with the FSF covering work on GCC.
>>>>
>>>> Load/store groups of length 3 are the most frequent non-power-of-2
>>>> case. They are used in RGB image processing (like the test case in
>>>> PR52252). We can certainly extend the patch to length 5 and more.
>>>> However, this potentially affects performance on some other
>>>> architectures and requires broader testing. So length 3 is just a first
>>>> step. The algorithm in the patch could be extended to the general case
>>>> in several steps.
>>>>
>>>> I understand that the patch should wait for stage 1; however, since
>>>> it's ready, we can discuss it now and make some changes (like
>>>> supporting a general group size).
>>>
>>> Other than that I'd like to see a vectorizer hook querying the cost of a
>>> vec_perm_const expansion instead of adding vec_perm_shuffle
>>> (this requires the constant shuffle mask to be passed as well
>>> as the vector type). That's more useful for other uses that
>>> would require (arbitrary) shuffles.
>>>
>>> I haven't looked at the rest of the patch yet; it's queued in my review
>>> pipeline.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> Evgeny
>>>>
>>>> On Tue, Feb 11, 2014 at 5:00 PM, Richard Biener <rguent...@suse.de> wrote:
>>>> >
>>>> > On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>>> >
>>>> > > Hi,
>>>> > >
>>>> > > The patch gives the expected 3x gain for the test case in PR52252
>>>> > > (and even 6x with AVX2).
>>>> > > It passes make check and bootstrap on x86.
>>>> > > SPEC2000/SPEC2006 show no regressions or gains on x86.
>>>> > >
>>>> > > Is this patch ok?
>>>> >
>>>> > I've worked on generalizing the permutation support in the light
>>>> > of the availability of the generic shuffle support in the IL
>>>> > but hit some road-blocks in the way code-generation works for
>>>> > group loads with permutations (I don't remember if I posted all patches).
>>>> >
>>>> > This patch seems to target a slightly different place, but it again
>>>> > special-cases a specific permutation. Why's that? Why can't we
>>>> > support groups of size 7, for example? So, can this be generalized
>>>> > to support arbitrary non-power-of-two load/store groups?
>>>> >
>>>> > Other than that, the patch has to wait for stage1 to open again,
>>>> > of course. And it lacks a testcase.
>>>> >
>>>> > Btw, do you have a copyright assignment on file with the FSF covering
>>>> > work on GCC?
>>>> >
>>>> > Thanks,
>>>> > Richard.
>>>> >
>>>> > > ChangeLog:
>>>> > >
>>>> > > 2014-02-11  Evgeny Stupachenko  <evstu...@gmail.com>
>>>> > >
>>>> > >	* target.h (vect_cost_for_stmt): Define new cost
>>>> > >	vec_perm_shuffle.
>>>> > >	* tree-vect-data-refs.c (vect_grouped_store_supported): New
>>>> > >	check for stores group of length 3.
>>>> > >	(vect_permute_store_chain): New permutations for stores group
>>>> > >	of length 3.
>>>> > >	(vect_grouped_load_supported): New check for loads group of
>>>> > >	length 3.
>>>> > >	(vect_permute_load_chain): New permutations for loads group of
>>>> > >	length 3.
>>>> > >	* tree-vect-stmts.c (vect_model_store_cost): New cost
>>>> > >	vec_perm_shuffle for the new permutations.
>>>> > >	(vect_model_load_cost): Ditto.
>>>> > >	* config/aarch64/aarch64.c (builtin_vectorization_cost): Add
>>>> > >	vec_perm_shuffle cost as equivalent of vec_perm cost.
>>>> > >	* config/arm/arm.c: Ditto.
>>>> > >	* config/rs6000/rs6000.c: Ditto.
>>>> > >	* config/spu/spu.c: Ditto.
>>>> > >	* config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for
>>>> > >	slow byte shuffle on some x86 architectures.
>>>> > >	* config/i386/i386.h (processor_costs): Define pshuffb cost.
>>>> > >	* config/i386/i386.c (processor_costs): Add pshuffb cost.
>>>> > >	(ix86_builtin_vectorization_cost): Add cost for the new
>>>> > >	permutations.  Fix cost for other permutations.
>>>> > >	(expand_vec_perm_even_odd_1): Avoid byte shuffles when they are
>>>> > >	slow (TARGET_SLOW_PHUFFB).
>>>> > >	(ix86_add_stmt_cost): Add cost when STMT is WIDEN_MULTIPLY.
>>>> > >	Add new shuffle cost only when byte shuffle is expected.
>>>> > >	Fix cost model for Silvermont.
>>>> > >
>>>> > > Thanks,
>>>> > > Evgeny
>>>> > >
>>>> >
>>>> > --
>>>> > Richard Biener <rguent...@suse.de>
>>>> > SUSE / SUSE Labs
>>>> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>>>> > GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
>>>>
>>>
>>> --
>>> Richard Biener <rguent...@suse.de>
>>> SUSE / SUSE Labs
>>> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>>> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
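To make the length-3 load permutation concrete, here is a minimal scalar model. It assumes a generic two-input constant shuffle with VEC_PERM_EXPR-like semantics (out[i] is element mask[i] of the concatenation a:b). The names vec_perm and extract_field3 and the lane count N = 8 are hypothetical. Each field of the group is extracted with two shuffles: the first gathers the elements that lie in the first two loaded vectors, and the second fills the remaining lanes from the third vector. This is a sketch of the idea, not the code in vect_permute_load_chain.

```c
#include <assert.h>

#define N 8 /* hypothetical lane count, e.g. 8 x 16-bit lanes in 128 bits */

/* Model of a two-input constant shuffle: out[i] = (a:b)[mask[i]].
   This model requires every index to be in [0, 2*N).  */
static void
vec_perm (const int *a, const int *b, const int *mask, int *out)
{
  for (int i = 0; i < N; i++)
    {
      assert (mask[i] >= 0 && mask[i] < 2 * N);
      out[i] = mask[i] < N ? a[mask[i]] : b[mask[i] - N];
    }
}

/* De-interleave field K (0, 1 or 2) of a group of length 3 whose
   3*N consecutive elements were loaded as V0, V1 and V2.  */
void
extract_field3 (const int *v0, const int *v1, const int *v2, int k, int *out)
{
  int mask1[N], mask2[N], tmp[N];

  for (int j = 0; j < N; j++)
    {
      int src = 3 * j + k; /* memory position of element J of field K */
      if (src < 2 * N)
        {
          mask1[j] = src; /* first shuffle takes it from V0:V1 */
          mask2[j] = j;   /* second shuffle keeps it in place */
        }
      else
        {
          mask1[j] = 0;                 /* don't care, overwritten next */
          mask2[j] = N + (src - 2 * N); /* second shuffle takes it from V2 */
        }
    }

  vec_perm (v0, v1, mask1, tmp);
  vec_perm (tmp, v2, mask2, out);
}
```

For example, with the 24 elements 0, 1, ..., 23 loaded as three vectors, field 0 comes out as 0, 3, 6, ..., 21. The model needs two shuffles per output vector, six in total for the group.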
vect3.patch
vect3_tests.patch
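The store side is the inverse operation: three vectors of separate fields (for example R, G and B) must be interleaved into three output vectors in memory order. Here is a minimal scalar sketch, again assuming a generic two-input constant shuffle with VEC_PERM_EXPR-like semantics. The names vec_perm and interleave3 and the lane count N = 8 are hypothetical, and this is not the code in vect_permute_store_chain. Each output vector is built with two shuffles: the first takes the R and G elements, and the second inserts the B elements.

```c
#include <assert.h>

#define N 8 /* hypothetical lane count */

/* Model of a two-input constant shuffle: out[i] = (a:b)[mask[i]].  */
static void
vec_perm (const int *a, const int *b, const int *mask, int *out)
{
  for (int i = 0; i < N; i++)
    {
      assert (mask[i] >= 0 && mask[i] < 2 * N);
      out[i] = mask[i] < N ? a[mask[i]] : b[mask[i] - N];
    }
}

/* Build output vector O (0, 1 or 2) of the interleaved layout
   R0 G0 B0 R1 G1 B1 ... from the field vectors R, G and B.  */
void
interleave3 (const int *r, const int *g, const int *b, int o, int *out)
{
  int mask1[N], mask2[N], tmp[N];

  for (int j = 0; j < N; j++)
    {
      int pos = N * o + j; /* position in the interleaved memory layout */
      int field = pos % 3; /* 0 = R, 1 = G, 2 = B */
      int idx = pos / 3;   /* element index within that field */
      if (field < 2)
        {
          mask1[j] = field * N + idx; /* R from the first input, G from the second */
          mask2[j] = j;               /* keep it in the second shuffle */
        }
      else
        {
          mask1[j] = 0;       /* don't care, overwritten next */
          mask2[j] = N + idx; /* B comes from the third vector */
        }
    }

  vec_perm (r, g, mask1, tmp);
  vec_perm (tmp, b, mask2, out);
}
```

With R = 0, 3, ..., 21, G = 1, 4, ..., 22 and B = 2, 5, ..., 23, the three output vectors hold 0..7, 8..15 and 16..23, which is memory order.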