On December 4, 2015 4:32:33 PM GMT+01:00, Alan Lawrence <alan.lawre...@arm.com> wrote:
>On 27/11/15 08:30, Richard Biener wrote:
>>
>> This is part 1 of a fix for PR68533 which shows that some targets
>> cannot can_vec_perm_p on an identity permutation. I chose to fix
>> this in the vectorizer by detecting the identity itself but with
>> the current structure of vect_transform_slp_perm_load this is
>> somewhat awkward. Thus the following no-op patch simplifies it
>> greatly (from the times it was restricted to do interleaving-kind
>> of permutes). It turned out to not be 100% no-op as we now can
>> handle non-adjacent source operands so I split it out from the
>> actual fix.
>>
>> The two adjusted testcases no longer fail to vectorize because
>> of "need three vectors" but unadjusted would fail because there
>> are simply not enough scalar iterations in the loop. I adjusted
>> that and now we vectorize it just fine (running into PR68559
>> which I filed).
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>
>> Richard.
>>
>> 2015-11-27  Richard Biener  <rguent...@suse.de>
>>
>> 	PR tree-optimization/68553
>> 	* tree-vect-slp.c (vect_get_mask_element): Remove.
>> 	(vect_transform_slp_perm_load): Implement in a simpler way.
>>
>> 	* gcc.dg/vect/pr45752.c: Adjust.
>> 	* gcc.dg/vect/slp-perm-4.c: Likewise.
>
>On aarch64 and ARM targets, this causes
>
>PASS->FAIL: gcc.dg/vect/O3-pr36098.c scan-tree-dump-times vect "vectorizing stmts using SLP" 0
>
>That is, we now vectorize using SLP, when previously we did not.
>
>On aarch64 (and I expect ARM too), previously we used a VEC_LOAD_LANES,
>without unrolling, but now we unroll * 4, and vectorize using 3 loads and
>permutes:
Happens on x86_64 as well with at least SSE4.1. Unfortunately we'll have to start introducing much more fine-grained target-supports for vect_perm to reliably guard all targets.

Richard.

>../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
>vect__31.15_94 = VEC_PERM_EXPR <vect__31.11_87, vect__31.12_89, { 0, 1, 2, 4 }>;
>../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
>vect__31.16_95 = VEC_PERM_EXPR <vect__31.12_89, vect__31.13_91, { 1, 2, 4, 5 }>;
>../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
>vect__31.17_96 = VEC_PERM_EXPR <vect__31.13_91, vect__31.14_93, { 2, 4, 5, 6 }>
>
>which *is* a valid vectorization strategy...
>
>
>--Alan
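To make the three VEC_PERM_EXPR masks in the dump easier to read, here is a small Python model of the operation (not GCC code). In VEC_PERM_EXPR <v0, v1, mask>, each mask element indexes into the concatenation of v0 and v1. The layout below is an assumption for illustration only: the four input vectors (vect__31.11 .. vect__31.14) are taken to be consecutive 4-lane loads covering memory positions 0..15. The actual source of O3-pr36098.c is not reproduced here, and `vec_perm`, `loads` and `perms` are hypothetical names.

```python
# Illustrative model of VEC_PERM_EXPR <v0, v1, mask> semantics (not GCC code).
from typing import List, Sequence


def vec_perm(v0: Sequence[int], v1: Sequence[int], mask: Sequence[int]) -> List[int]:
    """Return [ (v0 ++ v1)[m] for m in mask ].

    Mask indices 0..n-1 select from v0, indices n..2n-1 select from v1.
    """
    n = len(v0)
    assert len(v1) == n and len(mask) == n
    both = list(v0) + list(v1)
    assert all(0 <= m < 2 * n for m in mask)
    return [both[m] for m in mask]


# Hypothetical layout: four consecutive 4-lane loads covering positions 0..15,
# standing in for vect__31.11 .. vect__31.14 from the dump.
loads = [list(range(4 * k, 4 * k + 4)) for k in range(4)]

# The three permutes from the dump, each pairing two adjacent loads.
perms = [
    vec_perm(loads[0], loads[1], [0, 1, 2, 4]),  # vect__31.15_94
    vec_perm(loads[1], loads[2], [1, 2, 4, 5]),  # vect__31.16_95
    vec_perm(loads[2], loads[3], [2, 4, 5, 6]),  # vect__31.17_96
]

# Flatten the three result vectors in order.
selected = [x for vec in perms for x in vec]
print(selected)
```

Under that assumed layout, the twelve selected elements are exactly the positions whose index mod 4 is 0, 1 or 2, i.e. a group of four with the last lane unused, which is consistent with calling this a valid strategy. The same model also shows the identity case from PR68533: `vec_perm(v0, v1, [0, 1, 2, 3])` simply returns v0, which is why the vectorizer can detect it without asking the target via can_vec_perm_p.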