On December 4, 2015 4:32:33 PM GMT+01:00, Alan Lawrence <alan.lawre...@arm.com> 
wrote:
>On 27/11/15 08:30, Richard Biener wrote:
>>
>> This is part 1 of a fix for PR68533 which shows that some targets
>> cannot can_vec_perm_p on an identity permutation.  I chose to fix
>> this in the vectorizer by detecting the identity itself but with
>> the current structure of vect_transform_slp_perm_load this is
>> somewhat awkward.  Thus the following no-op patch simplifies it
>> greatly (from the times it was restricted to do interleaving-kind
>> of permutes).  It turned out to not be 100% no-op as we now can
>> handle non-adjacent source operands so I split it out from the
>> actual fix.
>>
>> The two adjusted testcases no longer fail to vectorize because
>> of "need three vectors" but unadjusted would fail because there
>> are simply not enough scalar iterations in the loop.  I adjusted
>> that and now we vectorize it just fine (running into PR68559
>> which I filed).
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>
>> Richard.
>>
>> 2015-11-27  Richard Biener  <rguent...@suse.de>
>>
>>      PR tree-optimization/68553
>>      * tree-vect-slp.c (vect_get_mask_element): Remove.
>>      (vect_transform_slp_perm_load): Implement in a simpler way.
>>
>>      * gcc.dg/vect/pr45752.c: Adjust.
>>      * gcc.dg/vect/slp-perm-4.c: Likewise.
>
>On aarch64 and ARM targets, this causes
>
>PASS->FAIL: gcc.dg/vect/O3-pr36098.c scan-tree-dump-times vect
>"vectorizing stmts using SLP" 0
>
>That is, we now vectorize using SLP, when previously we did not.
>
>On aarch64 (and I expect ARM too), previously we used a VEC_LOAD_LANES,
>without unrolling, but now we unroll * 4, and vectorize using 3 loads and
>permutes:

Happens on x86_64 as well, with at least SSE4.1.  Unfortunately we'll have
to start introducing much more fine-grained target-supports for vect_perm
to reliably guard all targets.

Richard.

>../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
>vect__31.15_94 = VEC_PERM_EXPR <vect__31.11_87, vect__31.12_89, { 0, 1, 2, 4 }>;
>../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
>vect__31.16_95 = VEC_PERM_EXPR <vect__31.12_89, vect__31.13_91, { 1, 2, 4, 5 }>;
>../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
>vect__31.17_96 = VEC_PERM_EXPR <vect__31.13_91, vect__31.14_93, { 2, 4, 5, 6 }>
>
>which *is* a valid vectorization strategy...
>
>
>--Alan
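
For readers unfamiliar with the dump quoted above, here is a rough scalar
model of what a two-input VEC_PERM_EXPR computes. It is only a sketch,
assuming 4-element integer vectors; the name vec_perm_model and the NELTS
constant are made up for illustration and are not GCC code. Each selector
index picks from the concatenation of the two inputs: indices 0..3 refer to
the first vector, 4..7 to the second.

```c
#include <assert.h>

#define NELTS 4  /* assumed vector length, matching the 4-element masks in the dump */

/* Hypothetical scalar model of VEC_PERM_EXPR <v0, v1, sel>.
   Selector values are reduced modulo 2 * NELTS, then index the
   concatenation of v0 and v1.  */
static void vec_perm_model(const int *v0, const int *v1,
                           const unsigned *sel, int *out)
{
    for (int i = 0; i < NELTS; i++) {
        unsigned idx = sel[i] % (2 * NELTS);
        out[i] = idx < NELTS ? v0[idx] : v1[idx - NELTS];
    }
}
```

With v0 = {10, 11, 12, 13}, v1 = {20, 21, 22, 23} and the selector
{ 0, 1, 2, 4 } from the first statement, the result is {10, 11, 12, 20}:
three elements from the first load followed by the first element of the
second load.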

