Richard Biener <richard.guent...@gmail.com> writes:
> On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu <liwei...@intel.com> wrote:
>>
>>         This patch implements the optimization in PR 54346, which merges
>>
>>         c = VEC_PERM_EXPR <a, b, VCST0>;
>>         d = VEC_PERM_EXPR <c, c, VCST1>;
>>                 to
>>         d = VEC_PERM_EXPR <a, b, NEW_VCST>;
>>
>>         Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>>         tree-ssa/forwprop-19.c fails, but I'm not sure whether it
>>         is OK to remove it.
>
> Looks good, but leave Richard a chance to ask for VLA vector support which
> might be trivial to do.

Sorry for the slow reply.  It might be tricky to handle the general case,
so I'd suggest going with this for now and dealing with VLA as a follow-up.
(Probably after Prathamesh's changes to fold_vec_perm_expr.)

Thanks,
Richard

> Btw, doesn't this handle the VEC_PERM + VEC_PERM case in
> tree-ssa-forwprop.cc:simplify_permutation as well?  Note _that_ does
> seem to handle VLA vectors.
>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>>         PR target/54346
>>         * match.pd: Merge the indices of the two VCSTs, then generate
>>         the new vec_perm.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         PR target/54346
>>         * gcc.dg/pr54346.c: New test.
>>
>> Co-authored-by: liuhongt <hongtao....@intel.com>
>> ---
>>  gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
>>  gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
>>  2 files changed, 54 insertions(+)
>>  create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 345bcb701a5..9219b0a10e1 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -8086,6 +8086,47 @@ and,
>>    (minus (mult (vec_perm @1 @1 @3) @2) @4)))
>>
>>
>> +/* (PR54346) Merge
>> +   c = VEC_PERM_EXPR <a, b, VCST0>;
>> +   d = VEC_PERM_EXPR <c, c, VCST1>;
>> +   to
>> +   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
>> +
>> +(simplify
>> + (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
>> + (with
>> +  {
>> +    if (!TYPE_VECTOR_SUBPARTS (type).is_constant ())
>> +      return NULL_TREE;
>> +
>> +    tree op0;
>> +    machine_mode result_mode = TYPE_MODE (type);
>> +    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
>> +    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
>> +    vec_perm_builder builder0;
>> +    vec_perm_builder builder1;
>> +    vec_perm_builder builder2 (nelts, nelts, 1);
>> +
>> +    if (!tree_to_vec_perm_builder (&builder0, @3)
>> +        || !tree_to_vec_perm_builder (&builder1, @4))
>> +      return NULL_TREE;
>> +
>> +    vec_perm_indices sel0 (builder0, 2, nelts);
>> +    vec_perm_indices sel1 (builder1, 1, nelts);
>> +
>> +    for (int i = 0; i < nelts; i++)
>> +      builder2.quick_push (sel0[sel1[i].to_constant ()]);
>> +
>> +    vec_perm_indices sel2 (builder2, 2, nelts);
>> +
>> +    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
>> +      return NULL_TREE;
>> +
>> +    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
>> +  }
>> +  (vec_perm @1 @2 { op0; })))
>> +
>> +
>>  /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
>>     The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
>>     constant which when multiplied by a power of 2 contains a unique value
>> diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
>> new file mode 100755
>> index 00000000000..d87dc3a79a5
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr54346.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O -fdump-tree-dse1" } */
>> +
>> +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
>> +
>> +void fun (veci a, veci b, veci *i)
>> +{
>> +  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
>> +  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
>> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
>> \ No newline at end of file
>> --
>> 2.18.2
>>

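For readers following the index merging in the quoted match.pd pattern,
here is a minimal standalone sketch of the same arithmetic.  It uses plain
std::vector instead of vec_perm_builder/vec_perm_indices, and the names
compose_selectors and swap_inputs are hypothetical helpers, not GCC
functions.  The outer selector is reduced modulo the element count because
both inputs of the outer permute are c; the inner selector indexes the
concatenation of a and b.  The test's expected { 3, 6, 0, 0 } matches the
merged selector with a and b exchanged; that GCC arrives there through a
later operand-swapping canonicalization is an assumption on my part.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: merge the selectors of
//   c = VEC_PERM_EXPR <a, b, sel0>;  d = VEC_PERM_EXPR <c, c, sel1>;
// into one selector for d = VEC_PERM_EXPR <a, b, sel2>.
std::vector<unsigned> compose_selectors(const std::vector<unsigned>& sel0,
                                        const std::vector<unsigned>& sel1)
{
    const std::size_t n = sel0.size();
    assert(n != 0 && sel1.size() == n);
    std::vector<unsigned> sel2;
    sel2.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        // sel1 picks from <c, c>, so indices k and k + n both mean c[k].
        const std::size_t k = sel1[i] % n;
        // sel0 picks from <a, b>, so it is taken modulo 2 * n.
        sel2.push_back(sel0[k] % (2 * n));
    }
    return sel2;
}

// Hypothetical helper: rewrite a selector for <a, b> as the equivalent
// selector for <b, a> (an assumption about how the dump ends up swapped).
std::vector<unsigned> swap_inputs(const std::vector<unsigned>& sel)
{
    const std::size_t n = sel.size();
    std::vector<unsigned> out;
    out.reserve(n);
    for (unsigned idx : sel)
        out.push_back((idx + n) % (2 * n));
    return out;
}
```

With the selectors from the new test, compose_selectors({1, 4, 2, 7},
{7, 2, 1, 5}) yields {7, 2, 4, 4}, and swap_inputs of that result yields
{3, 6, 0, 0}, the pattern the dg-final scan looks for.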