Re: [SVE] PR96463 - Optimise svld1rq from vectors
On Tue, 14 Dec 2021, Prathamesh Kulkarni wrote: > On Tue, 7 Dec 2021 at 19:08, Richard Sandiford > wrote: > > > > Prathamesh Kulkarni writes: > > > On Thu, 2 Dec 2021 at 23:11, Richard Sandiford > > > wrote: > > >> > > >> Prathamesh Kulkarni writes: > > >> > Hi Richard, > > >> > I have attached a WIP untested patch for PR96463. > > >> > IIUC, the PR suggests to transform > > >> > lhs = svld1rq ({-1, -1, ...}, &v[0]) > > >> > into: > > >> > lhs = vec_perm_expr > > >> > if v is vector of 4 elements, and each element is 32 bits on little > > >> > endian target ? > > >> > > > >> > I am sorry if this sounds like a silly question, but I am not sure how > > >> > to convert a vector of type int32x4_t into svint32_t ? In the patch, I > > >> > simply used NOP_EXPR (which I expected to fail), and gave type error > > >> > during gimple verification: > > >> > > >> It should be possible in principle to have a VEC_PERM_EXPR in which > > >> the operands are Advanced SIMD vectors and the result is an SVE vector. > > >> > > >> E.g., the dup in the PR would be something like this: > > >> > > >> foo (int32x4_t a) > > >> { > > >> svint32_t _2; > > >> > > >> _2 = VEC_PERM_EXPR ; > > >> return _2; > > >> } > > >> > > >> where the final operand can be built using: > > >> > > >> int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant (); > > >> vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, > > >> 1); > > >> for (int i = 0; i < source_nelts; ++i) > > >> sel.quick_push (i); > > >> > > >> I'm not sure how well-tested that combination is though. It might need > > >> changes to target-independent code. > > > Hi Richard, > > > Thanks for the suggestions. > > > I tried the above approach in attached patch, but it still results in > > > ICE due to type mismatch: > > > > > > pr96463.c: In function ‘foo’: > > > pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’ > > > 8 | } > > > | ^ > > > svint32_t > > > int32x4_t > > > int32x4_t > > > svint32_t > > > _3 = VEC_PERM_EXPR ; > > > during GIMPLE pass: ccp > > > dump file: pr96463.c.032t.ccp1 > > > pr96463.c:8:1: internal compiler error: verify_gimple failed > > > > > > Should we perhaps add another tree code, that "extends" a fixed-width > > > vector into it's VLA equivalent ? > > > > No, I think this is just an extreme example of the combination not being > > well-tested. :-) Obviously it's worse than I thought. > > > > I think accepting this kind of VEC_PERM_EXPR is still the way to go. > > Richi, WDYT? > Hi Richi, ping ? We check case VEC_PERM_EXPR: if (!useless_type_conversion_p (lhs_type, rhs1_type) || !useless_type_conversion_p (lhs_type, rhs2_type)) { error ("type mismatch in %qs", code_name); and LHS is svint32_t while x_4 is int32x4_t (the permutation type can indeed be different - it needs to be integer for example). The test is indeed unnecessarily strict if there are two vector types that are not compatible but have the same element mode and the same number of elements. I guess we could check sth like if (!useless_type_conversion_p (TREE_TYPE (lhs_type), TREE_TYPE (rhs1_type)) || !types_compatible_p (rhs1_type, rhs2_type)) instead - we later check TYPE_VECTOR_SUBPARTS so they match. But note that vec_perm_optab has a single mode only so I'm not sure what mode we should pick at RTL expansion time for your quoted case so I'm a bit nervous here. Richard? Richard.
Re: [SVE] PR96463 - Optimise svld1rq from vectors
On Tue, 7 Dec 2021 at 19:08, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > On Thu, 2 Dec 2021 at 23:11, Richard Sandiford > > wrote: > >> > >> Prathamesh Kulkarni writes: > >> > Hi Richard, > >> > I have attached a WIP untested patch for PR96463. > >> > IIUC, the PR suggests to transform > >> > lhs = svld1rq ({-1, -1, ...}, &v[0]) > >> > into: > >> > lhs = vec_perm_expr > >> > if v is vector of 4 elements, and each element is 32 bits on little > >> > endian target ? > >> > > >> > I am sorry if this sounds like a silly question, but I am not sure how > >> > to convert a vector of type int32x4_t into svint32_t ? In the patch, I > >> > simply used NOP_EXPR (which I expected to fail), and gave type error > >> > during gimple verification: > >> > >> It should be possible in principle to have a VEC_PERM_EXPR in which > >> the operands are Advanced SIMD vectors and the result is an SVE vector. > >> > >> E.g., the dup in the PR would be something like this: > >> > >> foo (int32x4_t a) > >> { > >> svint32_t _2; > >> > >> _2 = VEC_PERM_EXPR ; > >> return _2; > >> } > >> > >> where the final operand can be built using: > >> > >> int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant (); > >> vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, > >> 1); > >> for (int i = 0; i < source_nelts; ++i) > >> sel.quick_push (i); > >> > >> I'm not sure how well-tested that combination is though. It might need > >> changes to target-independent code. > > Hi Richard, > > Thanks for the suggestions. > > I tried the above approach in attached patch, but it still results in > > ICE due to type mismatch: > > > > pr96463.c: In function ‘foo’: > > pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’ > > 8 | } > > | ^ > > svint32_t > > int32x4_t > > int32x4_t > > svint32_t > > _3 = VEC_PERM_EXPR ; > > during GIMPLE pass: ccp > > dump file: pr96463.c.032t.ccp1 > > pr96463.c:8:1: internal compiler error: verify_gimple failed > > > > Should we perhaps add another tree code, that "extends" a fixed-width > > vector into it's VLA equivalent ? > > No, I think this is just an extreme example of the combination not being > well-tested. :-) Obviously it's worse than I thought. > > I think accepting this kind of VEC_PERM_EXPR is still the way to go. > Richi, WDYT? Hi Richi, ping ? Thanks, Prathamesh > > Thanks, > Richard
Re: [SVE] PR96463 - Optimise svld1rq from vectors
Prathamesh Kulkarni writes: > On Thu, 2 Dec 2021 at 23:11, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > Hi Richard, >> > I have attached a WIP untested patch for PR96463. >> > IIUC, the PR suggests to transform >> > lhs = svld1rq ({-1, -1, ...}, &v[0]) >> > into: >> > lhs = vec_perm_expr >> > if v is vector of 4 elements, and each element is 32 bits on little >> > endian target ? >> > >> > I am sorry if this sounds like a silly question, but I am not sure how >> > to convert a vector of type int32x4_t into svint32_t ? In the patch, I >> > simply used NOP_EXPR (which I expected to fail), and gave type error >> > during gimple verification: >> >> It should be possible in principle to have a VEC_PERM_EXPR in which >> the operands are Advanced SIMD vectors and the result is an SVE vector. >> >> E.g., the dup in the PR would be something like this: >> >> foo (int32x4_t a) >> { >> svint32_t _2; >> >> _2 = VEC_PERM_EXPR ; >> return _2; >> } >> >> where the final operand can be built using: >> >> int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant (); >> vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1); >> for (int i = 0; i < source_nelts; ++i) >> sel.quick_push (i); >> >> I'm not sure how well-tested that combination is though. It might need >> changes to target-independent code. > Hi Richard, > Thanks for the suggestions. > I tried the above approach in attached patch, but it still results in > ICE due to type mismatch: > > pr96463.c: In function ‘foo’: > pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’ > 8 | } > | ^ > svint32_t > int32x4_t > int32x4_t > svint32_t > _3 = VEC_PERM_EXPR ; > during GIMPLE pass: ccp > dump file: pr96463.c.032t.ccp1 > pr96463.c:8:1: internal compiler error: verify_gimple failed > > Should we perhaps add another tree code, that "extends" a fixed-width > vector into it's VLA equivalent ? No, I think this is just an extreme example of the combination not being well-tested. :-) Obviously it's worse than I thought. I think accepting this kind of VEC_PERM_EXPR is still the way to go. Richi, WDYT? Thanks, Richard
Re: [SVE] PR96463 - Optimise svld1rq from vectors
On Thu, 2 Dec 2021 at 23:11, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > Hi Richard, > > I have attached a WIP untested patch for PR96463. > > IIUC, the PR suggests to transform > > lhs = svld1rq ({-1, -1, ...}, &v[0]) > > into: > > lhs = vec_perm_expr > > if v is vector of 4 elements, and each element is 32 bits on little > > endian target ? > > > > I am sorry if this sounds like a silly question, but I am not sure how > > to convert a vector of type int32x4_t into svint32_t ? In the patch, I > > simply used NOP_EXPR (which I expected to fail), and gave type error > > during gimple verification: > > It should be possible in principle to have a VEC_PERM_EXPR in which > the operands are Advanced SIMD vectors and the result is an SVE vector. > > E.g., the dup in the PR would be something like this: > > foo (int32x4_t a) > { > svint32_t _2; > > _2 = VEC_PERM_EXPR ; > return _2; > } > > where the final operand can be built using: > > int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant (); > vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1); > for (int i = 0; i < source_nelts; ++i) > sel.quick_push (i); > > I'm not sure how well-tested that combination is though. It might need > changes to target-independent code. Hi Richard, Thanks for the suggestions. I tried the above approach in attached patch, but it still results in ICE due to type mismatch: pr96463.c: In function ‘foo’: pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’ 8 | } | ^ svint32_t int32x4_t int32x4_t svint32_t _3 = VEC_PERM_EXPR ; during GIMPLE pass: ccp dump file: pr96463.c.032t.ccp1 pr96463.c:8:1: internal compiler error: verify_gimple failed Should we perhaps add another tree code, that "extends" a fixed-width vector into it's VLA equivalent ? Thanks, Prathamesh > > Thanks, > Richard diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index 02e42a71e5e..b38c4641535 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -44,6 +44,8 @@ #include "aarch64-sve-builtins-shapes.h" #include "aarch64-sve-builtins-base.h" #include "aarch64-sve-builtins-functions.h" +#include "print-tree.h" +#include "gimple-pretty-print.h" using namespace aarch64_sve; @@ -1207,6 +1209,57 @@ public: insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode (0)); return e.use_contiguous_load_insn (icode); } + + gimple * + fold (gimple_folder &f) const OVERRIDE + { +tree arg0 = gimple_call_arg (f.call, 0); +tree arg1 = gimple_call_arg (f.call, 1); + +/* Transform: + lhs = svld1rq ({-1, -1, ... }, &v[0]) + into: + tmp = vec_perm_expr. + lhs = nop_expr tmp + on little endian target. */ + +if (!BYTES_BIG_ENDIAN + && integer_all_onesp (arg0) + && TREE_CODE (arg1) == ADDR_EXPR) + { + tree t = TREE_OPERAND (arg1, 0); + if (TREE_CODE (t) == ARRAY_REF) + { + tree index = TREE_OPERAND (t, 1); + t = TREE_OPERAND (t, 0); + if (integer_zerop (index) && TREE_CODE (t) == VIEW_CONVERT_EXPR) + { + t = TREE_OPERAND (t, 0); + tree vectype = TREE_TYPE (t); + if (VECTOR_TYPE_P (vectype) + && known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u) + && wi::to_wide (TYPE_SIZE (vectype)) == 128) + { + tree lhs = gimple_call_lhs (f.call); + tree lhs_type = TREE_TYPE (lhs); + int source_nelts = TYPE_VECTOR_SUBPARTS (vectype).to_constant (); + vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (lhs_type), source_nelts, 1); + for (int i = 0; i < source_nelts; i++) + sel.quick_push (i); + + vec_perm_indices indices (sel, 1, source_nelts); + if (!can_vec_perm_const_p (TYPE_MODE (lhs_type), indices)) + return NULL; + + tree mask = vec_perm_indices_to_tree (lhs_type, indices); + return gimple_build_assign (lhs, VEC_PERM_EXPR, t, t, mask); + } + } + } + } + +return NULL; + } }; class svld1ro_impl : public load_replicate
Re: [SVE] PR96463 - Optimise svld1rq from vectors
Prathamesh Kulkarni writes: > Hi Richard, > I have attached a WIP untested patch for PR96463. > IIUC, the PR suggests to transform > lhs = svld1rq ({-1, -1, ...}, &v[0]) > into: > lhs = vec_perm_expr > if v is vector of 4 elements, and each element is 32 bits on little > endian target ? > > I am sorry if this sounds like a silly question, but I am not sure how > to convert a vector of type int32x4_t into svint32_t ? In the patch, I > simply used NOP_EXPR (which I expected to fail), and gave type error > during gimple verification: It should be possible in principle to have a VEC_PERM_EXPR in which the operands are Advanced SIMD vectors and the result is an SVE vector. E.g., the dup in the PR would be something like this: foo (int32x4_t a) { svint32_t _2; _2 = VEC_PERM_EXPR ; return _2; } where the final operand can be built using: int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant (); vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1); for (int i = 0; i < source_nelts; ++i) sel.quick_push (i); I'm not sure how well-tested that combination is though. It might need changes to target-independent code. Thanks, Richard
[SVE] PR96463 - Optimise svld1rq from vectors
Hi Richard, I have attached a WIP untested patch for PR96463. IIUC, the PR suggests to transform lhs = svld1rq ({-1, -1, ...}, &v[0]) into: lhs = vec_perm_expr if v is vector of 4 elements, and each element is 32 bits on little endian target ? I am sorry if this sounds like a silly question, but I am not sure how to convert a vector of type int32x4_t into svint32_t ? In the patch, I simply used NOP_EXPR (which I expected to fail), and gave type error during gimple verification: svint32_t foo (int32x4_t x) { return svld1rq (svptrue_b8 (), &x[0]); } transformed to: EMERGENCY DUMP: svint32_t foo (int32x4_t x) { svint32_t _3; __Int32x4_t _4; : _4 = VEC_PERM_EXPR ; _3 = (svint32_t) _4; return _3; } and ICE's with: pr96463.c:8:1: error: invalid vector types in nop conversion 8 | } | ^ svint32_t __Int32x4_t _3 = (svint32_t) _4; during GIMPLE pass: ccp Could you please suggest how to proceed ? Thanks, Prathamesh diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index 02e42a71e5e..3834f33443a 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -44,6 +44,13 @@ #include "aarch64-sve-builtins-shapes.h" #include "aarch64-sve-builtins-base.h" #include "aarch64-sve-builtins-functions.h" +#include "print-tree.h" +#include "gimple-pretty-print.h" + +/* ??? Including tree-ssanames.h requires including other header dependencies. + Just including the prototype for now. */ +extern tree make_ssa_name_fn (struct function *, tree, gimple *, + unsigned int version = 0); using namespace aarch64_sve; @@ -1207,6 +1214,52 @@ public: insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode (0)); return e.use_contiguous_load_insn (icode); } + + gimple * + fold (gimple_folder &f) const OVERRIDE + { +tree arg0 = gimple_call_arg (f.call, 0); +tree arg1 = gimple_call_arg (f.call, 1); + +/* Transform: + lhs = svld1rq ({-1, -1, ... }, &v[0]) + into: + tmp = vec_perm_expr. + lhs = nop_expr tmp + on little endian target. */ + +if (!BYTES_BIG_ENDIAN + && integer_all_onesp (arg0) + && TREE_CODE (arg1) == ADDR_EXPR) + { + tree t = TREE_OPERAND (arg1, 0); + if (TREE_CODE (t) == ARRAY_REF) + { + tree index = TREE_OPERAND (t, 1); + t = TREE_OPERAND (t, 0); + if (integer_zerop (index) && TREE_CODE (t) == VIEW_CONVERT_EXPR) + { + t = TREE_OPERAND (t, 0); + tree vectype = TREE_TYPE (t); + if (VECTOR_TYPE_P (vectype) + && known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u) + && wi::to_wide (TYPE_SIZE (vectype)) == 128) + { + tree new_temp = ::make_ssa_name_fn (cfun, vectype, NULL); + tree zero_vec = build_vector_from_val (vectype, index); + gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR, t, t, zero_vec); + /* ??? How to convert between vector types if gimple_call_lhs (f.call) and + new_temp have different types ? */ + gimple *g2 = gimple_build_assign (gimple_call_lhs (f.call), NOP_EXPR, new_temp); + gsi_insert_before (f.gsi, g, GSI_SAME_STMT); + return g2; + } + } + } + } + +return NULL; + } }; class svld1ro_impl : public load_replicate