Hi Richi, > -----Original Message----- > From: Richard Biener <rguent...@suse.de> > Sent: Friday, January 8, 2021 9:45 AM > To: Tamar Christina <tamar.christ...@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; o...@ucw.cz > Subject: Re: [PATCH 6/8 v9]middle-end slp: support complex FMA and > complex FMA conjugate > > On Mon, 28 Dec 2020, Tamar Christina wrote: > > > Hi All, > > > > This adds support for FMA and FMA conjugated to the slp pattern matcher. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu > > and no issues. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > * internal-fn.def (COMPLEX_FMA, COMPLEX_FMA_CONJ): New. > > * optabs.def (cmla_optab, cmla_conj_optab): New. > > * doc/md.texi: Document them. > > * tree-vect-slp-patterns.c (vect_match_call_p, > > class complex_fma_pattern, vect_slp_reset_pattern, > > complex_fma_pattern::matches, complex_fma_pattern::recognize, > > complex_fma_pattern::build): New. > > > > --- inline copy of patch -- > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index > > > b8cc90e1a75e402abbf8a8cf2efefc1a333f8b3a..6d5a98c4946d3ff4c2b8abea5c2 > 9 > > caa6863fd3f7 100644 > > --- a/gcc/doc/md.texi > > +++ b/gcc/doc/md.texi > > @@ -6202,6 +6202,51 @@ The operation is only supported for vector > modes @var{m}. > > > > This pattern is not allowed to @code{FAIL}. > > > > +@cindex @code{cmla@var{m}4} instruction pattern @item > > +@samp{cmla@var{m}4} Perform a vector multiply and accumulate that is > > +semantically the same as a multiply and accumulate of complex > > +numbers. > > + > > +@smallexample > > + complex TYPE c[N]; > > + complex TYPE a[N]; > > + complex TYPE b[N]; > > + for (int i = 0; i < N; i += 1) > > + @{ > > + c[i] += a[i] * b[i]; > > + @} > > +@end smallexample > > + > > +In GCC lane ordering the real part of the number must be in the even > > +lanes with the imaginary part in the odd lanes. > > + > > +The operation is only supported for vector modes @var{m}. > > + > > +This pattern is not allowed to @code{FAIL}. > > + > > +@cindex @code{cmla_conj@var{m}4} instruction pattern @item > > +@samp{cmla_conj@var{m}4} Perform a vector multiply by conjugate and > > +accumulate that is semantically the same as a multiply and accumulate > > +of complex numbers where the second multiply arguments is conjugated. > > + > > +@smallexample > > + complex TYPE c[N]; > > + complex TYPE a[N]; > > + complex TYPE b[N]; > > + for (int i = 0; i < N; i += 1) > > + @{ > > + c[i] += a[i] * conj (b[i]); > > + @} > > +@end smallexample > > + > > +In GCC lane ordering the real part of the number must be in the even > > +lanes with the imaginary part in the odd lanes. > > + > > +The operation is only supported for vector modes @var{m}. > > + > > +This pattern is not allowed to @code{FAIL}. > > + > > @cindex @code{cmul@var{m}4} instruction pattern @item > > @samp{cmul@var{m}4} Perform a vector multiply that is semantically > > the same as multiply of diff --git a/gcc/internal-fn.def > > b/gcc/internal-fn.def index > > > 5a0bbe3fe5dee591d54130e60f6996b28164ae38..305450e026d4b94ab62ceb9c > a719 > > ec5570ff43eb 100644 > > --- a/gcc/internal-fn.def > > +++ b/gcc/internal-fn.def > > @@ -288,6 +288,8 @@ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, > > binary) > > > > /* Ternary math functions. */ > > DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary) > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary) > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST, > cmla_conj, > > +ternary) > > > > /* Unary integer ops. */ > > DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, > unary) > > diff --git a/gcc/optabs.def b/gcc/optabs.def index > > > e82396bae1117c6de91304761a560b7fbcb69ce1..8e2758d685ed85e02df10dac > 571e > > b40d45a294ed 100644 > > --- a/gcc/optabs.def > > +++ b/gcc/optabs.def > > @@ -294,6 +294,8 @@ OPTAB_D (cadd90_optab, "cadd90$a3") OPTAB_D > > (cadd270_optab, "cadd270$a3") OPTAB_D (cmul_optab, "cmul$a3") > > OPTAB_D (cmul_conj_optab, "cmul_conj$a3") > > +OPTAB_D (cmla_optab, "cmla$a4") > > +OPTAB_D (cmla_conj_optab, "cmla_conj$a4") > > OPTAB_D (cos_optab, "cos$a2") > > OPTAB_D (cosh_optab, "cosh$a2") > > OPTAB_D (exp10_optab, "exp10$a2") > > diff --git a/gcc/tree-vect-slp-patterns.c > > b/gcc/tree-vect-slp-patterns.c index > > > 82721acbab8cf81c4d6f9954c98fb913a7bb6282..3625a80c08e3d70fd362fc52e1 > 7e > > 65b3b2c7da83 100644 > > --- a/gcc/tree-vect-slp-patterns.c > > +++ b/gcc/tree-vect-slp-patterns.c > > @@ -325,6 +325,24 @@ vect_match_expression_p (slp_tree node, > tree_code code) > > return true; > > } > > > > +/* Checks to see if the expression represented by NODE is a call to the > internal > > + function FN. */ > > + > > +static inline bool > > +vect_match_call_p (slp_tree node, internal_fn fn) { > > + if (!node > > + || !SLP_TREE_REPRESENTATIVE (node)) > > + return false; > > + > > + gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE > (node)); > > + if (!expr > > + || !gimple_call_internal_p (expr, fn)) > > + return false; > > + > > + return true; > > +} > > + > > /* Check if the given lane permute in PERMUTES matches an alternating > sequence > > of {even odd even odd ...}. This to account for unrolled loops. > > Further > > mode there resulting permute must be linear. */ > > @@ -1081,6 +1099,161 @@ complex_mul_pattern::build (vec_info *vinfo) > > complex_pattern::build (vinfo); > > } > > > > > +/********************************************************* > *********** > > +*********** > > + * complex_fma_pattern class > > + > > > +********************************************************* > ************ > > +*********/ > > + > > +class complex_fma_pattern : public complex_pattern { > > + protected: > > + complex_fma_pattern (slp_tree *node, vec<slp_tree> *m_ops, > internal_fn ifn) > > + : complex_pattern (node, m_ops, ifn) > > + { > > + this->m_num_args = 3; > > + } > > + > > + public: > > + void build (vec_info *); > > + static internal_fn > > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > slp_tree *, > > + vec<slp_tree> *); > > + > > + static vect_pattern* > > + recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > > + > > + static vect_pattern* > > + mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn) > > + { > > + return new complex_fma_pattern (node, m_ops, ifn); > > + } > > +}; > > + > > +/* Helper function to "reset" a previously matched node and undo the > changes > > + made enough so that the node is treated as an irrelevant node. */ > > + > > +static inline void > > +vect_slp_reset_pattern (slp_tree node) { > > + stmt_vec_info stmt_info = vect_orig_stmt (SLP_TREE_REPRESENTATIVE > > +(node)); > > + STMT_VINFO_IN_PATTERN_P (stmt_info) = false; > > + STMT_SLP_TYPE (stmt_info) = pure_slp; > > + SLP_TREE_REPRESENTATIVE (node) = stmt_info; } > > + > > +/* Pattern matcher for trying to match complex multiply and accumulate > > + and multiply and subtract patterns in SLP tree. > > + If the operation matches then IFN is set to the operation it matched and > > + the arguments to the two replacement statements are put in m_ops. > > + > > + If no match is found then IFN is set to IFN_LAST and m_ops is unchanged. > > + > > + This function matches the patterns shaped as: > > + > > + double ax = (b[i+1] * a[i]) + (b[i] * a[i]); > > + double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]); > > + > > + c[i] = c[i] - ax; > > + c[i+1] = c[i+1] + bx; > > + > > + If a match occurred then TRUE is returned, else FALSE. The match is > > + performed after COMPLEX_MUL which would have done the majority of > the work. > > + This function merely matches an ADD with a COMPLEX_MUL IFN. The > initial > > + match is expected to be in OP1 and the initial match operands in > > + args0. */ > > + > > +internal_fn > > +complex_fma_pattern::matches (complex_operation_t op, > > + slp_tree_to_load_perm_map_t * /* perm_cache > */, > > + slp_tree *ref_node, vec<slp_tree> *ops) { > > + internal_fn ifn = IFN_LAST; > > + > > + /* Find the two components. We match Complex MUL first which > reduces the > > + amount of work this pattern has to do. After that we just match the > > + head node and we're done.: > > + > > + * FMA: + +. > > + > > + We need to ignore the two_operands nodes that may also match. > > + For that we can check if they have any scalar statements and also > > + check that it's not a permute node as we're looking for a normal > > + PLUS_EXPR operation. */ > > + if (op != CMPLX_NONE) > > + return IFN_LAST; > > + > > + /* Find the two components. We match Complex MUL first which > reduces the > > + amount of work this pattern has to do. After that we just match the > > + head node and we're done.: > > + > > + * FMA: + + on a non-two_operands node. */ > > + slp_tree vnode = *ref_node; > > + if (SLP_TREE_LANE_PERMUTATION (vnode).exists () > > + /* Need to exclude the plus two-operands node. These are not > marked > > + so we have to infer it based on conditions. */ > > + || !SLP_TREE_SCALAR_STMTS (vnode).exists () > > as said earlier we shouldn't test this. The existing lane permute should > already cover this - where the test would better be > > SLP_TREE_CODE (vnode) == VEC_PERM_EXPR > > > + || !vect_match_expression_p (vnode, PLUS_EXPR)) > > But then it shouldn't match this (the vect_match_expression_p should only > ever match SLP_TREE_CODE (vnode) != VEC_PERM_EXPR) anyway. >
How so? An FMA doesn't have a TWO_OPERANDS node as the root since the operations Are always two PLUS operations. The corresponding tree is note: SLP size 10 vs. limit 24. note: Final SLP tree for instance 0x48f68d0: note: node 0x4809870 (max_nunits=4, refcnt=2) note: op template: REALPART_EXPR <*_3> = _31; note: stmt 0 REALPART_EXPR <*_3> = _31; note: stmt 1 IMAGPART_EXPR <*_3> = _32; note: children 0x48098f8 note: node 0x48098f8 (max_nunits=4, refcnt=2) note: op template: _31 = _12 + _29; note: stmt 0 _31 = _12 + _29; note: stmt 1 _32 = _11 + _30; note: children 0x4809980 0x4809a08 note: node 0x4809980 (max_nunits=4, refcnt=2) note: op template: _12 = REALPART_EXPR <*_3>; note: stmt 0 _12 = REALPART_EXPR <*_3>; note: stmt 1 _11 = IMAGPART_EXPR <*_3>; note: load permutation { 0 1 } note: node 0x4809a08 (max_nunits=4, refcnt=2) note: op: VEC_PERM_EXPR note: stmt 0 _29 = _25 - _26; note: stmt 1 _30 = _27 + _28; note: lane permutation { 0[0] 1[1] } note: children 0x4809dc0 0x4809e48 note: node 0x4809dc0 (max_nunits=1, refcnt=1) note: op template: _29 = _25 - _26; note: { } note: children 0x4809a90 0x4809c28 note: node 0x4809a90 (max_nunits=4, refcnt=3) note: op template: _25 = _19 * _22; note: stmt 0 _25 = _19 * _22; note: stmt 1 _27 = _20 * _22; note: children 0x4809b18 0x4809ba0 note: node 0x4809b18 (max_nunits=4, refcnt=2) note: op template: _19 = REALPART_EXPR <*_7>; note: stmt 0 _19 = REALPART_EXPR <*_7>; note: stmt 1 _20 = IMAGPART_EXPR <*_7>; note: load permutation { 0 1 } note: node 0x4809ba0 (max_nunits=4, refcnt=2) note: op template: _22 = REALPART_EXPR <*_5>; note: stmt 0 _22 = REALPART_EXPR <*_5>; note: stmt 1 _22 = REALPART_EXPR <*_5>; note: load permutation { 0 0 } note: node 0x4809c28 (max_nunits=4, refcnt=3) note: op template: _26 = _20 * _21; note: stmt 0 _26 = _20 * _21; note: stmt 1 _28 = _19 * _21; note: children 0x4809cb0 0x4809d38 note: node 0x4809cb0 (max_nunits=4, refcnt=2) note: op template: _20 = IMAGPART_EXPR <*_7>; note: stmt 0 _20 = IMAGPART_EXPR <*_7>; note: stmt 1 _19 = REALPART_EXPR <*_7>; note: load permutation { 1 0 } note: node 0x4809d38 (max_nunits=4, refcnt=2) note: op template: _21 = IMAGPART_EXPR <*_5>; note: stmt 0 _21 = IMAGPART_EXPR <*_5>; note: stmt 1 _21 = IMAGPART_EXPR <*_5>; note: load permutation { 1 1 } note: node 0x4809e48 (max_nunits=1, refcnt=1) note: op template: _30 = _27 + _28; note: { } note: children 0x4809a90 0x4809c28 and after matching the MUL all you have are the ADD node going into a COMPLEX_MUL node. > > + return IFN_LAST; > > + > > + slp_tree node = SLP_TREE_CHILDREN (vnode)[1]; > > + > > + if (vect_match_call_p (node, IFN_COMPLEX_MUL)) > > + ifn = IFN_COMPLEX_FMA; > > + else if (vect_match_call_p (node, IFN_COMPLEX_MUL_CONJ)) > > + ifn = IFN_COMPLEX_FMA_CONJ; > > + else > > + return IFN_LAST; > > + > > + if (!vect_pattern_validate_optab (ifn, vnode)) > > + return IFN_LAST; > > + > > + vect_slp_reset_pattern (node); > > I don't understand this ... it deserves a comment at least. The previous pass detecting COMPLEX_MUL would have marked the Instructions as being inside of a MUL pattern. These need to be unmarked As being part of the COMPLEX_MUL and instead be marked as COMPLEX_FMA. > Having no testcases with this patch makes it impossible for me to dig in > myself :/ Sorry, the tests would have made the file too big again.. The previous test for complex add Added gcc/testsuite/gcc.dg/vect/complex/complex-operations.c which is an overarching test Testing everything in one go. The individual tests are split off from that large test. > > Otherwise looks OK. > > Thanks, > Richard. > > > + ops->truncate (0); > > + ops->create (3); > > + > > + if (ifn == IFN_COMPLEX_FMA) > > + { > > + ops->quick_push (SLP_TREE_CHILDREN (vnode)[0]); > > + ops->quick_push (SLP_TREE_CHILDREN (node)[1]); > > + ops->quick_push (SLP_TREE_CHILDREN (node)[0]); > > + } > > + else > > + { > > + ops->quick_push (SLP_TREE_CHILDREN (vnode)[0]); > > + ops->quick_push (SLP_TREE_CHILDREN (node)[0]); > > + ops->quick_push (SLP_TREE_CHILDREN (node)[1]); > > + } > > + > > + return ifn; > > +} > > + > > +/* Attempt to recognize a complex mul pattern. */ > > + > > +vect_pattern* > > +complex_fma_pattern::recognize (slp_tree_to_load_perm_map_t > *perm_cache, > > + slp_tree *node) > > +{ > > + auto_vec<slp_tree> ops; > > + complex_operation_t op > > + = vect_detect_pair_op (*node, true, &ops); > > + internal_fn ifn > > + = complex_fma_pattern::matches (op, perm_cache, node, &ops); > > + if (ifn == IFN_LAST) > > + return NULL; > > + > > + return new complex_fma_pattern (node, &ops, ifn); } > > + > > +/* Perform a replacement of the detected complex mul pattern with the > new > > + instruction sequences. */ > > + > > +void > > +complex_fma_pattern::build (vec_info *vinfo) { > > + SLP_TREE_CHILDREN (*this->m_node).truncate (0); > > + SLP_TREE_CHILDREN (*this->m_node).safe_splice (this->m_ops); > > + > > + complex_pattern::build (vinfo); > > +} > > + > > > /********************************************************** > ********************* > > * Pattern matching definitions > > > > > ********************************************************** > ************ > > ********/ > > > > > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)