Hi Richi, thanks for the review! Just a quick comment on one of the questions asked:
> -----Original Message----- > From: rguent...@c653.arch.suse.de <rguent...@c653.arch.suse.de> On > Behalf Of Richard Biener > Sent: Monday, November 16, 2020 3:12 PM > To: Tamar Christina <tamar.christ...@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; o...@ucw.cz; > hongtao....@intel.com > Subject: Re: [PATCH 2/2] middle-end : Add support for complex pattern > detection for Add, Mul, FMA and FMS > > On Sat, 14 Nov 2020, Tamar Christina wrote: > > > Hi All, > > > > This patch series adds support for SLP vectorization of complex instructions > [1]. > > > > These instructions exist only in their vector forms and require you to > recognize > > two statements in parallel. Complex operations usually require a permute > due to > > the fact that the real and imaginary numbers are stored intermixed but > these vector > > instructions expect this and no longer need the compiler to generate a > permute. > > > > The pass is able to insert additional permutes inside the SLP tree in order > > to > > change the data flow such that the new instructions can be used. These > permutes > > are then optimized out later during optimize_slp. > > > > These instructions also support rotations along the Argand plane, as such > the operands > > have to be re-ordered to coincide with their load group. > > > > For now, this patch only adds support for: > > > > * Complex Addition with rotation of 90 and 270. > > > > Addition with rotation of the second argument around the Argand plane. > > Supported rotations are 90 and 180. > > > > c = a + (b * I) and c = a + (b * I * I * I) > > > > * Complex Multiplication and Multiplication where one operand is > conjucated. > > > > Complex multiplication and Conjucate Complex multiplication of the > second > > parameter. > > > > c = a * b and c = a * conj (b) > > > > * Complex FMA and FMA where one operand is conjucated. > > * Complex FMS and FMS where one operand is conjucated. > > > > Complex FMLA, Conjucate FMLA of the second parameter and FMLS. > > > > c += a * b, c += a * conj (b), c -= a * b and c -= a * conj (b) > > > > For the conjucate cases it supports under fast-math that the operands > that is > > being conjucated be flipped by flipping the arguments to the optab. This > > allows it to support c = conj (a) * b and c += conj (a) * b. > > > > where a, b and c are complex numbers. > > > > Complex dot-product is not currently supported in this patch set as > build_slp fails > > for it. This will be provided as a future patch. > > > > These are supported for both integer and floating point and as such these > don't look > > for real or imaginary pairs but instead rely on the early lowering of > > complex > > numbers by GCC and canonization of the operations such that it just > recognizes any > > instruction sequence matching the operations requested. > > > > To be safe when the it is not sure it can support the operation or if it > > finds > > something it does not understand it backs off. This is done by looking at > the > > order of the loads below the nodes. Anything that changes the order like a > > VEC_PERM is taken into account. The order of the loads define which > operation > > is being done. For instance a complex add with conjucate always has the > second > > loads in non-contiguous order. Because this information is queried often > > it > is > > heavily cached and because we never modify nodes in place we can cache > the > > operation through all SLP instances. > > > > When SLP failes (e.g. due to costing) we dissolve the changes and restore > the > > previous relevancies of the stmts that were in a pattern before. > > > > [1] https://developer.arm.com/documentation/ddi0487/fc/ > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > Ok for master? (working on moving tests to middend IFN tests instead). > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > * tree-vect-slp-patterns.c: New file. > > * Makefile.in: Use it. > > * doc/passes.texi: Document it. > > * internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270, > > COMPLEX_MUL, COMPLEX_MUL_CONJ, COMPLEX_FMA, > COMPLEX_FMA_CONJ): > > COMPLEX_FMS, COMPLEX_FMS_CONJ): New. > > * optabs.def (cadd90_optab, cadd270_optab, cmul_optab, > cmul_conj_optab, > > cmla_optab, cmla_conj_optab, cmls_optab, cmls_conj_optab): New. > > * doc/md.texi: Document them > > > > -- inline copy of patch -- > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > > index > 273654cfa2525099ed0d9adc2f9085c8408b096f..4b556bdbbc2f989d894ef2dbe > 894318858eaa7aa 100644 > > --- a/gcc/Makefile.in > > +++ b/gcc/Makefile.in > > @@ -1646,6 +1646,7 @@ OBJS = \ > > tree-vect-loop.o \ > > tree-vect-loop-manip.o \ > > tree-vect-slp.o \ > > + tree-vect-slp-patterns.o \ > > tree-vectorizer.o \ > > tree-vector-builder.o \ > > tree-vrp.o \ > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > > index > 813875b973bc73920f1195c4897ce3491ca713c2..68792e4c3eeca2bce0112fc28e > ba38700e66dc00 100644 > > --- a/gcc/doc/md.texi > > +++ b/gcc/doc/md.texi > > @@ -1493,7 +1493,7 @@ can be described by a series of letters for each > operand. The overall > > constraint for an operand is made from the letters for this operand > > from the first alternative, a comma, the letters for this operand from > > the second alternative, a comma, and so on until the last alternative. > > -All operands for a single instruction must have the same number of > > +All operands for a single instruction must have the same number of > > alternatives. > > @ifset INTERNALS > > Here is how it is done for fullword logical-or on the 68000: > > @@ -1558,17 +1558,17 @@ the first, 1 for the second alternative, etc.). > @xref{Output Statement}. > > @end ifset > > @ifclear INTERNALS > > > > -So the first alternative for the 68000's logical-or could be written as > > -@code{"+m" (output) : "ir" (input)}. The second could be @code{"+r" > > -(output): "irm" (input)}. However, the fact that two memory locations > > -cannot be used in a single instruction prevents simply using @code{"+rm" > > -(output) : "irm" (input)}. Using multi-alternatives, this might be > > +So the first alternative for the 68000's logical-or could be written as > > +@code{"+m" (output) : "ir" (input)}. The second could be @code{"+r" > > +(output): "irm" (input)}. However, the fact that two memory locations > > +cannot be used in a single instruction prevents simply using @code{"+rm" > > +(output) : "irm" (input)}. Using multi-alternatives, this might be > > written as @code{"+m,r" (output) : "ir,irm" (input)}. This describes > > -all the available alternatives to the compiler, allowing it to choose > > +all the available alternatives to the compiler, allowing it to choose > > the most efficient one for the current conditions. > > > > -There is no way within the template to determine which alternative was > > -chosen. However you may be able to wrap your @code{asm} statements > with > > +There is no way within the template to determine which alternative was > > +chosen. However you may be able to wrap your @code{asm} statements > with > > builtins such as @code{__builtin_constant_p} to achieve the desired > results. > > @end ifclear > > > > @@ -3062,7 +3062,7 @@ instruction taking only the upper 16-bits of a > > 16-bits being 0. > > > > @item L > > -Integer that is valid as an immediate operand for a > > +Integer that is valid as an immediate operand for a > > shift instruction. Range 0 to 31. > > > > @item M > > @@ -3076,7 +3076,7 @@ Integer that is valid as an immediate operand for > > a custom instruction opcode. Range 0 to 255. > > > > @item P > > -An immediate operand for R2 andchi/andci instructions. > > +An immediate operand for R2 andchi/andci instructions. > > > > @item S > > Matches immediates which are addresses in the small > > @@ -6132,6 +6132,97 @@ floating-point mode. > > > > This pattern is not allowed to @code{FAIL}. > > > > +@cindex @code{cadd@var{m}@var{n}3} instruction pattern > > +@item @samp{cadd@var{m}@var{n}3} > > +Perform a vector addition of complex numbers in operand 1 with operand > 2 > > +rotated by @var{m} degrees around the argand plane and storing the > result in > > +operand 0. The instruction must perform the operation on data that in > GCC lane > > +order where the 0th lane holds the real part and the the imaginary part in > lane > > +1. > > So I'm not sure I'm happy with defining this in terms of 'complex > numbers'. At least we nowhere specify how complex numbers are laid out in > vector registers. For V4DF it could be any of {real0,imag0,real1,imag1}, > {real0,real1, imag0,imag1}, {imag0,real0,imag1,real1} or > {imag0,imag1, real0,real1}. While it's easy to give names to semantics > this way I'd prefer to define this in terms of even/odd lanes of a > vector. Maybe it can refer to the aarch64 example of complex number > instructions implementing such scheme. > > So at _least_ we have to define the layout of complex components > in the vector modes (preferably in a target independent way), say > even lanes are real parts and odd lanes are imaginary parts? > > > + > > +The operation is only supported for vector modes @var{n} and with > > +rotations @var{m} of 90 or 270. > > + > > +This pattern is not allowed to @code{FAIL}. > > + > > +@cindex @code{cmla@var{m}4} instruction pattern > > +@item @samp{cmla@var{m}4} > > +Perform a vector floating point multiply and accumulate of complex > numbers > > +in operand 0, operand 1 and operand 2. > > + > > +The instruction must perform the operation on data that in GCC lane > > +order where the 0th lane holds the real part and the the imaginary part in > lane > > +1. > > + > > +The operation is only supported for vector modes @var{m}. > > + > > +This pattern is not allowed to @code{FAIL}. > > + > > +@cindex @code{cmla_conj@var{m}4} instruction pattern > > +@item @samp{cmla_conj@var{m}4} > > +Perform a vector floating point multiply and accumulate of complex > numbers > > +in operand 0, operand 1 and the conjucate of operand 2. > > + > > +The instruction must perform the operation on data that in GCC lane > > +order where the 0th lane holds the real part and the the imaginary part in > lane > > +1. > > + > > +The operation is only supported for vector modes @var{m}. > > + > > +This pattern is not allowed to @code{FAIL}. > > + > > +@cindex @code{cmls@var{m}4} instruction pattern > > +@item @samp{cmls@var{m}4} > > +Perform a vector floating point multiply and subtract of complex numbers > > +in operand 0, operand 1 and operand 2. > > + > > +The instruction must perform the operation on data that in GCC lane > > +order where the 0th lane holds the real part and the the imaginary part in > lane > > +1. > > + > > +The operation is only supported for vector modes @var{m}. > > + > > +This pattern is not allowed to @code{FAIL}. > > + > > +@cindex @code{cmls_conj@var{m}4} instruction pattern > > +@item @samp{cmls_conj@var{m}4} > > +Perform a vector floating point multiply and subtract of complex numbers > > +in operand 0, operand 1 and the conjucate of operand 2. > > + > > +The instruction must perform the operation on data that in GCC lane > > +order where the 0th lane holds the real part and the the imaginary part in > lane > > +1. > > + > > +The operation is only supported for vector modes @var{m}. > > + > > +This pattern is not allowed to @code{FAIL}. > > + > > +@cindex @code{cmul@var{m}4} instruction pattern > > +@item @samp{cmul@var{m}4} > > +Perform a vector floating point multiplication of complex numbers in > operand 0 > > +and operand 1. > > + > > +The instruction must perform the operation on data that in GCC lane > > +order where the 0th lane holds the real part and the the imaginary part in > lane > > +1. > > + > > +The operation is only supported for vector modes @var{m}. > > + > > +This pattern is not allowed to @code{FAIL}. > > + > > +@cindex @code{cmul_conj@var{m}4} instruction pattern > > +@item @samp{cmul_conj@var{m}4} > > +Perform a vector floating point multiplication of complex numbers in > operand 0 > > +and the conjucate of operand 1. > > + > > +The instruction must perform the operation on data that in GCC lane > > +order where the 0th lane holds the real part and the the imaginary part in > lane > > +1. > > + > > +The operation is only supported for vector modes @var{m}. > > + > > +This pattern is not allowed to @code{FAIL}. > > + > > @cindex @code{ffs@var{m}2} instruction pattern > > @item @samp{ffs@var{m}2} > > Store into operand 0 one plus the index of the least significant 1-bit > > @@ -7409,7 +7500,7 @@ If this pattern is not defined, then a > @code{memory_barrier} pattern > > will be emitted, followed by a store of the value to the memory operand. > > > > @cindex @code{atomic_compare_and_swap@var{mode}} instruction > pattern > > -@item @samp{atomic_compare_and_swap@var{mode}} > > +@item @samp{atomic_compare_and_swap@var{mode}} > > This pattern, if defined, emits code for an atomic compare-and-swap > > operation with memory model semantics. Operand 2 is the memory on > which > > the atomic operation is performed. Operand 0 is an output operand which > > @@ -7424,7 +7515,7 @@ if the operation fails. > > > > If memory referred to in operand 2 contains the value in operand 3, then > > operand 4 is stored in memory pointed to by operand 2 and fencing based > on > > -the memory model in operand 6 is issued. > > +the memory model in operand 6 is issued. > > > > If memory referred to in operand 2 does not contain the value in operand > 3, > > then fencing based on the memory model in operand 7 is issued. > > @@ -7500,9 +7591,9 @@ none of these are available a compare-and-swap > loop will be used. > > @itemx @samp{atomic_fetch_or@var{mode}}, > @samp{atomic_fetch_and@var{mode}} > > @itemx @samp{atomic_fetch_xor@var{mode}}, > @samp{atomic_fetch_nand@var{mode}} > > These patterns emit code for an atomic operation on memory with > memory > > -model semantics, and return the original value. Operand 0 is an output > > -operand which contains the value of the memory location before the > > -operation was performed. Operand 1 is the memory on which the atomic > > +model semantics, and return the original value. Operand 0 is an output > > +operand which contains the value of the memory location before the > > +operation was performed. Operand 1 is the memory on which the atomic > > operation is performed. Operand 2 is the second operand to the binary > > operator. Operand 3 is the memory model to be used by the operation. > > > > @@ -7859,7 +7950,7 @@ simplified) from the PDP-11 target: > > (plus:HI (match_dup 0) > > (const_int -1)))] > > "" > > - > > + > > @{ > > if (which_alternative == 0) > > return "sob %0,%l1"; > > diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi > > index > a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99 > a23386891a7b0c1 100644 > > --- a/gcc/doc/passes.texi > > +++ b/gcc/doc/passes.texi > > @@ -709,7 +709,8 @@ loop. > > The pass is implemented in @file{tree-vectorizer.c} (the main driver), > > @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific > parts > > and general loop utilities), @file{tree-vect-slp} (loop-aware SLP > > -functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}. > > +functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and > > +@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher. > > Analysis of data references is in @file{tree-data-ref.c}. > > > > SLP Vectorization. This pass performs vectorization of straight-line code. > The > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > > index > 310d37aa53819791b5df1683afca831f08e5892a..acb7d9f3bdc757437d5492a652 > 144ba31c2ef702 100644 > > --- a/gcc/internal-fn.def > > +++ b/gcc/internal-fn.def > > @@ -277,12 +277,21 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, > scalb, binary) > > DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary) > > DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary) > > DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary) > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, > binary) > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, > cadd270, binary) > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary) > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, > cmul_conj, binary) > > + > > > > /* FP scales. */ > > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) > > > > /* Ternary math functions. */ > > DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary) > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary) > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST, > cmla_conj, ternary) > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS, ECF_CONST, cmls, ternary) > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS_CONJ, ECF_CONST, > cmls_conj, ternary) > > > > /* Unary integer ops. */ > > DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, > unary) > > diff --git a/gcc/optabs.def b/gcc/optabs.def > > index > 78409aa14537d259bf90277751aac00d452a0d3f..19db9c00896cd08adfd20a0166 > 9990bbbebd79f1 100644 > > --- a/gcc/optabs.def > > +++ b/gcc/optabs.def > > @@ -290,6 +290,14 @@ OPTAB_D (atan_optab, "atan$a2") > > OPTAB_D (atanh_optab, "atanh$a2") > > OPTAB_D (copysign_optab, "copysign$F$a3") > > OPTAB_D (xorsign_optab, "xorsign$F$a3") > > +OPTAB_D (cadd90_optab, "cadd90$a3") > > +OPTAB_D (cadd270_optab, "cadd270$a3") > > +OPTAB_D (cmul_optab, "cmul$a3") > > +OPTAB_D (cmul_conj_optab, "cmul_conj$a3") > > +OPTAB_D (cmla_optab, "cmla$a4") > > +OPTAB_D (cmla_conj_optab, "cmla_conj$a4") > > +OPTAB_D (cmls_optab, "cmls$a4") > > +OPTAB_D (cmls_conj_optab, "cmls_conj$a4") > > OPTAB_D (cos_optab, "cos$a2") > > OPTAB_D (cosh_optab, "cosh$a2") > > OPTAB_D (exp10_optab, "exp10$a2") > > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c > > new file mode 100644 > > index > 0000000000000000000000000000000000000000..7a2c64aeb9f212cd3ee18adf11 > 0e6ac6696249fd > > --- /dev/null > > +++ b/gcc/tree-vect-slp-patterns.c > > @@ -0,0 +1,1844 @@ > > +/* SLP - Pattern matcher on SLP trees > > + Copyright (C) 2020 Free Software Foundation, Inc. > > + > > +This file is part of GCC. > > + > > +GCC is free software; you can redistribute it and/or modify it under > > +the terms of the GNU General Public License as published by the Free > > +Software Foundation; either version 3, or (at your option) any later > > +version. > > + > > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY > > +WARRANTY; without even the implied warranty of MERCHANTABILITY or > > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public > License > > +for more details. > > + > > +You should have received a copy of the GNU General Public License > > +along with GCC; see the file COPYING3. If not see > > +<http://www.gnu.org/licenses/>. */ > > + > > +#include "config.h" > > +#include "system.h" > > +#include "coretypes.h" > > +#include "backend.h" > > +#include "target.h" > > +#include "rtl.h" > > +#include "tree.h" > > +#include "gimple.h" > > +#include "tree-pass.h" > > +#include "ssa.h" > > +#include "optabs-tree.h" > > +#include "insn-config.h" > > +#include "recog.h" /* FIXME: for insn_data */ > > +#include "fold-const.h" > > +#include "stor-layout.h" > > +#include "gimple-iterator.h" > > +#include "cfgloop.h" > > +#include "tree-vectorizer.h" > > +#include "langhooks.h" > > +#include "gimple-walk.h" > > +#include "dbgcnt.h" > > +#include "tree-vector-builder.h" > > +#include "vec-perm-indices.h" > > +#include "gimple-fold.h" > > +#include "internal-fn.h" > > + > > +/* SLP Pattern matching mechanism. > > + > > + This extension to the SLP vectorizer allows one to transform the > generated SLP > > + tree based on any pattern. The difference between this and the normal > vect > > + pattern matcher is that unlike the former, this matcher allows you to > match > > + with instructions that do not belong to the same SSA dominator graph. > > + > > + The only requirement that this pattern matcher has is that you are only > > + only allowed to either match an entire group or none. > > + > > + The pattern matcher currently only allows you to perform replacements > to > > + internal functions. > > + > > + Once the patterns are matched it is one way, these cannot be undone. It > is > > + currently not supported to match patterns recursively. > > + > > + To add a new pattern, implement the vect_pattern class and add the > type to > > + slp_patterns. > > + > > +*/ > > + > > > +/********************************************************* > ********************** > > + * vect_pattern class > > + > ********************************************************** > ********************/ > > + > > +/* Default implementation of recognize that peforms matching, validation > and > > + replacement of nodes but that can be overriden if required. */ > > + > > +bool > > +vect_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, > > + vec_info *vinfo) > > +{ > > + if (this->matches (perm_cache)) > > + { > > + tree vectype = SLP_TREE_VECTYPE (*this->m_node); > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "Found %s pattern in SLP tree\n", > > + this->get_name ()); > > + > > + if (this->is_optab_supported_p (vectype, OPTIMIZE_FOR_SPEED)) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "Target supports %s vectorization with > mode %T\n", > > + internal_fn_name (this->get_ifn ()), vectype); > > + } > > + else > > + { > > + if (dump_enabled_p ()) > > + { > > + if (!vectype) > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "Target does not support vector type > for %T\n", > > + SLP_TREE_DEF_TYPE (*this->m_node)); > > + else > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "Target does not support %s for vector type > " > > + "%T\n", internal_fn_name (this->get_ifn ()), > > + vectype); > > + } > > + return false; > > + } > > + } > > + else > > + return false; > > + > > + if (!this->validate_p (perm_cache)) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "transformation for %s not valid due to post " > > + "condition\n", > > + internal_fn_name (this->get_ifn ())); > > + return false; > > + } > > + > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, "Creating vec > patterns\n"); > > + > > + gcall* call_stmt = this->build (vinfo); > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, "\t %p stmt: %G", > > + *this->m_node, call_stmt); > > + > > + return true; > > +} > > + > > > +/********************************************************* > ********************** > > + * General helper types > > + > ********************************************************** > ********************/ > > + > > +/* The COMPLEX_OPERATION enum denotes the possible pair of > operations that can > > + be matched when looking for expressions that we are interested > matching for > > + complex numbers addition and mla. */ > > + > > +typedef enum _complex_operation : unsigned { > > + PLUS_PLUS, > > + MINUS_PLUS, > > + PLUS_MINUS, > > + MULT_MULT, > > + CMPLX_NONE > > +} complex_operation_t; > > + > > +/* A simple pair type used for ordering of combine operations. */ > > + > > +typedef struct map_ { int a; int b; } map_t; > > + > > > +/********************************************************* > ********************** > > + * General helper functions > > + > ********************************************************** > ********************/ > > + > > +/* Helper function of linear_loads_p that checks to see if the load > permutation > > + is sequential and in monotonically increasing order of loads with no > > gaps. > > +*/ > > + > > +static inline bool > > +is_linear_load_p (load_permutation_t loads) > > +{ > > + if (loads.length() == 0) > > + return false; > > + > > + unsigned leader = loads[0]; > > + unsigned load, i; > > + FOR_EACH_VEC_ELT_FROM (loads, i, load, 1) > > + if (load != ++leader) > > + return false; > > + return true; > > +} > > + > > + > > +/* Check to see if all loads rooted in ROOT are linear. Linearity is > > + defined as having no gaps between values loaded. */ > > + > > +static load_permutation_t > > +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree > root, > > + bool *linear) > > +{ > > + *linear = false; > > + if (!root) > > + return vNULL; > > + > > + unsigned i; > > + load_permutation_t loads = vNULL; > > + load_permutation_t *tmp; > > + > > + if ((tmp = perm_cache->get (root)) != NULL) > > + { > > + *linear = is_linear_load_p (*tmp); > > + return *tmp; > > + } > > + > > + perm_cache->put (root, vNULL); > > + > > + /* If it's a load node, then just read the load permute. */ > > + if (SLP_TREE_LOAD_PERMUTATION (root).exists ()) > > + { > > + loads = SLP_TREE_LOAD_PERMUTATION (root); > > + perm_cache->put (root, loads); > > + if (!is_linear_load_p (loads)) > > + return loads; > > + } > > + else if (SLP_TREE_DEF_TYPE (root) == vect_external_def) > > + { > > + loads.create (SLP_TREE_LANES (root)); > > + tree op; > > + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (root), i, op) > > + { > > + gimple *defstmt = SSA_NAME_DEF_STMT (op); > > + if (!is_gimple_assign (defstmt)) > > + return vNULL; > > + > > + switch (gimple_assign_rhs_code (defstmt)) > > + { > > + case IMAGPART_EXPR: > > + loads.safe_push (1); > > + break; > > + case REALPART_EXPR: > > + loads.safe_push (0); > > + break; > > + default: > > + { > > + loads.release (); > > + return vNULL; > > + } > > + } > > + } > > + > > + perm_cache->put (root, loads); > > + if (!is_linear_load_p (loads)) > > + return loads; > > + } > > + else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def) > > + return vNULL; > > + > > + > > + > > + auto_vec<load_permutation_t> all_loads; > > + bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists (); > > + > > + slp_tree child; > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child) > > + { > > + /* Seed it with a value to prevent infinite recursions. */ > > + loads = linear_loads_p (perm_cache, child, linear); > > + if ((!*linear && !is_perm) || !loads.exists ()) > > + return loads; > > + > > + all_loads.safe_push (loads); > > + } > > + > > + if (is_perm) > > + { > > + lane_permutation_t perm = SLP_TREE_LANE_PERMUTATION (root); > > + load_permutation_t nloads; > > + nloads.create (SLP_TREE_LANES (root)); > > + nloads.quick_grow (SLP_TREE_LANES (root)); > > + for (i = 0; i < SLP_TREE_LANES (root); i++) > > + nloads[i] = all_loads[perm[i].first][perm[i].second]; > > + > > + perm_cache->put (root, nloads); > > + if (!is_linear_load_p (nloads)) > > + return nloads; > > + loads = nloads; > > + } > > + > > + perm_cache->put (root, loads); > > + *linear = true; > > + return loads; > > +} > > + > > +/* Builds a permutation group from the operands in OPS and stores it in > BLOCKS. > > + The group describes how to combine the operators to get a valid linear > node. > > + > > + This is used when combining multiple children from a two_operators > node into > > + one using a lane permute to select the appropriate lane. As an example > the > > + permute { [0 0] [1 4] [2 2] [3 3] [1 4] [5 5] } says the nodes which > > occur > > + twice in a group, e.g [0 0] only needs itself to possibly be made linear > > + whereas [1 4] means to combine the nodes 1 and 4. */ > > + > > +static void > > +vect_build_perm_groups (slp_tree_to_load_perm_map_t *perm_cache, > > + map_t *blocks, vec<slp_tree> ops) > > +{ > > + slp_tree op; > > + unsigned i; > > + bool is_linear = false; > > + unsigned min_eq = -1, max_eq = 0; > > + unsigned min_idx = 0, max_idx = 0; > > + FOR_EACH_VEC_ELT (ops, i, op) > > + { > > + load_permutation_t perms = linear_loads_p (perm_cache, op, > &is_linear); > > + unsigned x, imin = -1, imax = 0; > > + for (x = 0; x < perms.length () && !is_linear; x++) > > + { > > + imin = MIN (imin, perms[x]); > > + imax = MAX (imax, perms[x]); > > + } > > + > > + if (imin != imax || perms.length () == 0 || is_linear) > > + blocks[i] = {i, i}; > > + else > > + { > > + if (imin <= min_eq) > > + { > > + min_eq = imin; > > + min_idx = i; > > + } > > + > > + if (imin >= max_eq) > > + { > > + max_eq = imin; > > + max_idx = i; > > + } > > + } > > + } > > + > > + /* Now fill in the gap. */ > > + blocks[min_idx] = {min_idx, max_idx}; > > + blocks[max_idx] = {min_idx, max_idx}; > > + > > + if (dump_enabled_p ()) > > + { > > + dump_printf_loc (MSG_NOTE, vect_location, "pattern group: { "); > > + for (i = 0; i < ops.length (); i++) > > + dump_printf (MSG_NOTE,"[%d %d] ", blocks[i].a, blocks[i].b); > > + dump_printf (MSG_NOTE,"}\n"); > > + } > > + > > +} > > + > > +/* This function attempts to make a node rooted in NODE with parent > PARENT > > + linear. If the node if already linear than the node itself is returned > > + in RESULT. > > + > > + If the node is not linear then a new VEC_PERM_EXPR node is created > with a > > + lane permute that when applied will make the node linear. If such a > > + permute cannot be created then FALSE is returned from the function. > > + > > + Here linearity is defined as having a sequential, monotically increasing > > + load position inside the load permute generated by the loads reachable > from > > + NODE. */ > > + > > +static bool > > +vect_slp_make_linear (slp_tree_to_load_perm_map_t *perm_cache, > > + slp_tree parent, slp_tree node, slp_tree *result) > > +{ > > + bool is_linear = false; > > + unsigned x, val; > > + load_permutation_t load_perm = linear_loads_p (perm_cache, node, > &is_linear); > > + if (is_linear) > > + { > > + *result = node; > > + return true; > > + } > > + > > + /* Attempt to linearise the permute. */ > > + vec<std::pair<unsigned, unsigned> > zipped; > > + zipped.create (load_perm.length ()); > > + FOR_EACH_VEC_ELT (load_perm, x, val) > > + zipped.quick_push (std::make_pair (val, x)); > > + > > + typedef const std::pair<unsigned, unsigned>* cmp_t; > > + zipped.qsort ([](const void *a, const void *b) -> int > > + { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; }); > > + > > + /* Verify if we have a linear permute sequence. */ > > + if (zipped.length () > 0) > > + { > > + unsigned leader = zipped[0].first; > > + for (x = 1; x < zipped.length (); x++) > > + if(!(is_linear = (zipped[x].first == ++leader))) > > + break; > > + } > > + > > + if (!is_linear) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "Loads could not be made linear %p\n", > > + node); > > + zipped.release (); > > + return false; > > + } > > + > > + for (x = 0; x < zipped.length (); x++) > > + zipped[x].first = 0; > > + > > + /* Create the new permute node and store it instead. */ > > + slp_tree vnode = vect_create_new_slp_node (vNULL, 1); > > + SLP_TREE_CODE (vnode) = VEC_PERM_EXPR; > > + SLP_TREE_LANE_PERMUTATION (vnode) = zipped; > > + SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent); > > + SLP_TREE_CHILDREN (vnode).quick_push (node); > > + SLP_TREE_REF_COUNT (vnode) = 1; > > + SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node); > > + SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE > (parent); > > + *result = vnode; > > + return is_linear; > > +} > > + > > +/* Helper utility to check to see if the permutation PERM is one that can > be > > + used in a node combination operation. This is defined as the permute > not > > + having all the elements being the same. e.g [0 0]. */ > > + > > +static inline bool > > +vect_can_combine_node_p (load_permutation_t perm, bool is_linear) > > +{ > > + if (is_linear) > > + return false; > > + > > + unsigned i, x; > > + FOR_EACH_VEC_ELT (perm, i, x) > > + if (perm[0] != x) > > + return false; > > + > > + return true; > > +} > > + > > +/* This function combines the nodes in MAP together to make a new > node using a > > + lane permute. The nodes to combine are stored in ENTRIES and the > resulting > > + node is returned in RESULT. > > + > > + If the nodes are already linear then this function fails and returns > > FALSE. > > + Otherwise it returns the new node and TRUE. */ > > + > > +static bool > > +vect_slp_make_combine_linear (slp_tree_to_load_perm_map_t > *perm_cache, > > + slp_tree parent, vec<slp_tree> entries, map_t > map, > > + slp_tree *result) > > +{ > > + if (map.a == map.b) > > + return false; > > + > > + slp_tree node_a = entries[map.a]; > > + slp_tree node_b = entries[map.b]; > > + > > + bool is_a_linear = false; > > + bool is_b_linear = false; > > + > > + load_permutation_t load_perm_a > > + = linear_loads_p (perm_cache, node_a, &is_a_linear); > > + if (!vect_can_combine_node_p (load_perm_a, is_a_linear)) > > + return false; > > + > > + load_permutation_t load_perm_b > > + = linear_loads_p (perm_cache, node_b, &is_b_linear); > > + if (!vect_can_combine_node_p (load_perm_b, is_b_linear)) > > + return false; > > + > > + if (!load_perm_a.exists () || !load_perm_b.exists ()) > > + return false; > > + > > + /* Now we need to figure which node is first. */ > > + auto_vec<slp_tree> nodes; > > + nodes.create (2); > > + vec<std::pair<unsigned, unsigned> > perm; > > + perm.create (2); > > + if (load_perm_a[0] < load_perm_b[0]) > > + { > > + perm.quick_push (std::make_pair (0, 0)); > > + perm.quick_push (std::make_pair (1, 0)); > > + } > > + else > > + { > > + perm.quick_push (std::make_pair (1, 0)); > > + perm.quick_push (std::make_pair (0, 0)); > > + } > > + > > + nodes.quick_push (node_a); > > + nodes.quick_push (node_b); > > + > > + slp_tree vnode = vect_create_new_slp_node (vNULL, 1); > > + SLP_TREE_CODE (vnode) = VEC_PERM_EXPR; > > + SLP_TREE_LANE_PERMUTATION (vnode) = perm; > > + SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent); > > + SLP_TREE_CHILDREN (vnode).safe_splice (nodes); > > + SLP_TREE_REF_COUNT (vnode) = 1; > > + SLP_TREE_LANES (vnode) = SLP_TREE_LANES (parent); > > + SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE > (parent); > > + *result = vnode; > > + return true; > > +} > > + > > +/* Checks to see of the expression represented by NODE is a gimple > assign with > > + code CODE. */ > > + > > +static inline bool > > +vect_match_expression_p (slp_tree node, tree_code code) > > +{ > > + if (!node > > + || !SLP_TREE_REPRESENTATIVE (node)) > > + return false; > > + > > + gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE > (node)); > > + if (!is_gimple_assign (expr) > > + || gimple_assign_rhs_code (expr) != code) > > + return false; > > + > > + return true; > > +} > > + > > +/* Checks to see if the expression represented by NODE is a call to the > internal > > + function FN. */ > > + > > +static inline bool > > +vect_match_call_p (slp_tree node, internal_fn fn) > > +{ > > + if (!node > > + || !SLP_TREE_REPRESENTATIVE (node)) > > + return false; > > + > > + gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE > (node)); > > + if (!gimple_call_internal_p (expr, fn)) > > + return false; > > + > > + return true; > > +} > > + > > +/* Checks to see if NODE is a two_operands node whom's operands are > calls to the > > + internal function FN. */ > > + > > +static bool > > +vect_match_two_call_p (slp_tree node, internal_fn fn) > > +{ > > + if (!SLP_TREE_REPRESENTATIVE (node) > > + || SLP_TREE_CODE (node) != VEC_PERM_EXPR > > + || SLP_TREE_CHILDREN (node).length () != 2) > > + return false; > > + > > + vec<slp_tree> children = SLP_TREE_CHILDREN (node); > > + > > + return vect_match_call_p (children[0], fn) > > + && vect_match_call_p (children[1], fn); > > +} > > + > > +/* Check if the given lane permute in PERMUTES matches an alternating > sequence > > + of {P0 P1 P0 P1 ...}. This to account for unrolled loops. Further mode > > + there resulting permute must be linear. */ > > + > > +static bool > > +vect_check_lane_permute (lane_permutation_t &permutes, > > + unsigned p0, unsigned p1) > > +{ > > + if (permutes.length () == 0) > > + return false; > > + > > + unsigned val[2] = {p0, p1}; > > + unsigned seed = permutes[0].second; > > + for (unsigned i = 0; i < permutes.length (); i++) > > + if (permutes[i].first != val[i % 2] > > + || permutes[i].second != seed++) > > + return false; > > + > > + return true; > > +} > > + > > +/* This function will match the two gimple expressions representing > NODE1 and > > + NODE2 in parallel and returns the pair operation that represents the two > > + expressions in the two statements. > > + > > + If match is successful then the corresponding complex_operation is > > + returned and the arguments to the two matched operations are > returned in OPS. > > + > > + If TWO_OPERANDS it is expected that the LANES of the parent > VEC_PERM select > > + from the two nodes alternatingly. > > + > > + If unsuccessful then CMPLX_NONE is returned and OPS is untouched. > > + > > + e.g. the following gimple statements > > + > > + stmt 0 _39 = _37 + _12; > > + stmt 1 _6 = _38 - _36; > > + > > + will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}. > > +*/ > > + > > +static complex_operation_t > > +vect_detect_pair_op (slp_tree node1, slp_tree node2, > lane_permutation_t &lanes, > > + bool two_operands = true, vec<slp_tree> *ops = NULL) > > +{ > > + complex_operation_t result = CMPLX_NONE; > > + > > + if (vect_match_expression_p (node1, MINUS_EXPR) > > + && vect_match_expression_p (node2, PLUS_EXPR) > > + && (!two_operands || vect_check_lane_permute (lanes, 0, 1))) > > + result = MINUS_PLUS; > > + else if (vect_match_expression_p (node1, PLUS_EXPR) > > + && vect_match_expression_p (node2, MINUS_EXPR) > > + && (!two_operands || vect_check_lane_permute (lanes, 0, 1))) > > + result = PLUS_MINUS; > > + else if (vect_match_expression_p (node1, PLUS_EXPR) > > + && vect_match_expression_p (node2, PLUS_EXPR)) > > + result = PLUS_PLUS; > > + else if (vect_match_expression_p (node1, MULT_EXPR) > > + && vect_match_expression_p (node2, MULT_EXPR)) > > + result = MULT_MULT; > > + > > + if (result != CMPLX_NONE && ops != NULL) > > + { > > + ops->create (2); > > + ops->quick_push (node1); > > + ops->quick_push (node2); > > + } > > + return result; > > +} > > + > > +/* Overload of vect_detect_pair_op that matches against the > representative > > + statements in the children of NODE. It is expected that NODE has > exactly > > + two children and when TWO_OPERANDS then NODE must be a > VEC_PERM. */ > > + > > +static complex_operation_t > > +vect_detect_pair_op (slp_tree node, bool two_operands = true, > > + vec<slp_tree> *ops = NULL) > > +{ > > + if (!two_operands && SLP_TREE_CODE (node) == VEC_PERM_EXPR) > > + return CMPLX_NONE; > > + > > + if (SLP_TREE_CHILDREN (node).length () != 2) > > + return CMPLX_NONE; > > + > > + vec<slp_tree> children = SLP_TREE_CHILDREN (node); > > + lane_permutation_t &lanes = SLP_TREE_LANE_PERMUTATION (node); > > + > > + return vect_detect_pair_op (children[0], children[1], lanes, > two_operands, > > + ops); > > +} > > + > > +/* This function marks every statement that is being replaced during the > > + the pattern matching as PURE. Normally when replacing a statement > due > > + to a pattern we add the statement to the > STMT_VINFO_PATTERN_DEF_SEQ of > > + the pattern that is replacing them. In this case however this won't > > + work as when doing the replacement we are changing the nodes that > are > > + used by the statements. This means that when vectorized the SSA chain > > + is different than in the BB. > > + > > + Declaring the statements as part of the sequence will then cause SSA > > + verification to fail as we may refer to statements that were not in the > > + original USE-DEF chain of the statement we are replacing. > > + > > + The relevancy of the statements are saved and are later restored if SLP > is to > > + be dissolved. */ > > + > > +static void > > +vect_mark_stmts_as_in_pattern (hash_set<slp_tree> *cache, slp_tree > node) > > +{ > > + if (cache->add (node)) > > + return; > > + > > + unsigned i; > > + stmt_vec_info stmt_info; > > + slp_tree child; > > + > > + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info) > > + { > > + if (gimple_assign_load_p (STMT_VINFO_STMT (stmt_info))) > > + return; > > + > > + vect_save_relevancy (stmt_info); > > + STMT_VINFO_SLP_VECT_ONLY (stmt_info) = true; > > + STMT_SLP_TYPE (stmt_info) = pure_slp; > > + } > > + > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > + vect_mark_stmts_as_in_pattern (cache, child); > > +} > > + > > + > > > +/********************************************************* > ********************** > > + * complex_pattern class > > + > ********************************************************** > ********************/ > > + > > +/* SLP Complex Numbers pattern matching. > > + > > + As an example, the following simple loop: > > + > > + double a[restrict N]; double b[restrict N]; double c[restrict N]; > > + > > + for (int i=0; i < N; i+=2) > > + { > > + c[i] = a[i] - b[i+1]; > > + c[i+1] = a[i+1] + b[i]; > > + } > > + > > + which represents a complex addition on with a rotation of 90* around > the > > + argand plane. i.e. if `a` and `b` were complex numbers then this would be > the > > + same as `a + (b * I)`. > > + > > + Here the expressions for `c[i]` and `c[i+1]` are independent but have to > be > > + both recognized in order for the pattern to work. As an SLP tree this is > > + represented as > > + > > + +--------------------------------+ > > + | stmt 0 *_9 = _10; | > > + | stmt 1 *_15 = _16; | > > + +--------------------------------+ > > + | > > + | > > + v > > + +--------------------------------+ > > + | stmt 0 _10 = _4 - _8; | > > + | stmt 1 _16 = _12 + _14; | > > + | lane permutation { 0[0] 1[1] } | > > + +--------------------------------+ > > + | | > > + | | > > + | | > > + +-----+ | | +-----+ > > + | | | | | | > > + +-----| { } |<-----+ +----->| { } --------+ > > + | | | +------------------| | | > > + | +-----+ | +-----+ | > > + | | | | > > + | | | | > > + | +------|------------------+ | > > + | | | | > > + v v v v > > + +--------------------------+ +--------------------------------+ > > + | stmt 0 _8 = *_7; | | stmt 0 _4 = *_3; | > > + | stmt 1 _14 = *_13; | | stmt 1 _12 = *_11; | > > + | load permutation { 1 0 } | | load permutation { 0 1 } | > > + +--------------------------+ +--------------------------------+ > > + > > + The pattern matcher allows you to replace both statements 0 and 1 or > none at > > + all. Because this operation is a two operands operation the actual nodes > > + being replaced are those in the { } nodes. The actual scalar statements > > + themselves are not replaced or used during the matching but instead the > > + SLP_TREE_REPRESENTATIVE statements are inspected. You are also > allowed to > > + replace and match on any number of nodes. > > + > > + Because the pattern matcher matches on the representative statement > for the > > + SLP node the case of two_operators it allows you to match the children > of the > > + node. This is done using the method `recognize ()`. > > + > > +*/ > > + > > +/* The complex_pattern class contains common code for pattern > matchers that work > > + on complex numbers. These provide functionality to allow de- > construction and > > + validation of sequences depicting/transforming REAL and IMAG pairs. */ > > + > > +class complex_pattern : public vect_pattern > > +{ > > + protected: > > + vec<slp_tree> *m_nodes = NULL; > > + complex_pattern (slp_tree *node) > > + : vect_pattern (node) > > + { } > > + > > + public: > > + bool validate_p (slp_tree_to_load_perm_map_t *); > > + gcall *build (vec_info *); > > +}; > > + > > +/* The post transform and validation function for the complex number > > + patterns. This will re-arrange the tree and re-organize the nodes such > > + that they can be used by the complex number instructions that are to be > > + created. It does this by doing the following steps: > > + > > + For each node that isn't already linear a new permute node is created > which > > + when applied makes the node linear. If such permutation does not exist > then > > + the function returns FALSE. For nodes that are already linear no work > > is > > + done. */ > > + > > +bool > > +complex_pattern::validate_p (slp_tree_to_load_perm_map_t > *perm_cache) > > +{ > > + slp_tree node; > > + unsigned ix; > > + auto_vec<slp_tree> workset; > > + FOR_EACH_VEC_ELT (this->m_ops, ix, node) > > + { > > + auto_vec<slp_tree> nodes; > > + nodes.create (this->m_num_args); > > + slp_tree tmp = NULL; > > + auto_vec<slp_tree> new_nodes; > > + > > + unsigned i; > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, tmp) > > + { > > + slp_tree vnode = NULL; > > + if (vect_slp_make_linear (perm_cache, node, tmp, &vnode)) > > + { > > + nodes.quick_push (vnode); > > + if (vnode != tmp) > > + new_nodes.safe_push (vnode); > > + } > > + else > > + { > > + FOR_EACH_VEC_ELT (new_nodes, i, tmp) > > + delete tmp; > > + > > + FOR_EACH_VEC_ELT (workset, i, tmp) > > + delete tmp; > > + > > + nodes.release(); > > + return false; > > + } > > + } > > + > > + slp_tree new_node > > + = vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node), > > + SLP_TREE_CHILDREN (node).length ()); > > + SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node); > > + SLP_TREE_LANE_PERMUTATION (new_node) = > SLP_TREE_LANE_PERMUTATION (node); > > + SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node); > > + SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT (node); > > + SLP_TREE_REPRESENTATIVE (new_node) = > SLP_TREE_REPRESENTATIVE (node); > > + SLP_TREE_CHILDREN (new_node).safe_splice (nodes); > > + SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node); > > + > > + workset.safe_push (new_node); > > + } > > + > > + FOR_EACH_VEC_ELT (workset, ix, node) > > + SLP_TREE_CHILDREN (*this->m_node)[ix] = node; > > + > > + return true; > > +} > > + > > + > > +/* Create a replacement pattern statement for each node in m_node and > inserts > > + the new statement into m_node as the new representative statement. > The old > > + statement is marked as being in a pattern defined by the new statement. > The > > + statement is created as call to internal function IFN with m_num_args > > + arguments. > > + > > + Futhermore the new pattern is also added to the vectorization > information > > + structure VINFO and the old statement STMT_INFO is marked as unused > while > > + the new statement is marked as used and the number of SLP uses of the > new > > + statement is incremented. > > + > > + The newly created SLP nodes are marked as SLP only and will be > dissolved > > + if SLP is aborted. > > + > > + The newly created gimple call is returned and the BB remains unchanged. > > + > > + This default method is designed to only match against simple operands > where > > + all the input and output types are the same. > > +*/ > > + > > +gcall * > > +complex_pattern::build (vec_info *vinfo) > > +{ > > + stmt_vec_info stmt_info; > > + > > + auto_vec<tree> args; > > + args.create (this->m_num_args); > > + args.quick_grow_cleared (this->m_num_args); > > + slp_tree node; > > + unsigned ix; > > + stmt_vec_info call_stmt_info; > > + gcall *call_stmt = NULL; > > + > > + FOR_EACH_VEC_ELT (*this->m_nodes, ix, node) > > So elsewhere I see m_nodes assigned from SLP_TREE_CHILDREN > of m_node. I assume m_node is the VEC_PERM of the two-operator > operation. I expected we create a call for m_node here, not > for all children? It depends on the pattern. For most, particularly those of two_operands nodes, unless I have mistaken something it's the children of the node that determines what operation is being performed isn't it? So for ADD for instance, the tree is node 0x49aeae0 (max_nunits=4, refcnt=2) stmt 0 REALPART_EXPR <*_10> = _4; stmt 1 IMAGPART_EXPR <*_10> = _22; children 0x49b0b20 node 0x49b0b20 (max_nunits=4, refcnt=2) stmt 0 _4 = _13 - _6; stmt 1 _22 = _8 + _12; lane permutation { 0[0] 1[1] } children 0x49b0430 0x49a73c0 node 0x49b0430 (max_nunits=1, refcnt=1) { } children 0x49ad660 0x499e400 node 0x49ad660 (max_nunits=4, refcnt=3) stmt 0 _13 = REALPART_EXPR <*_3>; stmt 1 _12 = IMAGPART_EXPR <*_3>; load permutation { 0 1 } node 0x499e400 (max_nunits=4, refcnt=3) stmt 0 _6 = IMAGPART_EXPR <*_5>; stmt 1 _8 = REALPART_EXPR <*_5>; load permutation { 1 0 } node 0x49a73c0 (max_nunits=1, refcnt=1) { } children 0x49ad660 0x499e400 where the VEC_PERM node itself just contains the permute and the children of it 0x49b0430 and 0x49a73c0 contain the operations - and + and from what I understood the lane permute defined how the input is deinterleaved between the two nodes. So when replacing them it's those two that I need to change and not VEC_PERM itself otherwise the node could contain a permute and an operation that needs to be vectorized? Or did you mean I should dissolve the two_operand node completely at this point? I was trying to minimize the number of changes done on each replacement, because then it means I can assume that any information I have already calculated of the children of the node being replaced is still the same when doing recursive matching. The tree directly out of the matcher is node 0x49aeae0 (max_nunits=4, refcnt=2) stmt 0 REALPART_EXPR <*_10> = _4; stmt 1 IMAGPART_EXPR <*_10> = _22; children 0x49b0b20 node 0x49b0b20 (max_nunits=4, refcnt=2) stmt 0 _4 = _13 - _6; stmt 1 _22 = _8 + _12; lane permutation { 0[0] 1[1] } children 0x4ade0c0 0x4ade220 node 0x4ade0c0 (max_nunits=1, refcnt=1) { } SLP Only pattern: slp_patt_41 = .COMPLEX_ADD_ROT90 (_pa_43, _pa_42); children 0x49ad660 0x49b60f0 node 0x49ad660 (max_nunits=4, refcnt=3) stmt 0 _13 = REALPART_EXPR <*_3>; stmt 1 _12 = IMAGPART_EXPR <*_3>; load permutation { 0 1 } node 0x49b60f0 (max_nunits=1, refcnt=1) { } lane permutation { 0[1] 0[0] } children 0x499e400 node 0x499e400 (max_nunits=4, refcnt=3) stmt 0 _6 = IMAGPART_EXPR <*_5>; stmt 1 _8 = REALPART_EXPR <*_5>; load permutation { 1 0 } node 0x4ade220 (max_nunits=1, refcnt=1) { } SLP Only pattern: slp_patt_32 = .COMPLEX_ADD_ROT90 (_pa_40, _pa_33); children 0x49ad660 0x4ade180 node 0x4ade180 (max_nunits=1, refcnt=1) { } lane permutation { 0[1] 0[0] } children 0x499e400 The one exception to this is when the node being replaced isn't a two_operands node, like is the case with FMA. In that case the node is just an ADD node which itself is replaced. This is why it's a variable, so I can tell it where to perform the replacement. Thanks, Tamar > > > + { > > + /* Calculate the location of the statement in NODE to replace. */ > > + stmt_info = SLP_TREE_REPRESENTATIVE (node); > > + gimple* old_stmt = STMT_VINFO_STMT (stmt_info); > > + tree type = TREE_TYPE (gimple_get_lhs (old_stmt)); > > + > > + /* Create the argument set for use by gimple_build_call_internal_vec. > */ > > + for (unsigned i = 0; i < this->m_num_args; i++) > > + { > > + tree var = make_temp_ssa_name (type, old_stmt, "_pa"); > > Why do you need to create N new SSA names (all with the same > definition statement)? It looks like this new "scalar" stmt > is only a template stored in SLP_TREE_REPRESENTATIVE > and supposedly there needs to be "something" in the actual > arguments it make us not crash later? Can't you for example > simply push the old_stmt LHS N times? > > For VEC_PERM I originally wanted to set SLP_TREE_REPRESENTATIVE > to NULL_TREE but we don't like that obviously. So long-term > we might want to change SLP_TREE_CODE to a match_code to allow > tree codes and internal-functions and "ignore" the representative > in case it is not ERROR_MARK. > > > + args[i] = var; > > + } > > + > > + /* Create the new pattern statements. */ > > + call_stmt = gimple_build_call_internal_vec (this->m_ifn, args); > > + tree var = make_temp_ssa_name (type, call_stmt, "slp_patt"); > > + gimple_call_set_lhs (call_stmt, var); > > + gimple_set_location (call_stmt, gimple_location (old_stmt)); > > + gimple_call_set_nothrow (call_stmt, true); > > + > > + /* Adjust the book-keeping for the new and old statements for use > during > > + SLP. This is required to get the right VF and statement during SLP > > + analysis. These changes are created after relevancy has been set for > > + the nodes as such we need to manually update them. Any changes > will be > > + undone if SLP is cancelled. */ > > + call_stmt_info > > + = vinfo->add_pattern_stmt (call_stmt, stmt_info); > > + > > + /* add_pattern_stmt can't be done in vect_mark_pattern_stmts > because > > + the non-SLP pattern matchers already have added the statement to > VINFO > > + by the time it is called. Some of them need to modify the returned > > + stmt_info. vect_mark_pattern_stmts is called by recog_pattern and > it > > + would increase the size of each pattern with boilerplate code to > make > > + the call there. */ > > + vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt, > > + SLP_TREE_VECTYPE (node)); > > + STMT_SLP_TYPE (call_stmt_info) = pure_slp; > > + STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true; > > + > > + /* Since we are replacing all the statements in the group with the > > same > > + thing it doesn't really matter. So just set it every time a new stmt > > + is created. */ > > + SLP_TREE_REPRESENTATIVE (node) = call_stmt_info; > > + SLP_TREE_CODE (node) = CALL_EXPR; > > + } > > + return call_stmt; > > +} > > + > > > +/********************************************************* > ********************** > > + * complex_add_pattern class > > + > ********************************************************** > ********************/ > > + > > +class complex_add_pattern : public complex_pattern > > +{ > > + protected: > > + complex_add_pattern (slp_tree *node) > > + : complex_pattern (node) > > + { > > + this->m_num_args = 2; > > + } > > + > > + public: > > + static vect_pattern* create (slp_tree *node) > > + { > > + return new complex_add_pattern (node); > > + } > > + > > + const char* get_name () > > + { > > + return "Complex Addition"; > > + } > > + > > + bool matches (slp_tree_to_load_perm_map_t *); > > + bool matches (complex_operation_t op, > slp_tree_to_load_perm_map_t *, > > + vec<slp_tree> ops); > > +}; > > + > > +/* Pattern matcher for trying to match complex addition pattern in SLP > tree. > > + > > + If no match is found then IFN is set to IFN_LAST. > > + This function matches the patterns shaped as: > > + > > + c[i] = a[i] - b[i+1]; > > + c[i+1] = a[i+1] + b[i]; > > + > > + If a match occurred then TRUE is returned, else FALSE. The initial > > match > is > > + expected to be in OP1 and the initial match operands in args0. */ > > + > > +bool > > +complex_add_pattern::matches (complex_operation_t op, > > + slp_tree_to_load_perm_map_t *perm_cache, > > + vec<slp_tree> ops) > > +{ > > + this->m_ifn = IFN_LAST; > > + this->m_ops.safe_splice (ops); > > + > > + /* Find the two components. Rotation in the complex plane will modify > > + the operations: > > + > > + * Rotation 0: + + > > + * Rotation 90: - + > > + * Rotation 180: - - > > + * Rotation 270: + - > > + > > + Rotation 0 and 180 can be handled by normal SIMD code, so we don't > need > > + to care about them here. */ > > + if (op == MINUS_PLUS) > > + this->m_ifn = IFN_COMPLEX_ADD_ROT90; > > + else if (op == PLUS_MINUS) > > + this->m_ifn = IFN_COMPLEX_ADD_ROT270; > > + > > + /* verify that there is a permute, otherwise this isn't a pattern we > > + we support. */ > > + bool is_linear = false; > > + bool all_linear = true; > > + unsigned x; > > + for (x = 0; x < this->m_ops.length () && all_linear; x++) > > + { > > + linear_loads_p (perm_cache, this->m_ops[x], &is_linear); > > + all_linear = all_linear && is_linear; > > + } > > + > > + if (this->m_ifn == IFN_LAST || all_linear) > > + return false; > > + > > + this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node); > > + return true; > > +} > > + > > +bool > > +complex_add_pattern::matches (slp_tree_to_load_perm_map_t > *perm_cache) > > +{ > > + complex_operation_t op > > + = vect_detect_pair_op (*this->m_node, true, &this->m_ops); > > + return matches (op, perm_cache, this->m_ops); > > +} > > + > > > +/********************************************************* > ********************** > > + * complex_mul_pattern > > + > ********************************************************** > ********************/ > > + > > +/* Helper function of that looks for a match in the CHILDth child of NODE. > The > > + child used is stored in RES. > > + > > + If the match is successful then ARGS will contain the operands matched > > + and the complex_operation_t type is returned. If match is not > successful > > + then CMPLX_NONE is returned and ARGS is left unmodified. */ > > + > > +static inline complex_operation_t > > +vect_match_call_complex_mla (slp_tree node, unsigned child, > > + vec<slp_tree> *args = NULL, slp_tree *res = NULL) > > +{ > > + gcc_assert (child < SLP_TREE_CHILDREN (node).length ()); > > + > > + slp_tree data = SLP_TREE_CHILDREN (node)[child]; > > + > > + if (res) > > + *res = data; > > + > > + return vect_detect_pair_op (data, false, args); > > +} > > + > > +/* Check to see if either of the trees in ARGS are a NEGATE_EXPR. If the > first > > + child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE. > > + > > + If a negate is found then the values in ARGS are reordered such that the > > + negate node is always the second one and the entry is replaced by the > child > > + of the negate node. */ > > + > > +static inline bool > > +vect_normalize_conj_loc (vec<slp_tree> args, bool *neg_first_p = NULL) > > +{ > > + gcc_assert (args.length () == 2); > > + bool neg_found = false; > > + > > + if (vect_match_expression_p (args[0], NEGATE_EXPR)) > > + { > > + std::swap (args[0], args[1]); > > + neg_found = true; > > + if (neg_first_p) > > + *neg_first_p = true; > > + } > > + else if (vect_match_expression_p (args[1], NEGATE_EXPR)) > > + { > > + neg_found = true; > > + if (neg_first_p) > > + *neg_first_p = false; > > + } > > + > > + if (neg_found) > > + args[1] = SLP_TREE_CHILDREN (args[1])[0]; > > + > > + return neg_found; > > +} > > + > > +/* Helper function that checks to see if the load permutation PERM is on > that > > + belongs to a duplication operation. i.e. checks to see if all the > > values in > > + PERM are the same. */ > > + > > +static inline bool > > +vect_is_dup (load_permutation_t perm, unsigned idx) > > +{ > > + for (unsigned i = 0; i < perm.length (); i++) > > + if (perm[i] != idx) > > + return false; > > + > > + return true; > > +} > > + > > +/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both > MULT_EXPR > > + nodes but also that they represent an operation that is either a complex > > + multiplication or a complex multiplication by conjucated value. > > + > > + Of the negation is expected to be in the first half of the tree (As > required > > + by an FMS pattern) then NEG_FIRST is true. If the operation is a > conjucate > > + operation then CONJ_FIRST_OPERAND is set to indicate whether the > first or > > + second operand contains the conjucate operation. */ > > + > > +static inline bool > > +vect_validate_multiplication (slp_tree_to_load_perm_map_t > *perm_cache, > > + vec<slp_tree> left_op, vec<slp_tree> right_op, > > + bool neg_first, bool *conj_first_operand) > > +{ > > + /* The presence of a negation indicates that we have either a conjucate > or a > > + rotation. We need to distinguish which one. */ > > + *conj_first_operand = false; > > + > > + load_permutation_t perm; > > + bool is_linear; > > + /* Complex conjucates have the negation on the imaginary part of the > > + number where rotations affect the real component. So check if the > > + negation is on a dup of lane 1. */ > > + perm = linear_loads_p (perm_cache, right_op[1], &is_linear); > > + if (is_linear || !vect_is_dup (perm, 1)) > > + return false; > > + > > + /* Check if the conjucate is on the second first or second operand. The > > + order of the node with the conjucate value determines this, and the > dup > > + node must be one of lane 0 of the same DR as the neg node. */ > > + perm = linear_loads_p (perm_cache, left_op[0], &is_linear); > > + if (is_linear) > > + { > > + perm = linear_loads_p (perm_cache, left_op[1], &is_linear); > > + if (is_linear) > > + return false; > > + } > > + else if (!neg_first) > > + *conj_first_operand = true; > > + else > > + return false; > > + > > + if (!vect_is_dup (perm, 0)) > > + return false; > > + > > + return true; > > +} > > + > > +/* Helper function to help distinguish between a conjucate and a rotation > in a > > + complex multiplication. The operations have similar shapes but the > order of > > + the load permutes are different. This function returns TRUE when the > order > > + is consistent with a multiplication or multiplication by conjucated > > + operand but returns FALSE if it's a multiplication by rotated operand. > > */ > > + > > +static inline bool > > +vect_validate_multiplication (slp_tree_to_load_perm_map_t > *perm_cache, > > + vec<slp_tree> op, unsigned idx) > > +{ > > + /* The left node is the more common case, test it first. */ > > + bool is_linear; > > + load_permutation_t perm = linear_loads_p (perm_cache, op[0], > &is_linear); > > + if (is_linear || !vect_is_dup (perm, idx)) > > + { > > + perm = linear_loads_p (perm_cache, op[1], &is_linear); > > + if (is_linear || !vect_is_dup (perm, idx)) > > + return false; > > + } > > + > > + return true; > > +} > > + > > +class complex_mul_pattern : public complex_pattern > > +{ > > + protected: > > + /* Allocate enough space for FMA as well. */ > > + map_t m_blocks[6] = {}; > > + bool m_inplace = false; > > + complex_mul_pattern (slp_tree *node) > > + : complex_pattern (node) > > + { > > + this->m_num_args = 2; > > + } > > + > > + public: > > + static vect_pattern* create (slp_tree *node) > > + { > > + return new complex_mul_pattern (node); > > + } > > + > > + const char* get_name () > > + { > > + return "Complex Multiplication"; > > + } > > + > > + bool validate_p (slp_tree_to_load_perm_map_t *); > > + bool matches (slp_tree_to_load_perm_map_t *); > > + bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *, > > + vec<slp_tree>); > > +}; > > + > > +/* Pattern matcher for trying to match complex multiply pattern in SLP > tree > > + If the operation matches then IFN is set to the operation it matched > > + and the arguments to the two replacement statements are put in > m_ops. > > + > > + If no match is found then IFN is set to IFN_LAST and m_ops is unchanged. > > + > > + This function matches the patterns shaped as: > > + > > + double ax = (b[i+1] * a[i]); > > + double bx = (a[i+1] * b[i]); > > + > > + c[i] = c[i] - ax; > > + c[i+1] = c[i+1] + bx; > > + > > + If a match occurred then TRUE is returned, else FALSE. The initial > > match > is > > + expected to be in OP1 and the initial match operands in args0. */ > > + > > +bool > > +complex_mul_pattern::matches (complex_operation_t op, > > + slp_tree_to_load_perm_map_t *perm_cache, > > + vec<slp_tree> /* ops */) > > +{ > > + this->m_ifn = IFN_LAST; > > + > > + if (op != MINUS_PLUS) > > + return false; > > + > > + slp_tree root = *this->m_node; > > + /* First two nodes must be a multiply. */ > > + auto_vec<slp_tree> muls; > > + if (vect_match_call_complex_mla (root, 0) != MULT_MULT > > + || vect_match_call_complex_mla (root, 1, &muls) != MULT_MULT) > > + return false; > > + > > + /* Now operand2+4 may lead to another expression. */ > > + auto_vec<slp_tree> left_op, right_op; > > + left_op.safe_splice (SLP_TREE_CHILDREN (muls[0])); > > + right_op.safe_splice (SLP_TREE_CHILDREN (muls[1])); > > + > > + bool neg_first; > > + bool is_neg = vect_normalize_conj_loc (right_op, &neg_first); > > + > > + if (!is_neg) > > + { > > + /* A multiplication needs to multiply agains the real pair, otherwise > > + the pattern matches that of FMS. */ > > + if (!vect_validate_multiplication (perm_cache, left_op, 0)) > > + return false; > > + this->m_ifn = IFN_COMPLEX_MUL; > > + this->m_ops.safe_splice (left_op); > > + this->m_ops.safe_splice (right_op); > > + } > > + else if (is_neg) > > + { > > + bool conj_first_operand; > > + if (!vect_validate_multiplication (perm_cache, left_op, right_op, > > + neg_first, &conj_first_operand)) > > + return false; > > + > > + this->m_ifn = IFN_COMPLEX_MUL_CONJ; > > + /* Check if the conjucate is on the first or second parameter. */ > > + if (!conj_first_operand) > > + { > > + this->m_ops.safe_push (left_op[1]); > > + this->m_ops.safe_push (left_op[0]); > > + } > > + else > > + { > > + this->m_ops.safe_push (left_op[0]); > > + this->m_ops.safe_push (left_op[1]); > > + } > > + this->m_ops.safe_push (right_op[1]); > > + this->m_ops.safe_push (right_op[0]); > > + } > > + > > + vect_build_perm_groups (perm_cache, &this->m_blocks[0], this- > >m_ops); > > + this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node); > > + > > + return true; > > +} > > + > > +bool > > +complex_mul_pattern::matches (slp_tree_to_load_perm_map_t > *perm_cache) > > +{ > > + complex_operation_t op > > + = vect_detect_pair_op (*this->m_node); > > + return matches (op, perm_cache, this->m_ops); > > +} > > + > > + > > +/* Validates to see if the Complex MUL that we have matched is valid. > This is > > + done through a combination of making nodes linear and combining > nodes. */ > > + > > +bool > > +complex_mul_pattern::validate_p (slp_tree_to_load_perm_map_t > *perm_cache) > > +{ > > + slp_tree node; > > + unsigned ix; > > + hash_set<slp_tree> cache; > > + auto_vec<slp_tree> workset; > > + FOR_EACH_VEC_ELT (*this->m_nodes, ix, node) > > + { > > + auto_vec<slp_tree> nodes; > > + nodes.create (this->m_num_args); > > + slp_tree tmp = NULL; > > + auto_vec<slp_tree> new_nodes; > > + > > + unsigned i; > > + for (i = 0; i < this->m_num_args; i++) > > + { > > + unsigned index = (ix * this->m_num_args) + i; > > + map_t map = this->m_blocks[index]; > > + slp_tree vnode = NULL; > > + bool needs_linear = map.a == map.b; > > + tmp = this->m_ops[index]; > > + cache.add (tmp); > > + if (needs_linear > > + && vect_slp_make_linear (perm_cache, node, tmp, &vnode)) > > + { > > + nodes.quick_push (vnode); > > + if (vnode != tmp) > > + new_nodes.safe_push (vnode); > > + } > > + else if (!needs_linear > > + && vect_slp_make_combine_linear (perm_cache, node, > > + this->m_ops, map, > &vnode)) > > + { > > + nodes.quick_push (vnode); > > + if (vnode != tmp) > > + new_nodes.safe_push (vnode); > > + } > > + else > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "stmts could not be made %s %p\n", > > + needs_linear ? "linear" : "linear/combined", > > + tmp); > > + FOR_EACH_VEC_ELT (new_nodes, i, tmp) > > + delete tmp; > > + > > + if (!m_inplace) > > + FOR_EACH_VEC_ELT (workset, i, tmp) > > + delete tmp; > > + > > + nodes.release(); > > + return false; > > + } > > + > > + vect_mark_stmts_as_in_pattern (&cache, node); > > + } > > + > > + if (m_inplace) > > + { > > + workset.safe_splice (nodes); > > + } > > + else > > + { > > + slp_tree new_node > > + = vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node), > > + SLP_TREE_CHILDREN (node).length > ()); > > + SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node); > > + SLP_TREE_LANE_PERMUTATION (new_node) > > + = SLP_TREE_LANE_PERMUTATION (node); > > + SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node); > > + SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT > (node); > > + SLP_TREE_REPRESENTATIVE (new_node) = > SLP_TREE_REPRESENTATIVE (node); > > + SLP_TREE_CHILDREN (new_node).safe_splice (nodes); > > + SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node); > > + > > + workset.safe_push (new_node); > > + } > > + } > > + > > + if (m_inplace) > > + { > > + SLP_TREE_CHILDREN (*this->m_node).truncate (0); > > + SLP_TREE_CHILDREN (*this->m_node).safe_splice (workset); > > + } > > + else > > + { > > + FOR_EACH_VEC_ELT (workset, ix, node) > > + SLP_TREE_CHILDREN (*this->m_node)[ix] = node; > > + } > > + > > + return true; > > +} > > + > > > +/********************************************************* > ********************** > > + * complex_fma_pattern class > > + > ********************************************************** > ********************/ > > + > > +class complex_fma_pattern : public complex_mul_pattern > > +{ > > + protected: > > + /* Temporary nodes cache as scope of the class so we don't > > + have to worry about freeing it. */ > > + auto_vec<slp_tree> m_ncache; > > + /* List of nodes to mark as no longer being a mul. The first > > + one is chosen and the last one is dangling. */ > > + auto_vec<slp_tree> m_mul_nodes; > > + complex_fma_pattern (slp_tree *node) > > + : complex_mul_pattern (node) > > + { > > + this->m_num_args = 3; > > + } > > + > > + public: > > + static vect_pattern* create (slp_tree *node) > > + { > > + return new complex_fma_pattern (node); > > + } > > + > > + const char* get_name () > > + { > > + return "Complex FMA"; > > + } > > + > > + bool matches (slp_tree_to_load_perm_map_t *); > > + bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *, > > + vec<slp_tree> ops); > > + bool validate_p (slp_tree_to_load_perm_map_t *); > > +}; > > + > > +/* Helper function to "reset" a previously matched node and undo the > changes > > + made enough so that the node is treated as an irrelevant node. */ > > + > > +static inline void > > +vect_slp_reset_pattern (slp_tree node) > > +{ > > + stmt_vec_info stmt_info = vect_orig_stmt (SLP_TREE_REPRESENTATIVE > (node)); > > + STMT_VINFO_IN_PATTERN_P (stmt_info) = false; > > + STMT_SLP_TYPE (stmt_info) = pure_slp; > > + SLP_TREE_REPRESENTATIVE (node) = stmt_info; > > +} > > + > > +/* Pattern matcher for trying to match complex multiply and accumulate > > + and multiply and subtract patterns in SLP tree. > > + If the operation matches then IFN is set to the operation it matched and > > + the arguments to the two replacement statements are put in m_ops. > > + > > + If no match is found then IFN is set to IFN_LAST and m_ops is unchanged. > > + > > + This function matches the patterns shaped as: > > + > > + double ax = (b[i+1] * a[i]) + (b[i] * a[i]); > > + double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]); > > + > > + c[i] = c[i] - ax; > > + c[i+1] = c[i+1] + bx; > > + > > + If a match occurred then TRUE is returned, else FALSE. The match is > > + performed after COMPLEX_MUL which would have done the majority of > the work. > > + This function merely matches an ADD with a COMPLEX_MUL IFN. The > initial > > + match is expected to be in OP1 and the initial match operands in args0. > */ > > + > > +bool > > +complex_fma_pattern::matches (complex_operation_t op1, > > + slp_tree_to_load_perm_map_t * /* perm_cache > */, > > + vec<slp_tree> /* args0 */) > > +{ > > + this->m_ifn = IFN_LAST; > > + > > + /* Find the two components. We match Complex MUL first which > reduces the > > + amount of work this pattern has to do. After that we just match the > > + head node and we're done.: > > + > > + * FMA: + +. > > + > > + We need to ignore the two_operands nodes that may also match. > > + For that we can check if they have any scalar statements and also > > + check that it's not a permute node as we're looking for a normal > > + PLUS_EXPR operation. */ > > + if (op1 != CMPLX_NONE) > > + return false; > > + > > + /* Find the two components. We match Complex MUL first which > reduces the > > + amount of work this pattern has to do. After that we just match the > > + head node and we're done.: > > + > > + * FMA: + + on a non-two_operands node. */ > > + if (SLP_TREE_LANE_PERMUTATION (*this->m_node).exists () > > + || !SLP_TREE_SCALAR_STMTS (*this->m_node).exists () > > + || !vect_match_expression_p (*this->m_node, PLUS_EXPR)) > > + return false; > > + > > + slp_tree node = SLP_TREE_CHILDREN (*this->m_node)[1]; > > + > > + if (vect_match_two_call_p (node, IFN_COMPLEX_MUL)) > > + this->m_ifn = IFN_COMPLEX_FMA; > > + else if (vect_match_two_call_p (node, IFN_COMPLEX_MUL_CONJ)) > > + this->m_ifn = IFN_COMPLEX_FMA_CONJ; > > + else > > + return false; > > + > > + this->m_mul_nodes.safe_splice (SLP_TREE_CHILDREN (node)); > > + node = SLP_TREE_CHILDREN (node)[0]; > > + this->m_ops.create (this->m_num_args); > > + > > + if (this->m_ifn == IFN_COMPLEX_FMA) > > + { > > + this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]); > > + this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]); > > + this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]); > > + } > > + else > > + { > > + this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]); > > + this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]); > > + this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]); > > + } > > + > > + this->m_ncache.safe_push (*this->m_node); > > + this->m_nodes = &this->m_ncache; > > + > > + return true; > > +} > > + > > +bool > > +complex_fma_pattern::matches (slp_tree_to_load_perm_map_t > *perm_cache) > > +{ > > + auto_vec<slp_tree> args0; > > + complex_operation_t op > > + = vect_detect_pair_op (*this->m_node, true, &args0); > > + return matches (op, perm_cache, args0); > > +} > > + > > +bool > > +complex_fma_pattern::validate_p (slp_tree_to_load_perm_map_t > *perm_cache) > > +{ > > + /* FMA's top level node is + + and so is not two_operator, > > + we only need one child. */ > > + slp_tree node; > > + auto_vec<slp_tree> nodes; > > + auto_vec<slp_tree> new_nodes; > > + nodes.create (this->m_num_args); > > + hash_set<slp_tree> cache; > > + unsigned ix; > > + FOR_EACH_VEC_ELT (this->m_ops, ix, node) > > + { > > + slp_tree vnode = NULL; > > + if (vect_slp_make_linear (perm_cache, *this->m_node, node, > &vnode)) > > + { > > + nodes.quick_push (vnode); > > + if (vnode != node) > > + new_nodes.safe_push (vnode); > > + if (SLP_TREE_CHILDREN (node).length () == 2) > > + { > > + cache.add(SLP_TREE_CHILDREN (node)[0]); > > + cache.add(SLP_TREE_CHILDREN (node)[1]); > > + } > > + else > > + cache.add (node); > > + } > > + else > > + { > > + FOR_EACH_VEC_ELT (new_nodes, ix, node) > > + delete node; > > + > > + nodes.release(); > > + return false; > > + } > > + } > > + > > + SLP_TREE_CHILDREN (*this->m_node).truncate (0); > > + SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes); > > + > > + /* Ignore the part of the tree that becomes irrelevant for FMA. */ > > + vect_slp_reset_pattern (this->m_mul_nodes[0]); > > + vect_slp_reset_pattern (this->m_mul_nodes[1]); > > + vect_mark_stmts_as_in_pattern (&cache, this->m_mul_nodes[1]); > > + > > + return true; > > +} > > + > > + > > > +/********************************************************* > ********************** > > + * complex_fms_pattern class > > + > ********************************************************** > ********************/ > > + > > +class complex_fms_pattern : public complex_mul_pattern > > +{ > > + protected: > > + complex_fms_pattern (slp_tree *node) > > + : complex_mul_pattern (node) > > + { > > + this->m_num_args = 3; > > + } > > + > > + public: > > + static vect_pattern* create (slp_tree *node) > > + { > > + return new complex_fms_pattern (node); > > + } > > + > > + const char* get_name () > > + { > > + return "Complex FMS"; > > + } > > + > > + bool matches (slp_tree_to_load_perm_map_t *); > > + bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *, > > + vec<slp_tree>); > > +}; > > + > > +/* Pattern matcher for trying to match complex multiply and accumulate > > + and multiply and subtract patterns in SLP tree. > > + If the operation matches then IFN is set to the operation it matched and > > + the arguments to the two replacement statements are put in m_ops. > > + > > + If no match is found then IFN is set to IFN_LAST and m_ops is unchanged. > > + > > + This function matches the patterns shaped as: > > + > > + double ax = (b[i+1] * a[i]) + (b[i] * a[i]); > > + double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]); > > + > > + c[i] = c[i] - ax; > > + c[i+1] = c[i+1] + bx; > > + > > + If a match occurred then TRUE is returned, else FALSE. The initial > > match > is > > + expected to be in OP1 and the initial match operands in args0. */ > > +bool > > +complex_fms_pattern::matches (complex_operation_t op1, > > + slp_tree_to_load_perm_map_t *perm_cache, > > + vec<slp_tree> args0) > > +{ > > + this->m_ifn = IFN_LAST; > > + > > + /* Find the two components. We match Complex MUL first which > reduces the > > + amount of work this pattern has to do. After that we just match the > > + head node and we're done.: > > + > > + * FMS: - +. */ > > + slp_tree child = NULL; > > + > > + /* We need to ignore the two_operands nodes that may also match, > > + for that we can check if they have any scalar statements and also > > + check that it's not a permute node as we're looking for a normal > > + PLUS_EXPR operation. */ > > + if (op1 != PLUS_MINUS) > > + return false; > > + > > + child = SLP_TREE_CHILDREN (args0[1])[1]; > > + if (vect_detect_pair_op (child) != MINUS_PLUS) > > + return false; > > + > > + /* First two nodes must be a multiply. */ > > + auto_vec<slp_tree> muls; > > + if (vect_match_call_complex_mla (child, 0) != MULT_MULT > > + || vect_match_call_complex_mla (child, 1, &muls) != MULT_MULT) > > + return false; > > + > > + /* Now operand2+4 may lead to another expression. */ > > + auto_vec<slp_tree> left_op, right_op; > > + left_op.safe_splice (SLP_TREE_CHILDREN (muls[1])); > > + right_op.safe_splice (SLP_TREE_CHILDREN (muls[0])); > > + > > + bool is_neg = vect_normalize_conj_loc (right_op); > > + > > + this->m_ops.create (6); > > + child = SLP_TREE_CHILDREN (args0[0])[0]; > > + this->m_ops.quick_push (child); > > + if (!is_neg) > > + { > > + this->m_ifn = IFN_COMPLEX_FMS; > > + this->m_ops.quick_push (left_op[1]); > > + this->m_ops.quick_push (left_op[0]); > > + } > > + else if (is_neg) > > + { > > + bool conj_first_operand; > > + if (!vect_validate_multiplication (perm_cache, left_op, right_op, > > false, > > + &conj_first_operand)) > > + return false; > > + > > + this->m_ifn = IFN_COMPLEX_FMS_CONJ; > > + /* Check if the conjucate is on the first or second parameter. */ > > + if (!conj_first_operand) > > + { > > + this->m_ops.quick_push (left_op[1]); > > + this->m_ops.quick_push (left_op[0]); > > + } > > + else > > + { > > + this->m_ops.quick_push (left_op[0]); > > + this->m_ops.quick_push (left_op[1]); > > + } > > + } > > + > > + this->m_ops.quick_push (child); > > + this->m_ops.quick_push (right_op[1]); > > + this->m_ops.quick_push (right_op[0]); > > + > > + vect_build_perm_groups (perm_cache, &this->m_blocks[0], this- > >m_ops); > > + this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node); > > + return true; > > +} > > + > > +bool > > +complex_fms_pattern::matches (slp_tree_to_load_perm_map_t > *perm_cache) > > +{ > > + auto_vec<slp_tree> args0; > > + complex_operation_t op > > + = vect_detect_pair_op (*this->m_node, true, &args0); > > + return matches (op, perm_cache, args0); > > +} > > + > > + > > > +/********************************************************* > ********************** > > + * complex_operations_pattern class > > + > ********************************************************** > ********************/ > > + > > +/* This function combines all the existing pattern matchers above into one > class > > + that shares the functionality between them. The initial match is shared > > + between all complex operations. */ > > + > > +class complex_operations_pattern : public complex_pattern > > +{ > > + protected: > > + complex_operations_pattern (slp_tree *node) > > + : complex_pattern (node) > > + { } > > + > > + gcall *build (); > > + > > + /* The current "active" pattern. */ > > + vect_pattern *m_patt = NULL; > > + > > + public: > > + using complex_pattern::matches; > > + > > + bool matches (slp_tree_to_load_perm_map_t *perm_cache); > > + const char* get_name (); > > + bool validate_p (slp_tree_to_load_perm_map_t *perm_cache); > > + bool is_optab_supported_p (tree, optimization_type); > > + internal_fn get_ifn (); > > + gcall *build (vec_info *); > > + void reset (slp_tree *); > > + ~complex_operations_pattern (); > > + > > + static vect_pattern* create (slp_tree *node) > > + { > > + return new complex_operations_pattern (node); > > + } > > +}; > > + > > + > > +/* Clean up the pattern if one existed. */ > > + > > +complex_operations_pattern::~complex_operations_pattern () > > +{ > > + if (this->m_patt) > > + delete this->m_patt; > > + this->m_patt = NULL; > > +} > > + > > +/* Match but do not perform any additional operations on the SLP tree. If > the > > + match is successful then that pattern is made "active" in the class. */ > > + > > +bool > > +complex_operations_pattern::matches (slp_tree_to_load_perm_map_t > *perm_cache) > > +{ > > + complex_operation_t op > > + = vect_detect_pair_op (*this->m_node, true, &this->m_ops); > > + > > + /* Check which pattern this may be. Match longest pattern first. */ > > + this->m_patt = complex_fma_pattern::create (this->m_node); > > + if (this->m_patt->matches (op, perm_cache, this->m_ops)) > > + return true; > > Hmm, all of these allocations like also > > + for (unsigned x = 0; x < num__slp_patterns; x++) > + { > + vect_pattern *pattern = slp_patterns[x] (ref_node); > + found_p = pattern->recognize (perm_cache, vinfo); > + delete pattern; > + found_rec_p = found_p | found_rec_p; > + } > > in the main driver are really due to the C++-ish layout of all this :/ > I don't like that very much. It also highlights when reading > the patch that it's hard to get the difference betwee > recognize (), match () and validate () where all but the first is > kind-of internal implementation details of the complex matchers. > > So we could eventually get by with the main driver doing > > + for (unsigned x = 0; x < num__slp_patterns; x++) > + { > + vect_pattern *pattern = slp_patterns[x] (ref_node); > if (pattern) > found_rec_p = true; > } > > thus merge ->recognize with the static allocator and not do > any heap allocation when nothing is matched? Then ... > > > + if (op == CMPLX_NONE) > > + return false; > > + > > + delete this->m_patt; > > + > > + this->m_patt = complex_fms_pattern::create (this->m_node); > > ... this could use static typing and an automatic instance? > > The myth that allocation & deallocation is cheap is really that, > a myth. We're doing 6 allocations & deallocations at this > very toplevel for each SLP node (which means eventually for > each statement in a function). > > > So why for example would one need to override recognize ()? > I'd argue for nailing it in place and to remove validate_p () > (should be done by matches?). At least validate_p for > the complex patterns seem to build SLP nodes which mean > they actually build()? > > > I think overall the patch looks sensible but it's still quite > hard to follow the flow of an actual pattern match, possibly > because of the strange factoring. > > At this point to speed up things can you split the patch > so there's only the very simplest (complex_add for example) > matcher plus requirements? Then we can push that which > would allow me to actually play with it more easily and > even try out changes? > > I'd say split out the optimize_load_redistribution part of > the first patch and merge the complex_add support into > its remains so we have a first patch that does something? > > Thanks and sorry for being dense here ... :/ There's pieces > I think we should do better but I still can't lay out a > solution and I guess pointing fingers at the individual pieces > like above will lead to very slow progress only. > > Richard. > > > + if (this->m_patt->matches (op, perm_cache, this->m_ops)) > > + return true; > > + > > + delete this->m_patt; > > + > > + this->m_patt = complex_mul_pattern::create (this->m_node); > > + if (this->m_patt->matches (op, perm_cache, this->m_ops)) > > + return true; > > + > > + delete this->m_patt; > > + > > + this->m_patt = complex_add_pattern::create (this->m_node); > > + if (this->m_patt->matches (op, perm_cache, this->m_ops)) > > + return true; > > + > > + delete this->m_patt; > > + > > + this->m_patt = NULL; > > + return false; > > +} > > + > > +/* Friendly name of the operation the pattern matches. */ > > + > > +const char* > > +complex_operations_pattern::get_name () > > +{ > > + if (!this->m_patt) > > + return "Complex Operations"; > > + > > + return this->m_patt->get_name (); > > +} > > + > > +/* Only perform the pattern creation part of the matcher. This creates > and > > + returns the new pattern statement and encapsulates the currently > active > > + pattern. */ > > + > > +gcall * > > +complex_operations_pattern::build (vec_info *vinfo) > > +{ > > + if (!this->m_patt) > > + return NULL; > > + > > + return this->m_patt->build (vinfo); > > +} > > + > > +/* Validate the operation the currently active pattern detected. */ > > + > > +bool > > +complex_operations_pattern::validate_p > (slp_tree_to_load_perm_map_t *perm_cache) > > +{ > > + if (!this->m_patt) > > + return false; > > + > > + return this->m_patt->validate_p (perm_cache); > > +} > > + > > +/* Check if the obtab of the currently active matches is supported. */ > > + > > +bool > > +complex_operations_pattern::is_optab_supported_p (tree vectype, > > + optimization_type opt_type) > > +{ > > + if (!vectype || !this->m_patt) > > + return false; > > + > > + return this->m_patt->is_optab_supported_p (vectype, opt_type); > > +} > > + > > +/* Return the matched IFN from the current active matcher if any. */ > > + > > +internal_fn > > +complex_operations_pattern::get_ifn () > > +{ > > + if (!this->m_patt) > > + return IFN_LAST; > > + > > + return this->m_patt->get_ifn (); > > +} > > + > > > +/********************************************************* > ********************** > > + * Pattern matching definitions > > + > ********************************************************** > ********************/ > > + > > +#define SLP_PATTERN(x) &x::create > > +vect_pattern_decl_t slp_patterns[] > > +{ > > + /* For least amount of back-tracking and more efficient matching > > + order patterns from the largest to the smallest. Especially if they > > + overlap in what they can detect. */ > > + > > + /* FMA overlaps with MUL but is the longer sequence. Because we're in > post > > + order traversal we can't match FMA if included in > > + complex_operations_pattern so must be checked on it's own. */ > > + SLP_PATTERN (complex_operations_pattern), > > +}; > > +#undef SLP_PATTERN > > + > > +/* Set the number of SLP pattern matchers available. */ > > +size_t num__slp_patterns = > sizeof(slp_patterns)/sizeof(vect_pattern_decl_t); > > > > > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > Nuernberg, > Germany; GF: Felix Imend