On Thu, 24 Feb 2022, Tamar Christina wrote:

> Hi All,
> 
> This is a backport of the GCC 12 patch that carries over only the correctness
> part of the fix.  It also backports two small helper functions and the
> documentation update for the optabs.
> 
> The patch strengthens the analysis for complex mul, fma and fms to ensure
> that it doesn't create incorrect output.
> 
> Essentially it adds an extra verification step to check that the two nodes
> it's going to combine perform the same operations on compatible values.  This
> is needed because if one computation differs from the other, the current
> implementation has no way to deal with it, since we have to remove the
> permute.
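
The verification described above can be pictured with a minimal, self-contained
sketch.  It is not the code from the patch: Node, CompatCache and
compatible_nodes_p are hypothetical stand-ins for slp_tree,
slp_compat_nodes_map_t and compatible_complex_nodes_p, and the toy version only
compares an operation name and the children.  It does mirror one detail of the
patch, namely that a pair is cached as "not compatible" before recursing and
only flipped to "compatible" once every child has passed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an SLP node (hypothetical, not GCC's slp_tree).
struct Node
{
  std::string op;                      // e.g. "plus", "load f[2]", "const 1"
  std::vector<const Node *> children;  // operands of this node
};

// Cache keyed on the pair of nodes being compared.
using CompatCache = std::map<std::pair<const Node *, const Node *>, bool>;

// Return true if A and B perform the same operation on compatible operands.
bool
compatible_nodes_p (CompatCache &cache, const Node *a, const Node *b)
{
  assert (a != nullptr && b != nullptr);

  const auto key = std::make_pair (a, b);
  const auto hit = cache.find (key);
  if (hit != cache.end ())
    return hit->second;

  // Be pessimistic until the whole subtree has been checked.
  cache[key] = false;

  if (a->op != b->op || a->children.size () != b->children.size ())
    return false;

  for (std::size_t i = 0; i < a->children.size (); i++)
    if (!compatible_nodes_p (cache, a->children[i], b->children[i]))
      return false;

  cache[key] = true;
  return true;
}
```

With this toy model, (f[2] + 1) compared against (f[2] + 1) is accepted, while
(f[2] + 1) against (f[3] + 1), or against a bare f[2], is rejected, which is
the kind of mismatch the new pr102819-6.c and pr102819-7.c tests exercise.
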
> 
> When we can keep the permute around we can probably handle these by unrolling.
> 
> While implementing this, since I have to do the traversal anyway, I took
> advantage of it to simplify the code a bit.  Previously we would determine
> whether something is a conjugate, then try to figure out which conjugate it
> is, and then try to see if the permutes match what we expect.
> 
> Now the code that does the traversal detects this in one go and tells us
> whether the operation is something that can be combined and whether a
> conjugate is present.
> 
> Secondly, because it does this, I can now simplify the checking code itself to
> essentially just try to apply fixed patterns to each operation.
> 
> The patterns represent the order operations should appear in.  For instance a
> complex MUL operation combines:
> 
>   Left 1 + Right 1
>   Left 2 + Right 2
> 
> with a permute on the nodes consisting of:
> 
>   { Even, Even } + { Odd, Odd  }
>   { Even, Odd  } + { Odd, Even }
> 
> By abstracting over these patterns the checking code becomes quite simple.
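
As an illustration of the fixed-pattern idea, here is a small sketch that
classifies a load permutation by lane parity and then checks four operands
against the complex MUL row quoted above.  It is a simplification under stated
assumptions, not the patch's implementation: PermKind, classify_permute and
matches_mul_pattern are made-up names, PERM_TOP handling is left out, and the
classifier only looks at the parity of each pair of lanes (using the same
"lane % 2" normalisation the patch adds to is_linear_load_p).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical permute kinds, named after the ones used in the patch.
enum class PermKind { EvenEven, OddOdd, EvenOdd, OddEven, Unknown };

// Map a pair of lane parities (each 0 or 1) to a permute kind.
static PermKind
kind_from_parity (unsigned p0, unsigned p1)
{
  assert (p0 < 2 && p1 < 2);
  if (p0 == 0)
    return p1 == 0 ? PermKind::EvenEven : PermKind::EvenOdd;
  return p1 == 1 ? PermKind::OddOdd : PermKind::OddEven;
}

// Classify LANES by the parity of each consecutive pair of lanes.  Every
// pair must have the same parity shape, otherwise the result is Unknown.
PermKind
classify_permute (const std::vector<unsigned> &lanes)
{
  if (lanes.empty () || lanes.size () % 2 != 0)
    return PermKind::Unknown;

  const unsigned first = lanes[0] % 2;
  const unsigned second = lanes[1] % 2;
  for (std::size_t i = 0; i < lanes.size (); i += 2)
    if (lanes[i] % 2 != first || lanes[i + 1] % 2 != second)
      return PermKind::Unknown;

  return kind_from_parity (first, second);
}

// Check the complex MUL layout: Left 1 + Right 1 use { Even, Even } and
// { Odd, Odd }, Left 2 + Right 2 use { Even, Odd } and { Odd, Even }.
bool
matches_mul_pattern (const std::vector<unsigned> &left1,
                     const std::vector<unsigned> &right1,
                     const std::vector<unsigned> &left2,
                     const std::vector<unsigned> &right2)
{
  return classify_permute (left1) == PermKind::EvenEven
         && classify_permute (right1) == PermKind::OddOdd
         && classify_permute (left2) == PermKind::EvenOdd
         && classify_permute (right2) == PermKind::OddEven;
}
```

Because the expected kinds are just a row in a table, supporting another
operation in this style amounts to supplying a different row rather than
writing new matching logic.
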
> 
> As part of this I was checking the order of the operands, which was left in
> "slp" order, i.e. the same order they showed up in during SLP, which means
> that the accumulator is first.  However it looks like I didn't document this.
> 
> I have thus changed the order to match that of FMA and FMS, which corrects the
> x86 codegen, and will update the Arm targets.  This has now also been
> documented.
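
For reference, the operand order as documented in the md.texi hunks of this
patch can be written out as plain scalar loops.  This is only a reference
model of the documented semantics (op0 = op1 * op2 + op3 and
op0 = op1 * op2 - op3), not how the vectorizer or any target implements them;
the function names and the use of std::complex are choices made for this
sketch.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Complex multiply and accumulate, accumulator last:
//   op0[i] = op1[i] * op2[i] + op3[i]
void
complex_fma_reference (std::vector<cfloat> &op0,
                       const std::vector<cfloat> &op1,
                       const std::vector<cfloat> &op2,
                       const std::vector<cfloat> &op3)
{
  assert (op0.size () == op1.size () && op1.size () == op2.size ()
          && op2.size () == op3.size ());
  for (std::size_t i = 0; i < op0.size (); i++)
    op0[i] = op1[i] * op2[i] + op3[i];
}

// Complex multiply and subtract, as written in the updated documentation:
//   op0[i] = op1[i] * op2[i] - op3[i]
void
complex_fms_reference (std::vector<cfloat> &op0,
                       const std::vector<cfloat> &op1,
                       const std::vector<cfloat> &op2,
                       const std::vector<cfloat> &op3)
{
  assert (op0.size () == op1.size () && op1.size () == op2.size ()
          && op2.size () == op3.size ());
  for (std::size_t i = 0; i < op0.size (); i++)
    op0[i] = op1[i] * op2[i] - op3[i];
}
```

For example, with op1 = 1+2i, op2 = 3+4i and op3 = 1+1i the product is -5+10i,
so the first loop stores -4+11i and the second stores -6+9i.
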
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu with no regressions.
> 
> Ok for GCC-11?

OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>       PR tree-optimization/102819
>       PR tree-optimization/103169
>       * doc/md.texi: Update docs for cfms, cfma.
>       * tree-data-ref.h (same_data_refs): Accept optional offset.
>       * tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with repeating
>       patterns.
>       (vect_normalize_conj_loc): Remove.
>       (is_eq_or_top): Change to take two nodes.
>       (enum _conj_status, compatible_complex_nodes_p,
>       vect_validate_multiplication): New.
>       (class complex_add_pattern, complex_add_pattern::matches,
>       complex_add_pattern::recognize, class complex_mul_pattern,
>       complex_mul_pattern::recognize, class complex_fms_pattern,
>       complex_fms_pattern::recognize, class complex_fma_pattern,
>       complex_fma_pattern::recognize, class complex_operations_pattern,
>       complex_operations_pattern::recognize, addsub_pattern::recognize): Pass
>       new cache.
>       (complex_fms_pattern::matches, complex_fma_pattern::matches,
>       complex_mul_pattern::matches): Pass new cache and use new validation
>       code.
>       * tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns,
>       vect_analyze_slp): Pass along cache.
>       (compatible_calls_p): Expose.
>       * tree-vectorizer.h (compatible_calls_p, slp_node_hash,
>       slp_compat_nodes_map_t): New.
>       (class vect_pattern): Update signatures to include new cache.
> 
> gcc/testsuite/ChangeLog:
> 
>       PR tree-optimization/102819
>       PR tree-optimization/103169
>       * g++.dg/vect/pr99149.cc: xfail for now.
>       * gcc.dg/vect/complex/pr102819-1.c: New test.
>       * gcc.dg/vect/complex/pr102819-2.c: New test.
>       * gcc.dg/vect/complex/pr102819-3.c: New test.
>       * gcc.dg/vect/complex/pr102819-4.c: New test.
>       * gcc.dg/vect/complex/pr102819-5.c: New test.
>       * gcc.dg/vect/complex/pr102819-6.c: New test.
>       * gcc.dg/vect/complex/pr102819-7.c: New test.
>       * gcc.dg/vect/complex/pr102819-8.c: New test.
>       * gcc.dg/vect/complex/pr102819-9.c: New test.
>       * gcc.dg/vect/complex/pr103169.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..ac7611008944abca08fe48cd7a74b8463f1573da 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6234,12 +6234,13 @@ Perform a vector multiply and accumulate that is semantically the same as
>  a multiply and accumulate of complex numbers.
>  
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
> +  complex TYPE op3[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] += a[i] * b[i];
> +      op0[i] = op1[i] * op2[i] + op3[i];
>      @}
>  @end smallexample
>  
> @@ -6257,12 +6258,13 @@ the same as a multiply and accumulate of complex numbers where the second
>  multiply arguments is conjugated.
>  
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
> +  complex TYPE op3[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] += a[i] * conj (b[i]);
> +      op0[i] = op1[i] * conj (op2[i]) + op3[i];
>      @}
>  @end smallexample
>  
> @@ -6279,12 +6281,13 @@ Perform a vector multiply and subtract that is semantically the same as
>  a multiply and subtract of complex numbers.
>  
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
> +  complex TYPE op3[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] -= a[i] * b[i];
> +      op0[i] = op1[i] * op2[i] - op3[i];
>      @}
>  @end smallexample
>  
> @@ -6302,12 +6305,13 @@ the same as a multiply and subtract of complex numbers where the second
>  multiply arguments is conjugated.
>  
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
> +  complex TYPE op3[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] -= a[i] * conj (b[i]);
> +      op0[i] = op1[i] * conj (op2[i]) - op3[i];
>      @}
>  @end smallexample
>  
> @@ -6324,12 +6328,12 @@ Perform a vector multiply that is semantically the same as multiply of
>  complex numbers.
>  
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] = a[i] * b[i];
> +      op0[i] = op1[i] * op2[i];
>      @}
>  @end smallexample
>  
> @@ -6346,12 +6350,12 @@ Perform a vector multiply by conjugate that is semantically the same as a
>  multiply of complex numbers where the second multiply arguments is conjugated.
>  
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] = a[i] * conj (b[i]);
> +      op0[i] = op1[i] * conj (op2[i]);
>      @}
>  @end smallexample
>  
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index 3ec86f5f08283e55d4eefdf0f4d709d1b6c16abf..c69658ec929fe6711806724a299dbac83ab4042e 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -4640,6 +4640,31 @@ gimple_phi_arg_has_location (const gphi *phi, size_t i)
>    return gimple_phi_arg_location (phi, i) != UNKNOWN_LOCATION;
>  }
>  
> +/* Return the number of arguments that can be accessed by gimple_arg.  */
> +
> +static inline unsigned
> +gimple_num_args (const gimple *gs)
> +{
> +  if (auto phi = dyn_cast<const gphi *> (gs))
> +    return gimple_phi_num_args (phi);
> +  if (auto call = dyn_cast<const gcall *> (gs))
> +    return gimple_call_num_args (call);
> +  return gimple_num_ops (as_a <const gassign *> (gs)) - 1;
> +}
> +
> +/* GS must be an assignment, a call, or a PHI.
> +   If it's an assignment, return rhs operand I.
> +   If it's a call, return function argument I.
> +   If it's a PHI, return the value of PHI argument I.  */
> +
> +static inline tree
> +gimple_arg (const gimple *gs, unsigned int i)
> +{
> +  if (auto phi = dyn_cast<const gphi *> (gs))
> +    return gimple_phi_arg_def (phi, i);
> +  if (auto call = dyn_cast<const gcall *> (gs))
> +    return gimple_call_arg (call, i);
> +  return gimple_op (as_a <const gassign *> (gs), i + 1);
> +}
>  
>  /* Return the region number for GIMPLE_RESX RESX_STMT.  */
>  
> diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc b/gcc/testsuite/g++.dg/vect/pr99149.cc
> index 00ebe9d9cdf600ada8e66b4b854f0e18ad0b6a7d..fd33700d91c1b734557bfe1db8f7a7774e95deda 100755
> --- a/gcc/testsuite/g++.dg/vect/pr99149.cc
> +++ b/gcc/testsuite/g++.dg/vect/pr99149.cc
> @@ -24,5 +24,4 @@ public:
>  } n;
>  main() { n.j(); }
>  
> -/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */
> -/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_FMA" 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" { xfail { vect_float } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02f779cf693ede07
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad1(float v1, float v2)
> +{
> +  for (int r = 0; r < 100; r += 4)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
> +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> +      f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1);
> +      f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2);
> +      //                  ^^^^^^^             ^^^^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96601596f46dc5f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad1(float v1, float v2)
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2);
> +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965dbb72cf8940de1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void good1(float v1, float v2)
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
> +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..882851789c5085e734000609114be480d3b08bd0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void good1()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i];
> +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd469473e6a5c333ae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void good2()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1);
> +      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b216022fdc0af54e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad1()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i];
> +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r];
> +      //                  ^^^^^^^             ^^^^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61b3a36b555acf3cf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad2()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i];
> +      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r];
> +      //                          ^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..07b48148688b7d530e5891d023d558b58a485c23
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad3()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i];
> +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
> +      //                            ^^^^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316e8caf3d485b8ee1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +#include <stdio.h>
> +#include <complex.h>
> +
> +#define N 200
> +#define TYPE float
> +#define TYPE2 float
> +
> +void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> +{
> +  for (int i=0; i < N; i++)
> +    {
> +      c[i] -=  a[i] * b[0];
> +    }
> +}
> +
> +/* The pattern overlaps with COMPLEX_ADD so we need to support consuming ADDs in COMPLEX_FMS.  */
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail { vect_float } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a82574324126e9083fc5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { vect_double } } } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */
> +
> +_Complex double b_0, c_0;
> +
> +void
> +mul270snd (void)
> +{
> +  c_0 = b_0 * 1.0iF * 1.0iF;
> +}
> +
> diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
> index 8001cc54f518d9d9d1a0fcfe5790d22dae109fb2..66a9e0d44351e40075c828fbc7d04a2f9a621862 100644
> --- a/gcc/tree-data-ref.h
> +++ b/gcc/tree-data-ref.h
> @@ -594,10 +594,11 @@ same_data_refs_base_objects (data_reference_p a, data_reference_p b)
>  }
>  
>  /* Return true when the data references A and B are accessing the same
> -   memory object with the same access functions.  */
> +   memory object with the same access functions.  Optionally skip the
> +   last OFFSET dimensions in the data reference.  */
>  
>  static inline bool
> -same_data_refs (data_reference_p a, data_reference_p b)
> +same_data_refs (data_reference_p a, data_reference_p b, int offset = 0)
>  {
>    unsigned int i;
>  
> @@ -608,7 +609,7 @@ same_data_refs (data_reference_p a, data_reference_p b)
>    if (!same_data_refs_base_objects (a, b))
>      return false;
>  
> -  for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
> +  for (i = offset; i < DR_NUM_DIMENSIONS (a); i++)
>      if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i)))
>        return false;
>  
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 2ed49cd9edcabd7948b365dd60d7405b79079a7b..a3bd90ff85b4ca5423a94388d480b66051a83e08 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads)
>    int valid_patterns = 4;
>    FOR_EACH_VEC_ELT (loads, i, load)
>      {
> -      if (candidates[0] != PERM_UNKNOWN && load != 1)
> +      unsigned adj_load = load % 2;
> +      if (candidates[0] != PERM_UNKNOWN && adj_load != 1)
>       {
>         candidates[0] = PERM_UNKNOWN;
>         valid_patterns--;
>       }
> -      if (candidates[1] != PERM_UNKNOWN && load != 0)
> +      if (candidates[1] != PERM_UNKNOWN && adj_load != 0)
>       {
>         candidates[1] = PERM_UNKNOWN;
>         valid_patterns--;
> @@ -604,11 +605,12 @@ class complex_add_pattern : public complex_pattern
>    public:
>      void build (vec_info *);
>      static internal_fn
> -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> -          vec<slp_tree> *);
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +          slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
>  
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +            slp_tree *);
>  
>      static vect_pattern*
>      mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> @@ -655,6 +657,7 @@ complex_add_pattern::build (vec_info *vinfo)
>  internal_fn
>  complex_add_pattern::matches (complex_operation_t op,
>                             slp_tree_to_load_perm_map_t *perm_cache,
> +                           slp_compat_nodes_map_t * /* compat_cache */,
>                             slp_tree *node, vec<slp_tree> *ops)
>  {
>    internal_fn ifn = IFN_LAST;
> @@ -700,13 +703,14 @@ complex_add_pattern::matches (complex_operation_t op,
>  
>  vect_pattern*
>  complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +                             slp_compat_nodes_map_t *compat_cache,
>                               slp_tree *node)
>  {
>    auto_vec<slp_tree> ops;
>    complex_operation_t op
>      = vect_detect_pair_op (*node, true, &ops);
>    internal_fn ifn
> -    = complex_add_pattern::matches (op, perm_cache, node, &ops);
> +    = complex_add_pattern::matches (op, perm_cache, compat_cache, node, &ops);
>    if (ifn == IFN_LAST)
>      return NULL;
>  
> @@ -738,139 +742,214 @@ vect_match_call_complex_mla (slp_tree node, unsigned child,
>    return vect_detect_pair_op (data, false, args);
>  }
>  
> -/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the first
> -   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
> -
> -   If a negate is found then the values in ARGS are reordered such that the
> -   negate node is always the second one and the entry is replaced by the child
> -   of the negate node.  */
> +/* Helper function to check if PERM is KIND or PERM_TOP.  */
>  
>  static inline bool
> -vect_normalize_conj_loc (vec<slp_tree> args, bool *neg_first_p = NULL)
> +is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache,
> +           slp_tree op1, complex_perm_kinds_t kind1,
> +           slp_tree op2, complex_perm_kinds_t kind2)
>  {
> -  gcc_assert (args.length () == 2);
> -  bool neg_found = false;
> -
> -  if (vect_match_expression_p (args[0], NEGATE_EXPR))
> -    {
> -      std::swap (args[0], args[1]);
> -      neg_found = true;
> -      if (neg_first_p)
> -     *neg_first_p = true;
> -    }
> -  else if (vect_match_expression_p (args[1], NEGATE_EXPR))
> -    {
> -      neg_found = true;
> -      if (neg_first_p)
> -     *neg_first_p = false;
> -    }
> +  complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1);
> +  if (perm1 != kind1 && perm1 != PERM_TOP)
> +    return false;
>  
> -  if (neg_found)
> -    args[1] = SLP_TREE_CHILDREN (args[1])[0];
> +  complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2);
> +  if (perm2 != kind2 && perm2 != PERM_TOP)
> +    return false;
>  
> -  return neg_found;
> +  return true;
>  }
>  
> -/* Helper function to check if PERM is KIND or PERM_TOP.  */
> +enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND };
>  
>  static inline bool
> -is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t kind)
> +compatible_complex_nodes_p (slp_compat_nodes_map_t *compat_cache,
> +                         slp_tree a, int *pa, slp_tree b, int *pb)
>  {
> -  return perm == kind || perm == PERM_TOP;
> -}
> +  bool *tmp;
> +  std::pair<slp_tree, slp_tree> key = std::make_pair(a, b);
> +  if ((tmp = compat_cache->get (key)) != NULL)
> +    return *tmp;
>  
> -/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR
> -   nodes but also that they represent an operation that is either a complex
> -   multiplication or a complex multiplication by conjugated value.
> +  compat_cache->put (key, false);
>  
> -   Of the negation is expected to be in the first half of the tree (As required
> -   by an FMS pattern) then NEG_FIRST is true.  If the operation is a conjugate
> -   operation then CONJ_FIRST_OPERAND is set to indicate whether the first or
> -   second operand contains the conjugate operation.  */
> +  if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ())
> +    return false;
>  
> -static inline bool
> -vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
> -                          vec<slp_tree> left_op, vec<slp_tree> right_op,
> -                          bool neg_first, bool *conj_first_operand,
> -                          bool fms)
> -{
> -  /* The presence of a negation indicates that we have either a conjugate or a
> -     rotation.  We need to distinguish which one.  */
> -  *conj_first_operand = false;
> -  complex_perm_kinds_t kind;
> -
> -  /* Complex conjugates have the negation on the imaginary part of the
> -     number where rotations affect the real component.  So check if the
> -     negation is on a dup of lane 1.  */
> -  if (fms)
> +  if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b))
> +    return false;
> +
> +  /* Only internal nodes can be loads, as such we can't check further if they
> +     are externals.  */
> +  if (SLP_TREE_DEF_TYPE (a) != vect_internal_def)
>      {
> -      /* Canonicalization for fms is not consistent. So have to test both
> -      variants to be sure.  This needs to be fixed in the mid-end so
> -      this part can be simpler.  */
> -      kind = linear_loads_p (perm_cache, right_op[0]);
> -      if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), PERM_ODDODD)
> -        && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
> -                          PERM_ODDEVEN))
> -       || (kind == PERM_ODDEVEN
> -           && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
> -                          PERM_ODDODD))))
> -     return false;
> +      for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++)
> +     {
> +       tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]];
> +       tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]];
> +       if (!operand_equal_p (op1, op2, 0))
> +         return false;
> +     }
> +
> +      compat_cache->put (key, true);
> +      return true;
>      }
> +
> +  auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a));
> +  auto b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b));
> +
> +  if (gimple_code (a_stmt) != gimple_code (b_stmt))
> +    return false;
> +
> +  /* code, children, type, externals, loads, constants  */
> +  if (gimple_num_args (a_stmt) != gimple_num_args (b_stmt))
> +    return false;
> +
> +  /* At this point, a and b are known to be the same gimple operations.  */
> +  if (is_gimple_call (a_stmt))
> +    {
> +     if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt),
> +                              dyn_cast <gcall *> (b_stmt)))
> +       return false;
> +    }
> +  else if (!is_gimple_assign (a_stmt))
> +    return false;
>    else
>      {
> -      if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD
> -       && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]),
> -                         PERM_ODDEVEN))
> +      tree_code acode = gimple_assign_rhs_code (a_stmt);
> +      tree_code bcode = gimple_assign_rhs_code (b_stmt);
> +      if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR)
> +       && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR))
> +     return true;
> +
> +      if (acode != bcode)
>       return false;
>      }
>  
> -  /* Deal with differences in indexes.  */
> -  int index1 = fms ? 1 : 0;
> -  int index2 = fms ? 0 : 1;
> -
> -  /* Check if the conjugate is on the second first or second operand.  The
> -     order of the node with the conjugate value determines this, and the dup
> -     node must be one of lane 0 of the same DR as the neg node.  */
> -  kind = linear_loads_p (perm_cache, left_op[index1]);
> -  if (kind == PERM_TOP)
> +  if (!SLP_TREE_LOAD_PERMUTATION (a).exists ()
> +      || !SLP_TREE_LOAD_PERMUTATION (b).exists ())
>      {
> -      if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD)
> -     return true;
> +      for (unsigned i = 0; i < gimple_num_args (a_stmt); i++)
> +     {
> +       tree t1 = gimple_arg (a_stmt, i);
> +       tree t2 = gimple_arg (b_stmt, i);
> +       if (TREE_CODE (t1) != TREE_CODE (t2))
> +         return false;
> +
> +       /* If SSA name then we will need to inspect the children
> +          so we can punt here.  */
> +       if (TREE_CODE (t1) == SSA_NAME)
> +         continue;
> +
> +       if (!operand_equal_p (t1, t2, 0))
> +         return false;
> +     }
>      }
> -  else if (kind == PERM_EVENODD)
> +  else
>      {
> -      if ((kind = linear_loads_p (perm_cache, left_op[index2])) == PERM_EVENODD)
> +      auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a));
> +      auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b));
> +      /* Don't check the last dimension as that's checked by the lineary
> +      checks.  This check is also much stricter than what we need
> +      because it doesn't consider loading from adjacent elements
> +      in the same struct as loading from the same base object.
> +      But for now, I'll play it safe.  */
> +      if (!same_data_refs (dr1, dr2, 1))
>       return false;
> -      return true;
>      }
> -  else if (!neg_first)
> -    *conj_first_operand = true;
> -  else
> -    return false;
>  
> -  if (kind != PERM_EVENEVEN)
> -    return false;
> +  for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++)
> +    {
> +      if (!compatible_complex_nodes_p (compat_cache,
> +                                    SLP_TREE_CHILDREN (a)[i], pa,
> +                                    SLP_TREE_CHILDREN (b)[i], pb))
> +     return false;
> +    }
>  
> +  compat_cache->put (key, true);
>    return true;
>  }
>  
> -/* Helper function to help distinguish between a conjugate and a rotation in a
> -   complex multiplication.  The operations have similar shapes but the order of
> -   the load permutes are different.  This function returns TRUE when the order
> -   is consistent with a multiplication or multiplication by conjugated
> -   operand but returns FALSE if it's a multiplication by rotated operand.  */
> -
>  static inline bool
>  vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
> -                          vec<slp_tree> op, complex_perm_kinds_t permKind)
> +                           slp_compat_nodes_map_t *compat_cache,
> +                           vec<slp_tree> &left_op,
> +                           vec<slp_tree> &right_op,
> +                           bool subtract,
> +                           enum _conj_status *_status)
>  {
> -  /* The left node is the more common case, test it first.  */
> -  if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind))
> +  auto_vec<slp_tree> ops;
> +  enum _conj_status stats = CONJ_NONE;
> +
> +  /* The complex operations can occur in two layouts and two permute sequences
> +     so declare them and re-use them.  */
> +  int styles[][4] = { { 0, 2, 1, 3} /* {L1, R1} + {L2, R2}.  */
> +                 , { 0, 3, 1, 2} /* {L1, R2} + {L2, R1}.  */
> +                 };
> +
> +  /* Now for the corresponding permutes that go with these values.  */
> +  complex_perm_kinds_t perms[][4]
> +    = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN }
> +      , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD }
> +      };
> +
> +  /* These permutes are used during comparisons of externals on which
> +     we require strict equality.  */
> +  int cq[][4][2]
> +    = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } }
> +      , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } }
> +      };
> +
> +  /* Default to style and perm 0, most operations use this one.  */
> +  int style = 0;
> +  int perm = subtract ? 1 : 0;
> +
> +  /* Check if we have a negate operation, if so absorb the node and continue
> +     looking.  */
> +  bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR);
> +  bool neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR);
> +
> +  /* Determine which style we're looking at.  We only have different ones
> +     whenever a conjugate is involved.  */
> +  if (neg0 && neg1)
> +    ;
> +  else if (neg0)
>      {
> -      if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind))
> -     return false;
> +      right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0];
> +      stats = CONJ_FST;
> +      if (subtract)
> +     perm = 0;
>      }
> -  return true;
> +  else if (neg1)
> +    {
> +      right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0];
> +      stats = CONJ_SND;
> +      perm = 1;
> +    }
> +
> +  *_status = stats;
> +
> +  /* Flatten the inputs after we've remapped them.  */
> +  ops.create (4);
> +  ops.safe_splice (left_op);
> +  ops.safe_splice (right_op);
> +
> +  /* Extract out the elements to check.  */
> +  slp_tree op0 = ops[styles[style][0]];
> +  slp_tree op1 = ops[styles[style][1]];
> +  slp_tree op2 = ops[styles[style][2]];
> +  slp_tree op3 = ops[styles[style][3]];
> +
> +  /* Do cheapest test first.  If failed no need to analyze further.  */
> +  if (linear_loads_p (perm_cache, op0) != perms[perm][0]
> +      || linear_loads_p (perm_cache, op1) != perms[perm][1]
> +      || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, perms[perm][3]))
> +    return false;
> +
> +  return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], op1,
> +                                  cq[perm][1])
> +      && compatible_complex_nodes_p (compat_cache, op2, cq[perm][2], op3,
> +                                     cq[perm][3]);
>  }
>  
>  /* This function combines two nodes containing only even and only odd lanes
> @@ -929,11 +1008,12 @@ class complex_mul_pattern : public complex_pattern
>    public:
>      void build (vec_info *);
>      static internal_fn
> -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> -          vec<slp_tree> *);
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +          slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
>  
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +            slp_tree *);
>  
>      static vect_pattern*
>      mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> @@ -963,6 +1043,7 @@ class complex_mul_pattern : public complex_pattern
>  internal_fn
>  complex_mul_pattern::matches (complex_operation_t op,
>                             slp_tree_to_load_perm_map_t *perm_cache,
> +                           slp_compat_nodes_map_t *compat_cache,
>                             slp_tree *node, vec<slp_tree> *ops)
>  {
>    internal_fn ifn = IFN_LAST;
> @@ -985,28 +1066,15 @@ complex_mul_pattern::matches (complex_operation_t op,
>    if (linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN)
>      return IFN_LAST;
>  
> -  bool neg_first = false;
> -  bool conj_first_operand = false;
> -  bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
> -
> -  if (!is_neg)
> -    {
> -      /* A multiplication needs to multiply agains the real pair, otherwise
> -      the pattern matches that of FMS.   */
> -      if (!vect_validate_multiplication (perm_cache, left_op, PERM_EVENEVEN)
> -       || vect_normalize_conj_loc (left_op))
> -     return IFN_LAST;
> -      ifn = IFN_COMPLEX_MUL;
> -    }
> -  else if (is_neg)
> -    {
> -      if (!vect_validate_multiplication (perm_cache, left_op, right_op,
> -                                      neg_first, &conj_first_operand,
> -                                      false))
> -     return IFN_LAST;
> +  enum _conj_status status;
> +  if (!vect_validate_multiplication (perm_cache, compat_cache, left_op,
> +                                  right_op, false, &status))
> +    return IFN_LAST;
>  
> -      ifn = IFN_COMPLEX_MUL_CONJ;
> -    }
> +  if (status == CONJ_NONE)
> +    ifn = IFN_COMPLEX_MUL;
> +  else
> +    ifn = IFN_COMPLEX_MUL_CONJ;
>  
>    if (!vect_pattern_validate_optab (ifn, *node))
>      return IFN_LAST;
> @@ -1015,19 +1083,13 @@ complex_mul_pattern::matches (complex_operation_t op,
>    ops->create (3);
>  
>    complex_perm_kinds_t kind = linear_loads_p (perm_cache, left_op[0]);
> -  if (kind == PERM_EVENODD)
> +  if (kind == PERM_EVENODD || kind == PERM_TOP)
>      {
>        ops->quick_push (left_op[1]);
>        ops->quick_push (right_op[1]);
>        ops->quick_push (left_op[0]);
>      }
> -  else if (kind == PERM_TOP)
> -    {
> -      ops->quick_push (left_op[1]);
> -      ops->quick_push (right_op[1]);
> -      ops->quick_push (left_op[0]);
> -    }
> -  else if (kind == PERM_EVENEVEN && !conj_first_operand)
> +  else if (kind == PERM_EVENEVEN && status != CONJ_SND)
>      {
>        ops->quick_push (left_op[0]);
>        ops->quick_push (right_op[0]);
> @@ -1047,13 +1109,14 @@ complex_mul_pattern::matches (complex_operation_t op,
>  
>  vect_pattern*
>  complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +                             slp_compat_nodes_map_t *compat_cache,
>                               slp_tree *node)
>  {
>    auto_vec<slp_tree> ops;
>    complex_operation_t op
>      = vect_detect_pair_op (*node, true, &ops);
>    internal_fn ifn
> -    = complex_mul_pattern::matches (op, perm_cache, node, &ops);
> +    = complex_mul_pattern::matches (op, perm_cache, compat_cache, node, &ops);
>    if (ifn == IFN_LAST)
>      return NULL;
>  
> @@ -1100,11 +1163,12 @@ class complex_fma_pattern : public complex_pattern
>    public:
>      void build (vec_info *);
>      static internal_fn
> -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> -          vec<slp_tree> *);
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +          slp_compat_nodes_map_t *, slp_tree *,  vec<slp_tree> *);
>  
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +            slp_tree *);
>  
>      static vect_pattern*
>      mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> @@ -1136,6 +1200,7 @@ class complex_fma_pattern : public complex_pattern
>  internal_fn
>  complex_fma_pattern::matches (complex_operation_t op,
>                             slp_tree_to_load_perm_map_t * /* perm_cache */,
> +                           slp_compat_nodes_map_t * /* compat_cache */,
>                             slp_tree *ref_node, vec<slp_tree> *ops)
>  {
>    internal_fn ifn = IFN_LAST;
> @@ -1199,13 +1264,14 @@ complex_fma_pattern::matches (complex_operation_t op,
>  
>  vect_pattern*
>  complex_fma_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +                             slp_compat_nodes_map_t *compat_cache,
>                               slp_tree *node)
>  {
>    auto_vec<slp_tree> ops;
>    complex_operation_t op
>      = vect_detect_pair_op (*node, true, &ops);
>    internal_fn ifn
> -    = complex_fma_pattern::matches (op, perm_cache, node, &ops);
> +    = complex_fma_pattern::matches (op, perm_cache, compat_cache, node, &ops);
>    if (ifn == IFN_LAST)
>      return NULL;
>  
> @@ -1248,11 +1314,12 @@ class complex_fms_pattern : public complex_pattern
>    public:
>      void build (vec_info *);
>      static internal_fn
> -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> -          vec<slp_tree> *);
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +          slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
>  
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +            slp_tree *);
>  
>      static vect_pattern*
>      mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> @@ -1283,6 +1350,7 @@ class complex_fms_pattern : public complex_pattern
>  internal_fn
>  complex_fms_pattern::matches (complex_operation_t op,
>                             slp_tree_to_load_perm_map_t *perm_cache,
> +                           slp_compat_nodes_map_t *compat_cache,
>                             slp_tree * ref_node, vec<slp_tree> *ops)
>  {
>    internal_fn ifn = IFN_LAST;
> @@ -1316,17 +1384,14 @@ complex_fms_pattern::matches (complex_operation_t op,
>    left_op.safe_splice (SLP_TREE_CHILDREN (muls[0]));
>    right_op.safe_splice (SLP_TREE_CHILDREN (muls[1]));
>  
> -  bool is_neg = vect_normalize_conj_loc (left_op);
> -
> -  child = SLP_TREE_CHILDREN ((*ops)[1])[0];
> -  bool conj_first_operand = false;
> -  if (!vect_validate_multiplication (perm_cache, right_op, left_op, false,
> -                                  &conj_first_operand, true))
> +  enum _conj_status status;
> +  if (!vect_validate_multiplication (perm_cache, compat_cache, right_op,
> +                                  left_op, true, &status))
>      return IFN_LAST;
>  
> -  if (!is_neg)
> +  if (status == CONJ_NONE)
>      ifn = IFN_COMPLEX_FMS;
> -  else if (is_neg)
> +  else
>      ifn = IFN_COMPLEX_FMS_CONJ;
>  
>    if (!vect_pattern_validate_optab (ifn, *ref_node))
> @@ -1343,26 +1408,12 @@ complex_fms_pattern::matches (complex_operation_t op,
>        ops->quick_push (right_op[1]);
>        ops->quick_push (left_op[1]);
>      }
> -  else if (kind == PERM_TOP)
> -    {
> -      ops->quick_push (child);
> -      ops->quick_push (right_op[1]);
> -      ops->quick_push (right_op[0]);
> -      ops->quick_push (left_op[0]);
> -    }
> -  else if (kind == PERM_EVENEVEN && !is_neg)
> -    {
> -      ops->quick_push (child);
> -      ops->quick_push (right_op[1]);
> -      ops->quick_push (right_op[0]);
> -      ops->quick_push (left_op[0]);
> -    }
>    else
>      {
>        ops->quick_push (child);
>        ops->quick_push (right_op[1]);
>        ops->quick_push (right_op[0]);
> -      ops->quick_push (left_op[1]);
> +      ops->quick_push (left_op[0]);
>      }
>  
>    return ifn;
> @@ -1372,13 +1423,14 @@ complex_fms_pattern::matches (complex_operation_t op,
>  
>  vect_pattern*
>  complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +                             slp_compat_nodes_map_t *compat_cache,
>                               slp_tree *node)
>  {
>    auto_vec<slp_tree> ops;
>    complex_operation_t op
>      = vect_detect_pair_op (*node, true, &ops);
>    internal_fn ifn
> -    = complex_fms_pattern::matches (op, perm_cache, node, &ops);
> +    = complex_fms_pattern::matches (op, perm_cache, compat_cache, node, &ops);
>    if (ifn == IFN_LAST)
>      return NULL;
>  
> @@ -1405,9 +1457,9 @@ complex_fms_pattern::build (vec_info *vinfo)
>    SLP_TREE_CHILDREN (*this->m_node).create (3);
>  
>    /* First re-arrange the children.  */
> -  SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
>    SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
>    SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
> +  SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
>  
>    /* And then rewrite the node itself.  */
>    complex_pattern::build (vinfo);
> @@ -1434,11 +1486,12 @@ class complex_operations_pattern : public complex_pattern
>    public:
>      void build (vec_info *);
>      static internal_fn
> -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> -          vec<slp_tree> *);
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +          slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
>  
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +            slp_tree *);
>  };
>  
>  /* Dummy matches implementation for proxy object.  */
> @@ -1447,6 +1500,7 @@ internal_fn
>  complex_operations_pattern::
>  matches (complex_operation_t /* op */,
>        slp_tree_to_load_perm_map_t * /* perm_cache */,
> +      slp_compat_nodes_map_t * /* compat_cache */,
>        slp_tree * /* ref_node */, vec<slp_tree> * /* ops */)
>  {
>    return IFN_LAST;
> @@ -1456,6 +1510,7 @@ matches (complex_operation_t /* op */,
>  
>  vect_pattern*
>  complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +                                    slp_compat_nodes_map_t *ccache,
>                                      slp_tree *node)
>  {
>    auto_vec<slp_tree> ops;
> @@ -1463,19 +1518,19 @@ complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
>      = vect_detect_pair_op (*node, true, &ops);
>    internal_fn ifn = IFN_LAST;
>  
> -  ifn  = complex_fms_pattern::matches (op, perm_cache, node, &ops);
> +  ifn  = complex_fms_pattern::matches (op, perm_cache, ccache, node, &ops);
>    if (ifn != IFN_LAST)
>      return complex_fms_pattern::mkInstance (node, &ops, ifn);
>  
> -  ifn  = complex_mul_pattern::matches (op, perm_cache, node, &ops);
> +  ifn  = complex_mul_pattern::matches (op, perm_cache, ccache, node, &ops);
>    if (ifn != IFN_LAST)
>      return complex_mul_pattern::mkInstance (node, &ops, ifn);
>  
> -  ifn  = complex_fma_pattern::matches (op, perm_cache, node, &ops);
> +  ifn  = complex_fma_pattern::matches (op, perm_cache, ccache, node, &ops);
>    if (ifn != IFN_LAST)
>      return complex_fma_pattern::mkInstance (node, &ops, ifn);
>  
> -  ifn  = complex_add_pattern::matches (op, perm_cache, node, &ops);
> +  ifn  = complex_add_pattern::matches (op, perm_cache, ccache, node, &ops);
>    if (ifn != IFN_LAST)
>      return complex_add_pattern::mkInstance (node, &ops, ifn);
>  
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index 230ff4081a597d9cf813ae5d81e31767de599971..abd61b74832c26766538bb986e539b13f8f885c5 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -827,7 +827,7 @@ vect_update_shared_vectype (stmt_vec_info stmt_info, tree vectype)
>  /* Return true if call statements CALL1 and CALL2 are similar enough
>     to be combined into the same SLP group.  */
>  
> -static bool
> +bool
>  compatible_calls_p (gcall *call1, gcall *call2)
>  {
>    unsigned int nargs = gimple_call_num_args (call1);
> @@ -2414,6 +2414,7 @@ optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map,
>  static bool
>  vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
>                          slp_tree_to_load_perm_map_t *perm_cache,
> +                        slp_compat_nodes_map_t *compat_cache,
>                          hash_set<slp_tree> *visited)
>  {
>    unsigned i;
> @@ -2425,11 +2426,13 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
>    slp_tree child;
>    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
>      found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i],
> -                                       vinfo, perm_cache, visited);
> +                                       vinfo, perm_cache, compat_cache,
> +                                       visited);
>  
>    for (unsigned x = 0; x < num__slp_patterns; x++)
>      {
> -      vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node);
> +      vect_pattern *pattern
> +     = slp_patterns[x] (perm_cache, compat_cache, ref_node);
>        if (pattern)
>       {
>         pattern->build (vinfo);
> @@ -2450,7 +2453,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
>  static bool
>  vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
>                        hash_set<slp_tree> *visited,
> -                      slp_tree_to_load_perm_map_t *perm_cache)
> +                      slp_tree_to_load_perm_map_t *perm_cache,
> +                      slp_compat_nodes_map_t *compat_cache)
>  {
>    DUMP_VECT_SCOPE ("vect_match_slp_patterns");
>    slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
> @@ -2460,7 +2464,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
>                    "Analyzing SLP tree %p for patterns\n",
>                    SLP_INSTANCE_TREE (instance));
>  
> -  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
> +  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, compat_cache,
> +                                 visited);
>  }
>  
>  /* STMT_INFO is a store group of size GROUP_SIZE that we are considering
> @@ -2928,12 +2933,14 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
>  
>    hash_set<slp_tree> visited_patterns;
>    slp_tree_to_load_perm_map_t perm_cache;
> +  slp_compat_nodes_map_t compat_cache;
>  
>    /* See if any patterns can be found in the SLP tree.  */
>    bool pattern_found = false;
>    FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
>      pattern_found |= vect_match_slp_patterns (instance, vinfo,
> -                                           &visited_patterns, &perm_cache);
> +                                           &visited_patterns, &perm_cache,
> +                                           &compat_cache);
>  
>    /* If any were found optimize permutations of loads.  */
>    if (pattern_found)
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 85f4762cd083af25d551040e316f7024637189da..f416a74d01045305c3eb7741a05e20a38db69a92 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2029,6 +2029,7 @@ extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
>  extern bool vect_update_shared_vectype (stmt_vec_info, tree);
>  extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
>  extern void vect_free_slp_tree (slp_tree);
> +extern bool compatible_calls_p (gcall *, gcall *);
>  
>  /* In tree-vect-patterns.c.  */
>  extern void
> @@ -2067,6 +2068,12 @@ typedef enum _complex_perm_kinds {
>  typedef hash_map <slp_tree, complex_perm_kinds_t>
>    slp_tree_to_load_perm_map_t;
>  
> +/* Cache from nodes pair to being compatible or not.  */
> +typedef pair_hash <nofree_ptr_hash <_slp_tree>,
> +                nofree_ptr_hash <_slp_tree>> slp_node_hash;
> +typedef hash_map <slp_node_hash, bool> slp_compat_nodes_map_t;
> +
> +
>  /* Vector pattern matcher base class.  All SLP pattern matchers must inherit
>     from this type.  */
>  
> @@ -2098,7 +2105,8 @@ class vect_pattern
>    public:
>  
>      /* Create a new instance of the pattern matcher class of the given type.  */
> -    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *,
> +                                 slp_compat_nodes_map_t *, slp_tree *);
>  
>      /* Build the pattern from the data collected so far.  */
>      virtual void build (vec_info *) = 0;
> @@ -2112,6 +2120,7 @@ class vect_pattern
>  
>  /* Function pointer to create a new pattern matcher from a generic type.  */
>  typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *,
> +                                           slp_compat_nodes_map_t *,
>                                             slp_tree *);
>  
>  /* List of supported pattern matchers.  */
> 
> 
> 
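
For readers following the table-driven check in vect_validate_multiplication,
here is a minimal standalone sketch of the idea.  It is not the GCC code:
Operand, Verdict and validate are hypothetical names, the enums are stand-ins
for the ones in the patch, and the PERM_TOP handling assumes is_eq_or_top
accepts either the expected kind or PERM_TOP.  Only the styles/perms tables
and the negate handling are taken from the quoted patch.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-ins for the GCC enums named in the patch.
enum PermKind { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN, PERM_TOP };
enum ConjStatus { CONJ_NONE, CONJ_FST, CONJ_SND };

// Simplified operand: its load-permute kind and whether a negate wraps it.
struct Operand {
  PermKind perm;
  bool negated;
};

struct Verdict {
  bool ok;
  ConjStatus status;
};

// Operand orderings and expected permutes, copied from the patch.
static const int styles[2][4] = { { 0, 2, 1, 3 }, { 0, 3, 1, 2 } };
static const PermKind perms[2][4] = {
  { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN },
  { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD }
};

// ops is the flattened list { left[0], left[1], right[0], right[1] }.
Verdict validate (const std::array<Operand, 4> &ops, bool subtract)
{
  int style = 0;  // The quoted function only ever uses style 0.
  int perm = subtract ? 1 : 0;
  ConjStatus status = CONJ_NONE;

  bool neg0 = ops[2].negated;
  bool neg1 = ops[3].negated;
  if (neg0 && neg1)
    ;  // Both negated: no conjugate is recorded.
  else if (neg0)
    {
      status = CONJ_FST;
      if (subtract)
        perm = 0;
    }
  else if (neg1)
    {
      status = CONJ_SND;
      perm = 1;
    }

  for (int i = 0; i < 4; i++)
    {
      PermKind got = ops[styles[style][i]].perm;
      // Assumption: the last two slots also accept PERM_TOP.
      bool top_ok = i >= 2 && got == PERM_TOP;
      if (got != perms[perm][i] && !top_ok)
        return { false, status };
    }
  return { true, status };
}
```

In this sketch a plain multiplication passes with subtract == false and
CONJ_NONE, and the same operands are rejected with subtract == true because
the second permute row is selected.  The compatible_complex_nodes_p step that
follows in the real function is deliberately left out.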

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
