Richard Biener <rguent...@suse.de> writes:
> On December 12, 2019 12:56:01 AM GMT+01:00, Jakub Jelinek <ja...@redhat.com> 
> wrote:
>>Hi!
>>
>>The AVX512{F,VL} vector comparisons that set %kN registers are
>>represented
>>in RTL as comparisons with vector mode operands and scalar integral
>>result,
>>where at runtime the scalar integer is filled with a bitmask.
>>Unfortunately, simplify_relational_operation would fold e.g.
>>(eq:SI (reg:V32HI x) (reg:V32HI x))
>>into (const_int 1) rather than (const_int -1) that is expected (all
>>elements
>>equal).  simplify_const_relational_operation is documented to always
>>return
>>just const0_rtx or const_true_rtx and simplify_relational_operation is
>>expected to fix this up, for vector comparisons with vector result it
>>duplicates the 0 or -1 into all elements, etc., so this patch adds
>>handling
>>for this case there too.
>>
>>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> So there's no whole vector equality RTX but we have to pun to integer modes 
> for that? The eq:SImode would suggest that. Guess we should have used a 
> BImode vector representation... 
>
> Can you check whether we have any target with whole vector compare patterns 
> that would break here? 

We also have cbranch<mode>4 for vectors, which does a genuine
scalar comparison of whole vectors.  It happens that the way those
comparisons are represented on x86 (using CC modes only) means that
there's currently no ambiguity with the elementwise vector-to-scalar
comparisons above.  But I think baking that assumption in any further
is going to cause problems eventually.

E.g. the scalar comparison { 0, 0 } == { 0, 1 } should be false when
used with cbranch<mode>4, but should be 2 when used with an elementwise
interpretation.  If the cbranch<mode>4 interpretation is turned into
a cstore for (cbranch<mode>4 ? 1 : 0), we could easily end up wanting
(eq:SI ...) of those vectors to return false rather than 2.  Even for
{ 0, 0 } == { 0, 0 }, a cstore (eq:SI ...) would need to be 1 rather
than 3.
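
To make the ambiguity concrete, here's a minimal sketch in plain C
(hypothetical helper names, nothing to do with GCC internals):
whole_vector_eq models the cbranch/cstore view and yields a single
0/1 answer, while elementwise_mask models the AVX512-style bitmask
view; mapping element i to bit i is just an assumption for the sketch.

  #include <stdio.h>

  #define N 2

  /* cbranch/cstore view: one truth value for the whole vectors.  */
  static int
  whole_vector_eq (const int *a, const int *b)
  {
    for (int i = 0; i < N; i++)
      if (a[i] != b[i])
        return 0;
    return 1;
  }

  /* Bitmask view: one bit per equal element (bit i <- element i,
     an assumed ordering for this sketch).  */
  static unsigned
  elementwise_mask (const int *a, const int *b)
  {
    unsigned mask = 0;
    for (int i = 0; i < N; i++)
      if (a[i] == b[i])
        mask |= 1u << i;
    return mask;
  }

  int
  main (void)
  {
    int x[N] = { 0, 0 }, y[N] = { 0, 1 };
    /* Same operands, two different "scalar" answers.  */
    printf ("x==y: whole %d, mask %u\n",
            whole_vector_eq (x, y), elementwise_mask (x, y));
    printf ("x==x: whole %d, mask %u\n",
            whole_vector_eq (x, x), elementwise_mask (x, x));
    return 0;
  }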

I wonder how easy it would be to make the mask registers use
MODE_VECTOR_BOOL instead of scalar integers... :-)

For now I think the safest thing would be to punt, rather than assume
one thing or the other.  simplify_const_relational_operation doesn't
handle many vector cases anyway, and most interesting optimisations
like this should happen on gimple, where we know whether the condition
result is a vector or a scalar.
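
Concretely, the punt could just bail out instead of building the
bitmask in the hunk above; a rough, untested sketch reusing the same
condition as the patch:

      if (VECTOR_MODE_P (cmp_mode)
          && SCALAR_INT_MODE_P (mode)
          && tem == const_true_rtx)
        /* Vector comparison with a scalar integral result: don't
           guess between the cbranch-style 0/1 interpretation and the
           AVX512-style bitmask interpretation.  */
        return NULL_RTX;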

Thanks,
Richard

>
> Richard. 
>
>>2019-12-11  Jakub Jelinek  <ja...@redhat.com>
>>
>>      PR target/92908
>>      * simplify-rtx.c (simplify_relational_operation): For vector cmp_mode
>>      and scalar mode, if simplify_relational_operation returned
>>      const_true_rtx, return a scalar bitmask of all ones.
>>      (simplify_const_relational_operation): Change VOID_mode in function
>>      comment to VOIDmode.
>>
>>      * gcc.target/i386/avx512bw-pr92908.c: New test.
>>
>>--- gcc/simplify-rtx.c.jj     2019-11-19 22:27:02.000058742 +0100
>>+++ gcc/simplify-rtx.c        2019-12-11 13:31:57.197809704 +0100
>>@@ -5037,6 +5037,23 @@ simplify_relational_operation (enum rtx_
>>        return NULL_RTX;
>> #endif
>>      }
>>+      if (VECTOR_MODE_P (cmp_mode)
>>+       && SCALAR_INT_MODE_P (mode)
>>+       && tem == const_true_rtx)
>>+     {
>>+       /* Vector comparisons that expect a scalar integral
>>+          bitmask.  For const0_rtx the result is already correct,
>>+          for const_true_rtx we need all bits set.  */
>>+       int n_elts;
>>+       scalar_int_mode smode = as_a <scalar_int_mode> (mode);
>>+       gcc_assert (GET_MODE_NUNITS (cmp_mode).is_constant (&n_elts)
>>+                   && GET_MODE_PRECISION (smode) <= n_elts);
>>+       if (GET_MODE_PRECISION (smode) == n_elts)
>>+         return constm1_rtx;
>>+       if (n_elts < HOST_BITS_PER_WIDE_INT)
>>+         return GEN_INT ((HOST_WIDE_INT_1U << n_elts) - 1);
>>+       return NULL_RTX;
>>+     }
>> 
>>       return tem;
>>     }
>>@@ -5383,7 +5400,7 @@ comparison_result (enum rtx_code code, i
>> }
>> 
>> /* Check if the given comparison (done in the given MODE) is actually
>>-   a tautology or a contradiction.  If the mode is VOID_mode, the
>>+   a tautology or a contradiction.  If the mode is VOIDmode, the
>>    comparison is done in "infinite precision".  If no simplification
>>    is possible, this function returns zero.  Otherwise, it returns
>>    either const_true_rtx or const0_rtx.  */
>>--- gcc/testsuite/gcc.target/i386/avx512bw-pr92908.c.jj       2019-12-11
>>14:24:12.083418762 +0100
>>+++ gcc/testsuite/gcc.target/i386/avx512bw-pr92908.c  2019-12-11
>>14:23:56.071665326 +0100
>>@@ -0,0 +1,21 @@
>>+/* PR target/92908 */
>>+/* { dg-do run } */
>>+/* { dg-options "-Og -fno-tree-fre -mavx512bw" } */
>>+/* { dg-require-effective-target avx512bw } */
>>+
>>+#define AVX512BW
>>+#include "avx512f-helper.h"
>>+
>>+typedef unsigned short V __attribute__ ((__vector_size__ (64)));
>>+
>>+V v;
>>+
>>+void
>>+TEST (void)
>>+{
>>+  int i;
>>+  v = (V) v == v;
>>+  for (i = 0; i < 32; i++)
>>+    if (v[i] != 0xffff)
>>+      abort ();
>>+}
>>
>>      Jakub
