My previous patch to fix this problem with xor was rejected because we
want to fix these issues only at the point of use. That patch produced
slightly better code in this example, but this approach works too.
These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
configured to use V32 instead of V64 (I plan to do this for RDNA devices).
The problem was that a "not" operation on the mask inadvertently enabled
inactive lanes 31-63 and corrupted the output. The fix is to adjust the mask
when calling internal functions (in this case COND_MINUS), when doing masked
loads and stores, and when doing conditional jumps.
OK for mainline?
Andrew
gcc/ChangeLog:
* dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
bitmaps.
* internal-fn.cc (expand_fn_using_insn): Likewise.
(add_mask_and_len_args): Likewise.
---
gcc/dojump.cc | 16 ++++++++++++++++
gcc/internal-fn.cc | 26 ++++++++++++++++++++++++++
2 files changed, 42 insertions(+)
diff --git a/gcc/dojump.cc b/gcc/dojump.cc
index 88600cb42d3..8df86957e83 100644
--- a/gcc/dojump.cc
+++ b/gcc/dojump.cc
@@ -1235,6 +1235,22 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp,
}
}
+ if (val
+ && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
+ && SCALAR_INT_MODE_P (mode))
+ {
+ auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
+ if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
+ {
+ op0 = expand_binop (mode, and_optab, op0,
+ GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+ NULL_RTX, true, OPTAB_WIDEN);
+ op1 = expand_binop (mode, and_optab, op1,
+ GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+ NULL_RTX, true, OPTAB_WIDEN);
+ }
+ }
+
emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
if_true_label, prob);
}
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index fcf47c7fa12..5269f0ac528 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
&& SSA_NAME_IS_DEFAULT_DEF (rhs)
&& VAR_P (SSA_NAME_VAR (rhs)))
create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
+ else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
+ && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
+ && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
+ TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
+ {
+ /* Ensure that the vector bitmasks do not have excess bits. */
+ int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ();
+ rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx,
+ GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+ NULL_RTX, true, OPTAB_WIDEN);
+ create_input_operand (&ops[opno], tmp, TYPE_MODE (rhs_type));
+ }
else
create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
opno += 1;
@@ -312,6 +324,20 @@ add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt)
{
tree mask = gimple_call_arg (stmt, mask_index);
rtx mask_rtx = expand_normal (mask);
+
+ tree mask_type = TREE_TYPE (mask);
+ if (VECTOR_BOOLEAN_TYPE_P (mask_type)
+ && SCALAR_INT_MODE_P (TYPE_MODE (mask_type))
+ && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (mask_type)),
+ TYPE_VECTOR_SUBPARTS (mask_type).to_constant ()))
+ {
+ /* Ensure that the vector bitmasks do not have excess bits. */
+ int nunits = TYPE_VECTOR_SUBPARTS (mask_type).to_constant ();
+ mask_rtx = expand_binop (TYPE_MODE (mask_type), and_optab, mask_rtx,
+ GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+ NULL_RTX, true, OPTAB_WIDEN);
+ }
+
create_input_operand (&ops[opno++], mask_rtx,
TYPE_MODE (TREE_TYPE (mask)));
}
--
2.41.0