On 10/30/23 01:25, Fei Gao wrote:
Conditional add, if zero
rd = (rc == 0) ? (rs1 + rs2) : rs1
-->
czero.nez rd, rs2, rc
add rd, rs1, rd

Conditional add, if non-zero
rd = (rc != 0) ? (rs1 + rs2) : rs1
-->
czero.eqz rd, rs2, rc
add rd, rs1, rd
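
The two sequences above can be checked against their C-level meaning with a small model. This is a minimal sketch: czero_eqz and czero_nez are hypothetical helper names that model the Zicond instructions (they are not GCC functions), and unsigned arithmetic is assumed so that the additions wrap instead of overflowing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of czero.eqz: yields 0 when cond == 0, else val. */
static uint64_t czero_eqz(uint64_t val, uint64_t cond)
{
    return cond == 0 ? 0 : val;
}

/* Hypothetical model of czero.nez: yields 0 when cond != 0, else val. */
static uint64_t czero_nez(uint64_t val, uint64_t cond)
{
    return cond != 0 ? 0 : val;
}

/* rd = (rc == 0) ? (rs1 + rs2) : rs1
   modelled as: czero.nez rd, rs2, rc ; add rd, rs1, rd  */
static uint64_t cond_add_if_zero(uint64_t rs1, uint64_t rs2, uint64_t rc)
{
    uint64_t rd = czero_nez(rs2, rc);   /* rs2 survives only when rc == 0 */
    return rs1 + rd;
}

/* rd = (rc != 0) ? (rs1 + rs2) : rs1
   modelled as: czero.eqz rd, rs2, rc ; add rd, rs1, rd  */
static uint64_t cond_add_if_nonzero(uint64_t rs1, uint64_t rs2, uint64_t rc)
{
    uint64_t rd = czero_eqz(rs2, rc);   /* rs2 survives only when rc != 0 */
    return rs1 + rd;
}
```

The transformation works because adding zero is the identity: zeroing rs2 on the "not taken" side leaves rs1 unchanged.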

Co-authored-by: Xiao Zeng<zengx...@eswincomputing.com>

gcc/ChangeLog:

         * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
         (noce_try_cond_zero_arith): handler for conditional zero op
         (noce_process_if_block): add noce_try_cond_zero_arith with hook control

gcc/testsuite/ChangeLog:

         * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
---
  gcc/ifcvt.cc                                  | 112 +++++++++++++++
  .../gcc.target/riscv/zicond_ifcvt_opt.c       | 130 ++++++++++++++++++
  2 files changed, 242 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index a0af553b9ff..4f98c1c7bf9 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
+static rtx
+noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code,
+                 rtx z, rtx target)
+{
+  machine_mode mode = GET_MODE (target);
+  rtx cond_op0 = XEXP (if_info->cond, 0);
+  rtx czero_cond
+    = gen_rtx_fmt_ee (czero_code, GET_MODE (cond_op0), cond_op0, const0_rtx);
+  rtx if_then_else = gen_rtx_IF_THEN_ELSE (mode, czero_cond, const0_rtx, z);
+  rtx set = gen_rtx_SET (target, if_then_else);
+
+  start_sequence ();
+  rtx_insn *insn = emit_insn (set);
+
+  if (recog_memoized (insn) >= 0)
+    {
+      rtx_insn *seq = get_insns ();
+      end_sequence ();
+      emit_insn (seq);
+
+      return target;
+    }
+
+  end_sequence ();
+  return NULL_RTX;
+}

A few notes to further illustrate why I'm currently looking to take the VRULL+Ventana implementation. The code above would be much better handled by just calling noce_emit_cmove, which goes through the conditional move expander. That way any improvement we make in the expander "just works" when called from the if-converter.

+
  /* Try only simple constants and registers here.  More complex cases
     are handled in noce_try_cmove_arith after noce_try_store_flag_arith
     has had a go at it.  */
@@ -2880,6 +2908,88 @@ noce_try_sign_mask (struct noce_if_info *if_info)
    return true;
  }
+/* Convert x = c ? y + z : y or x = c ? y : y + z. */
+
+static bool
+noce_try_cond_zero_arith (struct noce_if_info *if_info)
+{
+  rtx target;
+  rtx_insn *seq;
+  machine_mode mode = GET_MODE (if_info->x);
+  rtx common = NULL_RTX;
+  enum rtx_code czero_code = UNKNOWN;
+  rtx a = if_info->a;
+  rtx b = if_info->b;
+  rtx z = NULL_RTX;
+  rtx cond = if_info->cond;
+
+  if (!noce_simple_bbs (if_info))
+    return false;
[ ... ]
The internal code we have does a bit of canonicalization before the optimizing transformations. In particular, we may be presented with

(a == 0) ? b : a, which we transform into (a != 0) ? a : b. That lets us pick up more cases. (b != 0 ? b : a) gets similar handling.
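
The canonicalization described here can be illustrated in plain C. This is a sketch only: the function names are made up for the example, czero_nez is a hypothetical model of the Zicond instruction, and the czero-plus-or lowering is just one possible lowering, not taken from the VRULL+Ventana code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of czero.nez: yields 0 when cond != 0, else val. */
static uint64_t czero_nez(uint64_t val, uint64_t cond)
{
    return cond != 0 ? 0 : val;
}

/* The form the if-converter may be presented with. */
static uint64_t select_eqz_form(uint64_t a, uint64_t b)
{
    return (a == 0) ? b : a;
}

/* The canonical form: same value, condition inverted and arms swapped. */
static uint64_t select_nez_form(uint64_t a, uint64_t b)
{
    return (a != 0) ? a : b;
}

/* One possible branchless lowering (an assumption for illustration):
   when a != 0, b is zeroed and the OR yields a;
   when a == 0, a contributes nothing and the OR yields b.  */
static uint64_t select_via_czero(uint64_t a, uint64_t b)
{
    return a | czero_nez(b, a);
}
```

Inverting the condition and swapping the arms never changes the value, so a pass only has to recognize the canonical form.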

As I mentioned earlier, the VRULL+Ventana code handles wrapping extensions & subregs. Our code also handles if-converting shifts/rotates.
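
To show why shifts fit the same pattern as the conditional add, here is a minimal C model. It is a sketch under assumptions, not the VRULL+Ventana code: czero_eqz is a hypothetical model of the Zicond instruction, the function names are invented, and the shift amount is assumed to be less than 64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of czero.eqz: yields 0 when cond == 0, else val. */
static uint64_t czero_eqz(uint64_t val, uint64_t cond)
{
    return cond == 0 ? 0 : val;
}

/* Branchy source form: x = c ? (y << z) : y, with z assumed < 64. */
static uint64_t cond_shl_branchy(uint64_t y, uint64_t z, uint64_t c)
{
    return (c != 0) ? (y << z) : y;
}

/* Branchless form: zero the shift amount when c == 0.
   Shifting by zero is the identity, just as adding zero is.  */
static uint64_t cond_shl_czero(uint64_t y, uint64_t z, uint64_t c)
{
    return y << czero_eqz(z, c);
}
```

Any operation for which a zero second operand is the identity (add, subtract, or, xor, shifts, rotates) can be handled the same way.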

Hopefully that explains a bit more why I think cleaning up the VRULL+Ventana code is a better choice.

jeff
