Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum

2023-09-18 Thread Robin Dapp via Gcc-patches
Hi Lehua,

> +(define_expand "vcond_mask_"
> +  [(set (match_operand:V_VLS 0 "register_operand")
> +(if_then_else:V_VLS
> +  (match_operand: 3 "register_operand")
> +  (match_operand:V_VLS 1 "nonmemory_operand")
> +  (match_operand:V_VLS 2 "vector_register_or_const_0_operand")))]
> +  "TARGET_VECTOR"

Would it hurt to allow any nonmemory operand here and just force the
"unsupported" constants into a register?

> +  {
> +if (satisfies_constraint_Wc0 (operands[2]))
> +  {
> +rtx reg = gen_reg_rtx (mode);
> +emit_insn (gen_vec_duplicate_const_0 (reg, operands[2]));

Can't we emit a move_insn directly without going through the new pattern?
Or will that be optimized away in between?  And the new pattern isn't
actually a duplicate but a move anyway so maybe a force_reg (operands[2]) would
already do?  Probably initial values other than 0 don't work out of the box?

In any case it wouldn't hurt to describe the "design decisions" (i.e. why
we need one thing and not another) so it's easier to follow the patterns
in the future.

Regards
 Robin


Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum

2023-09-18 Thread Lehua Ding

Hi Robin,


+(define_expand "vcond_mask_"
+  [(set (match_operand:V_VLS 0 "register_operand")
+(if_then_else:V_VLS
+  (match_operand: 3 "register_operand")
+  (match_operand:V_VLS 1 "nonmemory_operand")
+  (match_operand:V_VLS 2 "vector_register_or_const_0_operand")))]
+  "TARGET_VECTOR"


Would it hurt to allow any nonmemory operand here and just force the
"unsupported" constants into a register?


Are you talking about why operand 2 doesn't use nonmemory_operand 
predicate? If so, I think this is because our vmerge.v[vxi]m insns only 
supports that operand 1 is a scalar and operand 2 must be a vector register.





+  {
+if (satisfies_constraint_Wc0 (operands[2]))
+  {
+rtx reg = gen_reg_rtx (mode);
+emit_insn (gen_vec_duplicate_const_0 (reg, operands[2]));


Can't we emit a move_insn directly without going through the new pattern?
Or will that be optimized away in between?  


Currently, the move_insn of moving vector const 0 to a vector register 
will produce the bellow rtl from vector.md, I hope keep the simple 
pattern before split1 pass:


 (set (reg:RVVM2HI 149)
 (if_then_else:RVVM2HI (unspec:RVVMF8BI [
 (const_vector:RVVMF8BI repeat [
 (const_int 1 [0x1])
 ])
 (reg:DI 146)
 (const_int 2 [0x2]) repeated x2
 (const_int 1 [0x1])
 (reg:SI 66 vl)
 (reg:SI 67 vtype)
 ] UNSPEC_VPREDICATE)
 (const_vector:RVVM2HI repeat [
 (const_int 0 [0])
 ])
 (unspec:RVVM2HI [
 (reg:SI 0 zero)
 ] UNSPEC_VUNDEF)))


And the new pattern isn't
actually a duplicate but a move anyway so maybe a force_reg (operands[2]) would
already do?  Probably initial values other than 0 don't work out of the box?


Yes, the name vec_duplicate_const_0 is missleading, maybe change it to 
mov_vec_const_0 is better.



In any case it wouldn't hurt to describe the "design decisions" (i.e. why
we need one thing and not another) so it's easier to follow the patterns
in the future.
Yes, I should have described the design in more detail. Before this 
patch, if I wanted to implement the current combine optimization, I 
would need to create a combine pattern that combines the following three 
instructions:


   (set (reg:RVVM2HI 149)
 (if_then_else:RVVM2HI (unspec:RVVMF8BI [
 (const_vector:RVVMF8BI repeat [
 (const_int 1 [0x1])
 ])
 (reg:DI 146)
 (const_int 2 [0x2]) repeated x2
 (const_int 1 [0x1])
 (reg:SI 66 vl)
 (reg:SI 67 vtype)
 ] UNSPEC_VPREDICATE)
 (const_vector:RVVM2HI repeat [
 (const_int 0 [0])
 ])
 (unspec:RVVM2HI [
 (reg:SI 0 zero)
 ] UNSPEC_VUNDEF)))

   (set (reg:RVVM2HI 138 [ vect__ifc__38.17 ])
 (if_then_else:RVVM2HI (reg:RVVMF8BI 135 [ mask__18.11 ])
 (sign_extend:RVVM2HI (reg:RVVM1QI 136 [ vect__6.14 ]))
 (reg:RVVM2HI 149)))

   (set (reg:HI 151)
 (unspec:HI [
 (reg:RVVM2HI 138 [ vect__ifc__38.17 ])
 ] UNSPEC_REDUC_SUM))

into one insn:
   (set (reg:SF 139 [  ])
(unspec:SF [
 (if_then_else:RVVM2SF (reg:RVVMF16BI 135 [ mask__11.32 ])
 (float_extend:RVVM2SF (reg:RVVM1HF 136 [ vect__7.35 ]))
 (if_then_else:RVVM2SF (unspec:RVVMF16BI [
 (const_vector:RVVMF16BI repeat [
 (const_int 1 [0x1])
 ])
 (reg:DI 143)
 (const_int 2 [0x2]) repeated x2
 (const_int 1 [0x1])
 (reg:SI 66 vl)
 (reg:SI 67 vtype)
 ] UNSPEC_VPREDICATE)
 (const_vector:RVVM2SF repeat [
 (const_double:SF 0.0 [0x0.0p+0])
 ])
 (unspec:RVVM2SF [
 (reg:SI 0 zero)
 ] UNSPEC_VUNDEF)))
 ] UNSPEC_REDUC_SUM_UNORDERED))

But I think the combine pattern is ugly because the 
move_vector_const_0_to_vector_registers has many extra operands we don't 
care about. So I want to introduce a simple pattern of it. I have two 
options here, one is to modify the generic mov pattern, changing 
define_expand to define_insn_and_split to keep pattern simple. The 
reason why I did not choose this one is that mov pattern uses a lot of 
places, including reload pass. The impact is relatively big.


The other option (this patch chooses) is to s

Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum

2023-09-19 Thread Robin Dapp
Hi Lehua,

>> Would it hurt to allow any nonmemory operand here and just force the
>> "unsupported" constants into a register?
> 
> Are you talking about why operand 2 doesn't use nonmemory_operand
> predicate? If so, I think this is because our vmerge.v[vxi]m insns
> only supports that operand 1 is a scalar and operand 2 must be a
> vector register.

My question was rather:

Why doesn't something like

(define_insn_and_split "vcond_mask_"
  [(set (match_operand:V_VLS 0 "register_operand")
(if_then_else:V_VLS
  (match_operand: 3 "register_operand")
  (match_operand:V_VLS 1 "nonmemory_operand")
  (match_operand:V_VLS 2 "nonmemory_operand")))]
  "TARGET_VECTOR && can_create_pseudo_p ()"
  "#"
  "&& 1"
  [(const_int 0)]
  {
/* The order of vcond_mask is opposite to pred_merge.  */
if (REG_P (operands[2]))
  operands[2] = force_reg (mode, operands[2]);
std::swap (operands[1], operands[2]);
riscv_vector::emit_vlmax_insn (code_for_pred_merge (mode),
   riscv_vector::MERGE_OP, operands);
DONE;
  }
  [(set_attr "type" "vector")]
)

suffice?  You could disallow operands[2] != 0 if needed.

Regards
 Robin



Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum

2023-09-19 Thread Lehua Ding

Hi Robin,


Would it hurt to allow any nonmemory operand here and just force the
"unsupported" constants into a register?


Are you talking about why operand 2 doesn't use nonmemory_operand
predicate? If so, I think this is because our vmerge.v[vxi]m insns
only supports that operand 1 is a scalar and operand 2 must be a
vector register.


My question was rather:

Why doesn't something like

(define_insn_and_split "vcond_mask_"
   [(set (match_operand:V_VLS 0 "register_operand")
 (if_then_else:V_VLS
   (match_operand: 3 "register_operand")
   (match_operand:V_VLS 1 "nonmemory_operand")
   (match_operand:V_VLS 2 "nonmemory_operand")))]
   "TARGET_VECTOR && can_create_pseudo_p ()"
   "#"
   "&& 1"
   [(const_int 0)]
   {
 /* The order of vcond_mask is opposite to pred_merge.  */
 if (REG_P (operands[2]))
   operands[2] = force_reg (mode, operands[2]);
 std::swap (operands[1], operands[2]);
 riscv_vector::emit_vlmax_insn (code_for_pred_merge (mode),
riscv_vector::MERGE_OP, operands);
 DONE;
   }
   [(set_attr "type" "vector")]
)

suffice?  You could disallow operands[2] != 0 if needed.


I think I understand what you're saying. The reason for not doing this 
is because simply allowing the operand 2 of vcond_mask to be vec_const_0 
would cause the other combine patterns to fail because they require the 
operand 2 of vcond_mask to be a register. Like the following existing 
combine patterns (operand 2 as the instruction merge operand, is not 
allowed to be non-register):


(define_insn_and_split "*cond_"
  [(set (match_operand:VF 0 "register_operand")
 (if_then_else:VF
   (match_operand: 1 "register_operand")
   (any_float_unop:VF
 (match_operand:VF 2 "register_operand"))
   (match_operand:VF 3 "register_operand")))]
  "TARGET_VECTOR && can_create_pseudo_p ()"
  "#"
  "&& 1"
  [(const_int 0)]
{
  insn_code icode = code_for_pred (, mode);
  rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
  riscv_vector::expand_cond_len_unop (icode, ops);
  DONE;
}
[(set_attr "type" "vector")])

My current method is still to keep the operand 2 of vcond_mask as a 
register, but the pattern of mov_vec_const_0 is simplified, so that the 
corresponding combine pattern can be more simple. That's the only reason 
I split the vcond_mask into three patterns.


--
Best,
Lehua


Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum

2023-09-19 Thread Robin Dapp
Hi Lehua,

thanks for the explanation.

> My current method is still to keep the operand 2 of vcond_mask as a
> register, but the pattern of mov_vec_const_0 is simplified, so that
> the corresponding combine pattern can be more simple. That's the only
> reason I split the vcond_mask into three patterns.

My "problem" with the separate split it that it really sticks out
and everybody seeing it would wonder why we need it.  It's not that
bad of course but it appears as if we messed up somewhere else. 

I checked and I don't see additional FAILs with the vmask pattern
that additionally allows a const0 operand (that is forced into a register)
and a force_reg in abs:VF.

Would you mind re-checking if we can avoid the extra
"vec_duplicate_const_0" by changing the other affected patterns
in a similar manner?  I really didn't verify in-depth so if we needed
to add a force_reg to every pattern we might need to reconsider.
Still, I'd be unsure if I preferred the "vec_dup_const_0" over
additional force_regs ;)

Regards
 Robin


Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum

2023-09-19 Thread Lehua Ding




On 2023/9/20 6:02, Robin Dapp wrote:

Hi Lehua,

thanks for the explanation.


My current method is still to keep the operand 2 of vcond_mask as a
register, but the pattern of mov_vec_const_0 is simplified, so that
the corresponding combine pattern can be more simple. That's the only
reason I split the vcond_mask into three patterns.


My "problem" with the separate split it that it really sticks out
and everybody seeing it would wonder why we need it.  It's not that
bad of course but it appears as if we messed up somewhere else.


Can not agree more.


I checked and I don't see additional FAILs with the vmask pattern
that additionally allows a const0 operand (that is forced into a register)
and a force_reg in abs:VF.

Would you mind re-checking if we can avoid the extra
"vec_duplicate_const_0" by changing the other affected patterns
in a similar manner?  I really didn't verify in-depth so if we needed
to add a force_reg to every pattern we might need to reconsider.
Still, I'd be unsure if I preferred the "vec_dup_const_0" over
additional force_regs ;)


I think that's a little weird, too. I prefer to add a single 
define_insn_and_split move_const_0 pattern like following diff, How do 
you feel about that?


diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index f4dab9fceb8..c0ab96ae8ab 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -973,7 +973,11 @@ expand_const_vector (rtx target, rtx src)
   rtx tmp = register_operand (target, mode) ? target : gen_reg_rtx 
(mode);

   /* Element in range -16 ~ 15 integer or 0.0 floating-point,
 we use vmv.v.i instruction.  */
-  if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
+  /* For const int or float 0, we keep the simple pattern before split1
+pass. */
+  if (can_create_pseudo_p () && satisfies_constraint_Wc0 (src))
+   emit_insn (gen_mov_vec_const_0 (mode, tmp, src));
+  else if (satisfies_constraint_vi (src))
{
  rtx ops[] = {tmp, src};
  emit_vlmax_insn (code_for_pred_mov (mode), UNARY_OP, ops);

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index f66ffebba24..b4973125d04 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1640,6 +1640,19 @@
   "TARGET_VECTOR"
   {})

+(define_insn_and_split "@mov_vec_const_0"
+  [(set (match_operand:V_VLS 0 "register_operand")
+(match_operand:V_VLS 1 "vector_const_0_operand"))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+emit_move_insn (operands[0], operands[1]);
+DONE;
+  }
+  [(set_attr "type" "vector")])
+
 ;; vle.v/vse.v,vmv.v.v
 (define_insn_and_split "*pred_mov"
   [(set (match_operand:V_VLS 0 "nonimmediate_operand""=vr, 
   vr,vd, m,vr,vr")


--
Best,
Lehua



Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum

2023-09-20 Thread Lehua Ding

Hi Robin,

I have posted a V2 patch to implement the method I mentioned. I wonder 
if that makes it a little easier to understand?


V2 patch: 
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630985.html


On 2023/9/20 12:02, Lehua Ding wrote:



On 2023/9/20 6:02, Robin Dapp wrote:

Hi Lehua,

thanks for the explanation.


My current method is still to keep the operand 2 of vcond_mask as a
register, but the pattern of mov_vec_const_0 is simplified, so that
the corresponding combine pattern can be more simple. That's the only
reason I split the vcond_mask into three patterns.


My "problem" with the separate split it that it really sticks out
and everybody seeing it would wonder why we need it.  It's not that
bad of course but it appears as if we messed up somewhere else.


Can not agree more.


I checked and I don't see additional FAILs with the vmask pattern
that additionally allows a const0 operand (that is forced into a 
register)

and a force_reg in abs:VF.

Would you mind re-checking if we can avoid the extra
"vec_duplicate_const_0" by changing the other affected patterns
in a similar manner?  I really didn't verify in-depth so if we needed
to add a force_reg to every pattern we might need to reconsider.
Still, I'd be unsure if I preferred the "vec_dup_const_0" over
additional force_regs ;)


I think that's a little weird, too. I prefer to add a single 
define_insn_and_split move_const_0 pattern like following diff, How do 
you feel about that?


diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index f4dab9fceb8..c0ab96ae8ab 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -973,7 +973,11 @@ expand_const_vector (rtx target, rtx src)
    rtx tmp = register_operand (target, mode) ? target : gen_reg_rtx 
(mode);

    /* Element in range -16 ~ 15 integer or 0.0 floating-point,
   we use vmv.v.i instruction.  */
-  if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
+  /* For const int or float 0, we keep the simple pattern before 
split1

+ pass. */
+  if (can_create_pseudo_p () && satisfies_constraint_Wc0 (src))
+    emit_insn (gen_mov_vec_const_0 (mode, tmp, src));
+  else if (satisfies_constraint_vi (src))
  {
    rtx ops[] = {tmp, src};
    emit_vlmax_insn (code_for_pred_mov (mode), UNARY_OP, ops);

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index f66ffebba24..b4973125d04 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1640,6 +1640,19 @@
    "TARGET_VECTOR"
    {})

+(define_insn_and_split "@mov_vec_const_0"
+  [(set (match_operand:V_VLS 0 "register_operand")
+    (match_operand:V_VLS 1 "vector_const_0_operand"))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    emit_move_insn (operands[0], operands[1]);
+    DONE;
+  }
+  [(set_attr "type" "vector")])
+
  ;; vle.v/vse.v,vmv.v.v
  (define_insn_and_split "*pred_mov"
    [(set (match_operand:V_VLS 0 "nonimmediate_operand"    "=vr, 
    vr,    vd, m,    vr,    vr")




--
Best,
Lehua