RE: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-16 Thread Li, Pan2 via Gcc-patches
Committed, thanks Robin.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of Robin Dapp via Gcc-patches
Sent: Friday, September 15, 2023 11:44 PM
To: 钟居哲; Jeff Law; kito.cheng
Cc: rdapp@gmail.com; gcc-patches; kito.cheng
Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

> You mean this patch is ok?

I thought about it a bit more.  From my point of view the patch is OK
for now in order to get the bug out of the way.

In the longer term I would really prefer a more "regular" solution
(i.e. via hard_regno_mode_ok) and related.  I can take care of that
once I have a bit of time but for now let's go ahead.

Regards
 Robin


Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-15 Thread Robin Dapp via Gcc-patches
> You mean this patch is ok?

I thought about it a bit more.  From my point of view the patch is OK
for now in order to get the bug out of the way.

In the longer term I would really prefer a more "regular" solution
(i.e. via hard_regno_mode_ok) and related.  I can take care of that
once I have a bit of time but for now let's go ahead.

Regards
 Robin


Re: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-15 Thread 钟居哲
You mean this patch is ok?



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-09-15 23:27
To: 钟居哲; kito.cheng
CC: gcc-patches; kito.cheng; rdapp.gcc
Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
 
 
On 9/14/23 16:26, 钟居哲 wrote:
> I don't think it can fix the case when it is -march=rv64gc_zve32x
> 
> 
> juzhe.zh...@rivai.ai
> 
> *From:* Kito Cheng <mailto:kito.ch...@gmail.com>
> *Date:* 2023-09-15 00:17
> *To:* Juzhe-Zhong <mailto:juzhe.zh...@rivai.ai>
> *CC:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>; kito.cheng
> <mailto:kito.ch...@sifive.com>; jeffreyalaw
> <mailto:jeffreya...@gmail.com>; rdapp.gcc <mailto:rdapp....@gmail.com>
> *Subject:* Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode
> move[PR111391]
> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so...not sure should we try to
> implement that within the mov pattern?
> I guess we need some inputs from Jeff.
It's advantageous if we can avoid it.  It often gets quite ugly when you 
start allowing something like scalar modes in vector registers -- 
particularly if you support something other than simple moves.
 
But we may end up needing to do that anyway to do something like 
supporting the sqrt & recip estimators in scalar FP modes, which can be 
very advantageous for benchmarks like nab.
 
So my suggestion would be go ahead if it looks like it can really solve 
this problem -- knowing there will likely be a long tail of fallout.  If 
it doesn't help pr111391, then let's defer until we really dive into 
544.nab/644.nab and try to improve that sqrt (x) and 1/sqrt(x) sequence 
that shows up in there.
 
jeff
 


Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-15 Thread Jeff Law via Gcc-patches




On 9/14/23 16:26, 钟居哲 wrote:
> I don't think it can fix the case when it is -march=rv64gc_zve32x
> 
> 
> juzhe.zh...@rivai.ai
> 
> *From:* Kito Cheng <mailto:kito.ch...@gmail.com>
> *Date:* 2023-09-15 00:17
> *To:* Juzhe-Zhong <mailto:juzhe.zh...@rivai.ai>
> *CC:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>; kito.cheng
> <mailto:kito.ch...@sifive.com>; jeffreyalaw
> <mailto:jeffreya...@gmail.com>; rdapp.gcc <mailto:rdapp....@gmail.com>
> *Subject:* Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode
> move[PR111391]
> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so...not sure should we try to
> implement that within the mov pattern?
> I guess we need some inputs from Jeff.
It's advantageous if we can avoid it.  It often gets quite ugly when you 
start allowing something like scalar modes in vector registers -- 
particularly if you support something other than simple moves.


But we may end up needing to do that anyway to do something like 
supporting the sqrt & recip estimators in scalar FP modes, which can be 
very advantageous for benchmarks like nab.


So my suggestion would be go ahead if it looks like it can really solve 
this problem -- knowing there will likely be a long tail of fallout.  If 
it doesn't help pr111391, then let's defer until we really dive into 
544.nab/644.nab and try to improve that sqrt (x) and 1/sqrt(x) sequence 
that shows up in there.


jeff


Re: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-14 Thread 钟居哲

>> Now, whether that's efficient (and desirable) is a separate issue and
>> should probably be defined by register_move_costs as well as instruction
>> costs.  I wasn't actually aware of this call/argument optimization that
>> uses vec_duplicate and I haven't checked what costing (if at all) it
>> uses.

This patch is not a performance improvement; it is a bug fix.
I am not optimizing the codegen, which is why I put it into the move
pattern to handle it statically.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-15 05:06
To: Kito Cheng; Juzhe-Zhong
CC: rdapp.gcc; gcc-patches; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so...not sure should we try to
> implement that within the mov pattern?
> 
> I guess we need some inputs from Jeff.
 
Sorry for the late response.  I have also been thinking about this and
it feels a bit like a bandaid to me.  Usually register-class moves like
this are performed by reload (which consults register_move_costs among
other things) and we are working around it.
 
The situation is that we move a vec_duplicate of QImodes into a vector
register.  Then we want to use this as scalar call argument so we need
to transfer it back to a DImode register.
 
One maybe more typical solution would be to allow small VLS vector modes
like V8QI in GPRs (via hard_regno_mode_ok) until reload so we could have
a (set (reg:V8QI a0) (vec_duplicate:V8QI ...)).
 
The next step would be to have a mov expander with target "r"
constraint (and source "vr") that performs the actual move.  This is
where Juzhe's mov code could fit in (without the subreg handling).
If I'm not mistaken vmv.x.s without slidedown should be sufficient for
our case as we'd only want to use the whole thing when the full vector
fits into a GPR. 
 
All that's missing is a (reinterpreting) vtype change to Pmode-sized
elements before. I quickly hacked something together (without the proper
mode change) and the resulting code looks like:
 
vsetvli zero, 8, e8, ...
vmv.v.x v1,a5
# missing vsetivli zero, 1, e64, ... or something 
vmv.x.s a0,v1
 
Now, whether that's efficient (and desirable) is a separate issue and
should probably be defined by register_move_costs as well as instruction
costs.  I wasn't actually aware of this call/argument optimization that
uses vec_duplicate and I haven't checked what costing (if at all) it
uses.
 
Regards
Robin
 


Re: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-14 Thread 钟居哲
I don't think it can fix the case when it is -march=rv64gc_zve32x



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-09-15 00:17
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
I am thinking what we are doing is something like we are allowing
scalar mode within the vector register, so...not sure should we try to
implement that within the mov pattern?
 
I guess we need some inputs from Jeff.
 
 
e.g.
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0ecda795b38..ffced41588d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7621,6 +7621,9 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
     }
   else if (V_REG_P (regno))
     {
+      if (mode is scalar)
+       return true;
+
       if (!riscv_v_ext_mode_p (mode))
        return false;
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6d6a2b3748c..50bac39f125 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2035,8 +2035,8 @@ (define_insn "*movdi_32bit"
    (set_attr "ext" "base,base,base,base,d,d,d,d,d,vector")])
 
 (define_insn "*movdi_64bit"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r, m, *f,*f,*r,*f,*m,r")
-       (match_operand:DI 1 "move_operand" " r,T,m,rJ,*r*J,*m,*f,*f,*f,vp"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r, m, *f,*f,*r,*f,*m,r,*vr,*r,*vr,*vr,*m")
+       (match_operand:DI 1 "move_operand" " r,T,m,rJ,*r*J,*m,*f,*f,*f,vp,vr,vr,r,m,vr"))]
   "TARGET_64BIT
    && (register_operand (operands[0], DImode)
        || reg_or_0_operand (operands[1], DImode))"
 


Re: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-14 Thread 钟居哲
>> All that's missing is a (reinterpreting) vtype change to Pmode-sized
>> elements before. I quickly hacked something together (without the proper
>> mode change) and the resulting code looks like:
 
>> vsetvli zero, 8, e8, ...
>> vmv.v.x v1,a5
>> # missing vsetivli zero, 1, e64, ... or something
>> vmv.x.s a0,v1

This issue has been addressed by this patch:
[PATCH V3] RISC-V: Fix ICE in get_avl_or_vl_reg (gnu.org)



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-15 05:06
To: Kito Cheng; Juzhe-Zhong
CC: rdapp.gcc; gcc-patches; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so...not sure should we try to
> implement that within the mov pattern?
> 
> I guess we need some inputs from Jeff.
 
Sorry for the late response.  I have also been thinking about this and
it feels a bit like a bandaid to me.  Usually register-class moves like
this are performed by reload (which consults register_move_costs among
other things) and we are working around it.
 
The situation is that we move a vec_duplicate of QImodes into a vector
register.  Then we want to use this as scalar call argument so we need
to transfer it back to a DImode register.
 
One maybe more typical solution would be to allow small VLS vector modes
like V8QI in GPRs (via hard_regno_mode_ok) until reload so we could have
a (set (reg:V8QI a0) (vec_duplicate:V8QI ...)).
 
The next step would be to have a mov expander with target "r"
constraint (and source "vr") that performs the actual move.  This is
where Juzhe's mov code could fit in (without the subreg handling).
If I'm not mistaken vmv.x.s without slidedown should be sufficient for
our case as we'd only want to use the whole thing when the full vector
fits into a GPR. 
 
All that's missing is a (reinterpreting) vtype change to Pmode-sized
elements before. I quickly hacked something together (without the proper
mode change) and the resulting code looks like:
 
vsetvli zero, 8, e8, ...
vmv.v.x v1,a5
# missing vsetivli zero, 1, e64, ... or something 
vmv.x.s a0,v1
 
Now, whether that's efficient (and desirable) is a separate issue and
should probably be defined by register_move_costs as well as instruction
costs.  I wasn't actually aware of this call/argument optimization that
uses vec_duplicate and I haven't checked what costing (if at all) it
uses.
 
Regards
Robin
 


Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-14 Thread Robin Dapp via Gcc-patches
> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so...not sure should we try to
> implement that within the mov pattern?
> 
> I guess we need some inputs from Jeff.

Sorry for the late response.  I have also been thinking about this and
it feels a bit like a bandaid to me.  Usually register-class moves like
this are performed by reload (which consults register_move_costs among
other things) and we are working around it.

The situation is that we move a vec_duplicate of QImodes into a vector
register.  Then we want to use this as scalar call argument so we need
to transfer it back to a DImode register.
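
To make the data movement concrete, here is a minimal C++ sketch of it. It is not taken from the patch: the helper names (vec_duplicate_v8qi, v8qi_as_di) are made up for illustration, and it only models the values involved, not the registers or the RTL.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical model of (vec_duplicate:V8QI x): eight copies of one
// QImode (8-bit) value, as they would sit in a vector register.
std::array<std::uint8_t, 8> vec_duplicate_v8qi(std::uint8_t x) {
    std::array<std::uint8_t, 8> v;
    v.fill(x);
    return v;
}

// Hypothetical model of the transfer back to a scalar register: the same
// 8 bytes reinterpreted as a single DImode (64-bit) value.
std::uint64_t v8qi_as_di(const std::array<std::uint8_t, 8>& v) {
    std::uint64_t di;
    std::memcpy(&di, v.data(), sizeof di);
    return di;
}
```

Since every byte is identical, the result does not depend on byte order; it equals x multiplied by 0x0101010101010101.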

One maybe more typical solution would be to allow small VLS vector modes
like V8QI in GPRs (via hard_regno_mode_ok) until reload so we could have
a (set (reg:V8QI a0) (vec_duplicate:V8QI ...)).
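
As a rough illustration of the kind of check this implies, the sketch below models "does this VLS mode fit into a GPR" as a simple size comparison. Everything in it is an assumption for illustration: VlsMode, kUnitsPerWord and gpr_can_hold_vls_mode are hypothetical names, not GCC's types or the real riscv_hard_regno_mode_ok, and RV64 (8-byte GPRs) is assumed.

```cpp
#include <cstddef>

// Hypothetical description of a fixed-length (VLS) vector mode.
struct VlsMode {
    std::size_t elem_bytes;  // size of one element in bytes
    std::size_t nunits;      // number of elements
};

// Assumption: RV64, so a GPR holds 8 bytes.
constexpr std::size_t kUnitsPerWord = 8;

constexpr std::size_t mode_size(VlsMode m) {
    return m.elem_bytes * m.nunits;
}

// Toy predicate: a VLS mode may live in a GPR only if the whole vector
// fits into one register.
constexpr bool gpr_can_hold_vls_mode(VlsMode m) {
    return mode_size(m) <= kUnitsPerWord;
}

static_assert(gpr_can_hold_vls_mode(VlsMode{1, 8}),   "V8QI fits");
static_assert(!gpr_can_hold_vls_mode(VlsMode{1, 16}), "V16QI does not fit");
```

A real implementation would also have to decide up to which pass this is allowed ("until reload" above); the sketch ignores that.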

The next step would be to have a mov expander with target "r"
constraint (and source "vr") that performs the actual move.  This is
where Juzhe's mov code could fit in (without the subreg handling).
If I'm not mistaken vmv.x.s without slidedown should be sufficient for
our case as we'd only want to use the whole thing when the full vector
fits into a GPR. 

All that's missing is a (reinterpreting) vtype change to Pmode-sized
elements before. I quickly hacked something together (without the proper
mode change) and the resulting code looks like:

vsetvli zero, 8, e8, ...
vmv.v.x v1,a5
# missing vsetivli zero, 1, e64, ... or something 
vmv.x.s a0,v1
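
To show why the commented-out vtype change matters, here is a small C++ model of vmv.x.s on RV64, under the assumption that it reads element 0 at the current SEW and sign-extends it to 64 bits. The names splat and vmv_x_s are hypothetical helpers for this sketch, and the vector register is modelled as just its low 8 bytes in little-endian order.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for vmv.v.x at e8 with vl = 8: broadcast one byte.
std::array<std::uint8_t, 8> splat(std::uint8_t x) {
    std::array<std::uint8_t, 8> v;
    v.fill(x);
    return v;
}

// Toy model of `vmv.x.s rd, vs2`: read element 0 at the given SEW
// (8, 16, 32 or 64 bits) and sign-extend it to 64 bits.
std::int64_t vmv_x_s(const std::array<std::uint8_t, 8>& vreg_low,
                     unsigned sew_bits) {
    std::uint64_t raw = 0;
    for (unsigned i = 0; i < sew_bits / 8; ++i)
        raw |= static_cast<std::uint64_t>(vreg_low[i]) << (8 * i);
    if (sew_bits < 64) {
        // Sign-extend from sew_bits to 64 bits using unsigned wraparound.
        const std::uint64_t sign = std::uint64_t{1} << (sew_bits - 1);
        raw = (raw ^ sign) - sign;
    }
    return static_cast<std::int64_t>(raw);
}
```

In this model, reading at e8 yields only the sign-extended first byte, while reading at e64 yields all eight bytes at once, which is what the scalar DImode argument needs.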

Now, whether that's efficient (and desirable) is a separate issue and
should probably be defined by register_move_costs as well as instruction
costs.  I wasn't actually aware of this call/argument optimization that
uses vec_duplicate and I haven't checked what costing (if at all) it
uses.

Regards
 Robin


Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-14 Thread Kito Cheng via Gcc-patches
I think what we are doing here is effectively allowing scalar modes in
vector registers, so I am not sure whether we should implement that
within the mov pattern.

I guess we need some input from Jeff.


e.g.
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0ecda795b38..ffced41588d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7621,6 +7621,9 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
     }
   else if (V_REG_P (regno))
     {
+      if (mode is scalar)
+       return true;
+
       if (!riscv_v_ext_mode_p (mode))
        return false;

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6d6a2b3748c..50bac39f125 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2035,8 +2035,8 @@ (define_insn "*movdi_32bit"
    (set_attr "ext" "base,base,base,base,d,d,d,d,d,vector")])
 
 (define_insn "*movdi_64bit"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r, m, *f,*f,*r,*f,*m,r")
-       (match_operand:DI 1 "move_operand" " r,T,m,rJ,*r*J,*m,*f,*f,*f,vp"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r, m, *f,*f,*r,*f,*m,r,*vr,*r,*vr,*vr,*m")
+       (match_operand:DI 1 "move_operand" " r,T,m,rJ,*r*J,*m,*f,*f,*f,vp,vr,vr,r,m,vr"))]
   "TARGET_64BIT
    && (register_operand (operands[0], DImode)
        || reg_or_0_operand (operands[1], DImode))"