RE: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
Committed, thanks Robin.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of Robin Dapp via Gcc-patches
Sent: Friday, September 15, 2023 11:44 PM
To: 钟居哲; Jeff Law; kito.cheng
Cc: rdapp@gmail.com; gcc-patches; kito.cheng
Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

> You mean this patch is ok?

I thought about it a bit more. From my point of view the patch is OK
for now in order to get the bug out of the way.

In the longer term I would really prefer a more "regular" solution
(i.e. via hard_regno_mode_ok) and related. I can take care of that
once I have a bit of time but for now let's go ahead.

Regards
 Robin

Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
> You mean this patch is ok?

I thought about it a bit more. From my point of view the patch is OK
for now in order to get the bug out of the way.

In the longer term I would really prefer a more "regular" solution
(i.e. via hard_regno_mode_ok) and related. I can take care of that
once I have a bit of time but for now let's go ahead.

Regards
 Robin

Re: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
You mean this patch is ok?

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2023-09-15 23:27
To: 钟居哲; kito.cheng
CC: gcc-patches; kito.cheng; rdapp.gcc
Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

On 9/14/23 16:26, 钟居哲 wrote:
> I don't think it can fix the case when it is -march=rv64gc_zve32x
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng <kito.ch...@gmail.com>
> Date: 2023-09-15 00:17
> To: Juzhe-Zhong <juzhe.zh...@rivai.ai>
> CC: gcc-patches <gcc-patches@gcc.gnu.org>; kito.cheng
> <kito.ch...@sifive.com>; jeffreyalaw <jeffreya...@gmail.com>;
> rdapp.gcc <rdapp....@gmail.com>
> Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode
> move[PR111391]
>
> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so...not sure should we try to
> implement that within the mov pattern?
>
> I guess we need some inputs from Jeff.

It's advantageous if we can avoid it. It often gets quite ugly when
you start allowing something like scalar modes in vector registers --
particularly if you support something other than simple moves.

But we may end up needing to do that anyway to do something like
supporting the sqrt & recip estimators in scalar FP modes, which can be
very advantageous for benchmarks like nab.

So my suggestion would be go ahead if it looks like it can really solve
this problem -- knowing there will likely be a long tail of fallout.
If it doesn't help pr111391, then let's defer until we really dive into
544.nab/644.nab and try to improve that sqrt (x) and 1/sqrt(x) sequence
that shows up in there.

jeff

Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
On 9/14/23 16:26, 钟居哲 wrote:
> I don't think it can fix the case when it is -march=rv64gc_zve32x
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng <kito.ch...@gmail.com>
> Date: 2023-09-15 00:17
> To: Juzhe-Zhong <juzhe.zh...@rivai.ai>
> CC: gcc-patches <gcc-patches@gcc.gnu.org>; kito.cheng
> <kito.ch...@sifive.com>; jeffreyalaw <jeffreya...@gmail.com>;
> rdapp.gcc <rdapp....@gmail.com>
> Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode
> move[PR111391]
>
> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so...not sure should we try to
> implement that within the mov pattern?
>
> I guess we need some inputs from Jeff.

It's advantageous if we can avoid it. It often gets quite ugly when
you start allowing something like scalar modes in vector registers --
particularly if you support something other than simple moves.

But we may end up needing to do that anyway to do something like
supporting the sqrt & recip estimators in scalar FP modes, which can be
very advantageous for benchmarks like nab.

So my suggestion would be go ahead if it looks like it can really solve
this problem -- knowing there will likely be a long tail of fallout.
If it doesn't help pr111391, then let's defer until we really dive into
544.nab/644.nab and try to improve that sqrt (x) and 1/sqrt(x) sequence
that shows up in there.

jeff

Re: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
>> Now, whether that's efficient (and desirable) is a separate issue and
>> should probably be defined by register_move_costs as well as instruction
>> costs. I wasn't actually aware of this call/argument optimization that
>> uses vec_duplicate and I haven't checked what costing (if at all) it
>> uses.

This patch is not a performance-improvement patch. It's a bug-fix patch.
I am not optimizing the codegen. That's why I put it into the move
pattern, to handle that statically.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-09-15 05:06
To: Kito Cheng; Juzhe-Zhong
CC: rdapp.gcc; gcc-patches; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so...not sure should we try to
> implement that within the mov pattern?
>
> I guess we need some inputs from Jeff.

Sorry for the late response. I have also been thinking about this and
it feels a bit like a bandaid to me. Usually register-class moves like
this are performed by reload (which consults register_move_costs among
other things) and we are working around it.

The situation is that we move a vec_duplicate of QImodes into a vector
register. Then we want to use this as scalar call argument so we need
to transfer it back to a DImode register.

One maybe more typical solution would be to allow small VLS vector modes
like V8QI in GPRs (via hard_regno_mode_ok) until reload so we could have
a (set (reg:V8QI a0) (vec_duplicate:V8QI ...)).

The next step would be to have a mov expander with target "r" constraint
(and source "vr") that performs the actual move. This is where Juzhe's
mov code could fit in (without the subreg handling). If I'm not
mistaken vmv.x.s without slidedown should be sufficient for our case as
we'd only want to use the whole thing when the full vector fits into a
GPR. All that's missing is a (reinterpreting) vtype change to
Pmode-sized elements before.

I quickly hacked something together (without the proper mode change)
and the resulting code looks like:

  vsetvli zero, 8, e8, ...
  vmv.v.x v1,a5
  # missing vsetivli zero, 1, e64, ... or something
  vmv.x.s a0,v1

Now, whether that's efficient (and desirable) is a separate issue and
should probably be defined by register_move_costs as well as instruction
costs. I wasn't actually aware of this call/argument optimization that
uses vec_duplicate and I haven't checked what costing (if at all) it
uses.

Regards
 Robin

Re: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
I don't think it can fix the case when it is -march=rv64gc_zve32x

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-09-15 00:17
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

I am thinking what we are doing is something like we are allowing
scalar mode within the vector register, so...not sure should we try to
implement that within the mov pattern?

I guess we need some inputs from Jeff.

e.g.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0ecda795b38..ffced41588d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7621,6 +7621,9 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
     }
   else if (V_REG_P (regno))
     {
+      if (mode is scalar)
+       return true;
+
       if (!riscv_v_ext_mode_p (mode))
        return false;

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6d6a2b3748c..50bac39f125 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2035,8 +2035,8 @@ (define_insn "*movdi_32bit"
    (set_attr "ext" "base,base,base,base,d,d,d,d,d,vector")])

 (define_insn "*movdi_64bit"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r, m, *f,*f,*r,*f,*m,r")
-       (match_operand:DI 1 "move_operand" " r,T,m,rJ,*r*J,*m,*f,*f,*f,vp"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r, m, *f,*f,*r,*f,*m,r,*vr,*r,*vr,*vr,*m")
+       (match_operand:DI 1 "move_operand" " r,T,m,rJ,*r*J,*m,*f,*f,*f,vp,vr,vr,r,m,vr"))]
   "TARGET_64BIT
    && (register_operand (operands[0], DImode)
        || reg_or_0_operand (operands[1], DImode))"

Re: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
>> All that's missing is a (reinterpreting) vtype change to Pmode-sized
>> elements before. I quickly hacked something together (without the proper
>> mode change) and the resulting code looks like:
>>   vsetvli zero, 8, e8, ...
>>   vmv.v.x v1,a5
>>   # missing vsetivli zero, 1, e64, ... or something
>>   vmv.x.s a0,v1

This issue has been addressed by this patch:
[PATCH V3] RISC-V: Fix ICE in get_avl_or_vl_reg (gnu.org)

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-09-15 05:06
To: Kito Cheng; Juzhe-Zhong
CC: rdapp.gcc; gcc-patches; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so...not sure should we try to
> implement that within the mov pattern?
>
> I guess we need some inputs from Jeff.

Sorry for the late response. I have also been thinking about this and
it feels a bit like a bandaid to me. Usually register-class moves like
this are performed by reload (which consults register_move_costs among
other things) and we are working around it.

The situation is that we move a vec_duplicate of QImodes into a vector
register. Then we want to use this as scalar call argument so we need
to transfer it back to a DImode register.

One maybe more typical solution would be to allow small VLS vector modes
like V8QI in GPRs (via hard_regno_mode_ok) until reload so we could have
a (set (reg:V8QI a0) (vec_duplicate:V8QI ...)).

The next step would be to have a mov expander with target "r" constraint
(and source "vr") that performs the actual move. This is where Juzhe's
mov code could fit in (without the subreg handling). If I'm not
mistaken vmv.x.s without slidedown should be sufficient for our case as
we'd only want to use the whole thing when the full vector fits into a
GPR. All that's missing is a (reinterpreting) vtype change to
Pmode-sized elements before.

I quickly hacked something together (without the proper mode change)
and the resulting code looks like:

  vsetvli zero, 8, e8, ...
  vmv.v.x v1,a5
  # missing vsetivli zero, 1, e64, ... or something
  vmv.x.s a0,v1

Now, whether that's efficient (and desirable) is a separate issue and
should probably be defined by register_move_costs as well as instruction
costs. I wasn't actually aware of this call/argument optimization that
uses vec_duplicate and I haven't checked what costing (if at all) it
uses.

Regards
 Robin

Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so...not sure should we try to
> implement that within the mov pattern?
>
> I guess we need some inputs from Jeff.

Sorry for the late response. I have also been thinking about this and
it feels a bit like a bandaid to me. Usually register-class moves like
this are performed by reload (which consults register_move_costs among
other things) and we are working around it.

The situation is that we move a vec_duplicate of QImodes into a vector
register. Then we want to use this as scalar call argument so we need
to transfer it back to a DImode register.

One maybe more typical solution would be to allow small VLS vector modes
like V8QI in GPRs (via hard_regno_mode_ok) until reload so we could have
a (set (reg:V8QI a0) (vec_duplicate:V8QI ...)).

The next step would be to have a mov expander with target "r" constraint
(and source "vr") that performs the actual move. This is where Juzhe's
mov code could fit in (without the subreg handling). If I'm not
mistaken vmv.x.s without slidedown should be sufficient for our case as
we'd only want to use the whole thing when the full vector fits into a
GPR. All that's missing is a (reinterpreting) vtype change to
Pmode-sized elements before.

I quickly hacked something together (without the proper mode change)
and the resulting code looks like:

  vsetvli zero, 8, e8, ...
  vmv.v.x v1,a5
  # missing vsetivli zero, 1, e64, ... or something
  vmv.x.s a0,v1

Now, whether that's efficient (and desirable) is a separate issue and
should probably be defined by register_move_costs as well as instruction
costs. I wasn't actually aware of this call/argument optimization that
uses vec_duplicate and I haven't checked what costing (if at all) it
uses.

Regards
 Robin

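To make the "missing vsetivli" remark above concrete, here is a small
host-side C++ model (not GCC or RVV code) of why the vtype change matters.
It assumes an RV64 target with XLEN = 64; the function names
broadcast_v8qi, read_elem0_e8 and read_elem0_e64 are hypothetical and only
mimic the architectural behaviour of vmv.v.x and vmv.x.s, which copies one
SEW-wide element 0 and sign-extends it when SEW < XLEN.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Model of `vmv.v.x` with SEW=8, VL=8: broadcast one byte into eight lanes
// (the vec_duplicate:V8QI from the discussion).
inline std::array<std::uint8_t, 8> broadcast_v8qi(std::uint8_t x) {
    std::array<std::uint8_t, 8> v{};
    v.fill(x);
    return v;
}

// Model of `vmv.x.s` while vtype still says SEW=8: only element 0 is
// copied, sign-extended to 64 bits. The other seven lanes are lost.
inline std::int64_t read_elem0_e8(const std::array<std::uint8_t, 8>& v) {
    return static_cast<std::int64_t>(static_cast<std::int8_t>(v[0]));
}

// Model of `vmv.x.s` after reinterpreting the register with SEW=64:
// element 0 now covers all eight bytes, i.e. the whole V8QI value.
// Byte order is irrelevant here because every lane holds the same byte.
inline std::uint64_t read_elem0_e64(const std::array<std::uint8_t, 8>& v) {
    std::uint64_t out = 0;
    std::memcpy(&out, v.data(), sizeof out);
    return out;
}
```

With the byte 0x2A broadcast into the register, the e8 read yields only 0x2A
while the e64 read yields 0x2A2A2A2A2A2A2A2A, which is the value a DImode
call argument would be expected to carry.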
Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
I am thinking what we are doing is something like we are allowing
scalar mode within the vector register, so...not sure should we try to
implement that within the mov pattern?

I guess we need some inputs from Jeff.

e.g.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0ecda795b38..ffced41588d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7621,6 +7621,9 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
     }
   else if (V_REG_P (regno))
     {
+      if (mode is scalar)
+       return true;
+
       if (!riscv_v_ext_mode_p (mode))
        return false;

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6d6a2b3748c..50bac39f125 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2035,8 +2035,8 @@ (define_insn "*movdi_32bit"
    (set_attr "ext" "base,base,base,base,d,d,d,d,d,vector")])

 (define_insn "*movdi_64bit"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r, m, *f,*f,*r,*f,*m,r")
-       (match_operand:DI 1 "move_operand" " r,T,m,rJ,*r*J,*m,*f,*f,*f,vp"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r, m, *f,*f,*r,*f,*m,r,*vr,*r,*vr,*vr,*m")
+       (match_operand:DI 1 "move_operand" " r,T,m,rJ,*r*J,*m,*f,*f,*f,vp,vr,vr,r,m,vr"))]
   "TARGET_64BIT
    && (register_operand (operands[0], DImode)
        || reg_or_0_operand (operands[1], DImode))"
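
The two directions discussed in this thread can be contrasted with a toy
C++ model. This is only a sketch under loud assumptions: RegClass, Mode,
kWordBytes and both predicate names are hypothetical stand-ins, not GCC's
real types or the real riscv_hard_regno_mode_ok, and the GPR check for
scalar modes is simplified to "always allowed". The first predicate mirrors
the "mode is scalar" pseudocode in the diff above; the second mirrors
Robin's alternative of allowing small VLS modes such as V8QI in GPRs.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for GCC's register classes and modes.
enum class RegClass { GPR, VR };

struct Mode {
    bool is_vector;       // true for vector modes such as V8QI
    unsigned size_bytes;  // total size of the mode in bytes
};

// Assumption: RV64, so one GPR holds 8 bytes.
const unsigned kWordBytes = 8;

// Sketch of the diff's idea: scalar modes become acceptable in vector
// registers, in addition to the vector modes already allowed there.
inline bool mode_ok_scalar_in_vr(RegClass rc, Mode m) {
    if (rc == RegClass::VR)
        return true;          // vector modes as before, scalar modes newly allowed
    return !m.is_vector;      // GPRs keep rejecting vector modes
}

// Sketch of the alternative: vector registers stay vector-only, but a
// vector mode that fits into a single GPR is acceptable in a GPR.
inline bool mode_ok_vls_in_gpr(RegClass rc, Mode m) {
    if (rc == RegClass::VR)
        return m.is_vector;
    return !m.is_vector || m.size_bytes <= kWordBytes;
}
```

In this model a V8QI value (8 bytes) is accepted in a GPR only by the second
predicate, while a 16-byte vector mode is rejected by both; a DImode value is
accepted in a vector register only by the first.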