Re: Re: [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

2023-08-24 Thread 钟居哲
>> The use_real_merge just appeared odd to me here because there is
>> nothing to merge.  But in the end it's just to omit the vundef operand
>> so good for now.  There is an increasing number of opportunities to
>> refactor in riscv-v.cc, though ;)

I think we can change use_real_merge into use_dummy_merge?
When it's true then add undef merge :

if (!m_use_real_merge_p)
  add_vundef_operand ();

change it into:

if (m_use_dummy_merge_p)
  add_vundef_operand ();

Then we can avoid the confusion.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-24 17:13
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization
Hi Juzhe,
 
> vcpop.m a5,v0
> beq a5,zero,.L3
> addi a5,a5,-1
> vsetvli a4,zero,e32,m1,ta,ma
> vcompress.vm v2,v3,v0
> vslidedown.vx v2,v2,a5
> vmv.x.s a0,v2
> .L3:
> sext.w a0,a0
 
Mhm, where is this sext coming from?  Thought I had this covered with
the autovec-opt pattern but apparently not.  I'll take that, nothing
related to this patch.
 
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -213,7 +213,7 @@ public:
>{
>  /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
> the vsetvli to obtain the value of vlmax.  */
> - poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode);
> + poly_uint64 nunits = GET_MODE_NUNITS (m_mask_mode);
 
Why is that necessary?  Just for the popcount I presume?
Can't we rather have a new case for a scalar destination?  I find
the code a bit misleading now as we check m_dest_mode and then not
use it.
 
>  
> +/* Emit vcpop.m instruction.  */
> +
> +static void
> +emit_cpop_insn (unsigned icode, rtx *ops, rtx len)
> +{
> +  machine_mode dest_mode = GET_MODE (ops[0]);
> +  machine_mode mask_mode = GET_MODE (ops[1]);
> +  insn_expander e (RVV_CPOP,
> +   /* HAS_DEST_P */ true,
> +   /* FULLY_UNMASKED_P */ true,
> +   /* USE_REAL_MERGE_P */ true,
> +   /* HAS_AVL_P */ true,
> +   /* VLMAX_P */ len ? false : true,
> +   dest_mode, mask_mode);
> +
> +  e.set_vl (len);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}
 
The use_real_merge just appeared odd to me here because there is
nothing to merge.  But in the end it's just to omit the vundef operand
so good for now.  There is an increasing number of opportunities to
refactor in riscv-v.cc, though ;)
 
The rest looks good to me.  Note that my machine crashed when
compiling the extract_last-14.c because it used up all my RAM.
The vsetvl "refactor" phase 3 patch helped, though.
We'd need to have this patch depend on the other one then.
 
The rest looks good to me.  At first I was a bit wary about the
branching zero check after popcount but as we're outside of a loop
anyway, that's fine.  Might want to use a conditional select in the
future but actually not that important. 
 
Regards
Robin
 


Re: Re: [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

2023-08-24 Thread 钟居哲

>> Why is that necessary?  Just for the popcount I presume?
>> Can't we rather have a new case for a scalar destination?  I find
>> the code a bit misleading now as we check m_dest_mode and then not
>> use it.

I am gonna fix it in V2.

>> The rest looks good to me.  Note that my machine crashed when
>> compiling the extract_last-14.c because it used up all my RAM.
>> The vsetvl "refactor" phase 3 patch helped, though.
>> We'd need to have this patch depend on the other one then.

Yes. The refactor patch fixed potential bugs. I will commit that tomorrow 
when kito no more comments.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-24 17:13
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization
Hi Juzhe,
 
> vcpop.m a5,v0
> beq a5,zero,.L3
> addi a5,a5,-1
> vsetvli a4,zero,e32,m1,ta,ma
> vcompress.vm v2,v3,v0
> vslidedown.vx v2,v2,a5
> vmv.x.s a0,v2
> .L3:
> sext.w a0,a0
 
Mhm, where is this sext coming from?  Thought I had this covered with
the autovec-opt pattern but apparently not.  I'll take that, nothing
related to this patch.
 
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -213,7 +213,7 @@ public:
>{
>  /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
> the vsetvli to obtain the value of vlmax.  */
> - poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode);
> + poly_uint64 nunits = GET_MODE_NUNITS (m_mask_mode);
 
Why is that necessary?  Just for the popcount I presume?
Can't we rather have a new case for a scalar destination?  I find
the code a bit misleading now as we check m_dest_mode and then not
use it.
 
>  
> +/* Emit vcpop.m instruction.  */
> +
> +static void
> +emit_cpop_insn (unsigned icode, rtx *ops, rtx len)
> +{
> +  machine_mode dest_mode = GET_MODE (ops[0]);
> +  machine_mode mask_mode = GET_MODE (ops[1]);
> +  insn_expander e (RVV_CPOP,
> +   /* HAS_DEST_P */ true,
> +   /* FULLY_UNMASKED_P */ true,
> +   /* USE_REAL_MERGE_P */ true,
> +   /* HAS_AVL_P */ true,
> +   /* VLMAX_P */ len ? false : true,
> +   dest_mode, mask_mode);
> +
> +  e.set_vl (len);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}
 
The use_real_merge just appeared odd to me here because there is
nothing to merge.  But in the end it's just to omit the vundef operand
so good for now.  There is an increasing number of opportunities to
refactor in riscv-v.cc, though ;)
 
The rest looks good to me.  Note that my machine crashed when
compiling the extract_last-14.c because it used up all my RAM.
The vsetvl "refactor" phase 3 patch helped, though.
We'd need to have this patch depend on the other one then.
 
The rest looks good to me.  At first I was a bit wary about the
branching zero check after popcount but as we're outside of a loop
anyway, that's fine.  Might want to use a conditional select in the
future but actually not that important. 
 
Regards
Robin
 


Re: [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

2023-08-24 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

>   vcpop.m a5,v0
>   beq a5,zero,.L3
>   addia5,a5,-1
>   vsetvli a4,zero,e32,m1,ta,ma
>   vcompress.vmv2,v3,v0
>   vslidedown.vx   v2,v2,a5
>   vmv.x.s a0,v2
> .L3:
>   sext.w  a0,a0

Mhm, where is this sext coming from?  Thought I had this covered with
the autovec-opt pattern but apparently not.  I'll take that, nothing
related to this patch.

> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -213,7 +213,7 @@ public:
> {
>   /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
>  the vsetvli to obtain the value of vlmax.  */
> - poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode);
> + poly_uint64 nunits = GET_MODE_NUNITS (m_mask_mode);

Why is that necessary?  Just for the popcount I presume?
Can't we rather have a new case for a scalar destination?  I find
the code a bit misleading now as we check m_dest_mode and then not
use it.

>  
> +/* Emit vcpop.m instruction.  */
> +
> +static void
> +emit_cpop_insn (unsigned icode, rtx *ops, rtx len)
> +{
> +  machine_mode dest_mode = GET_MODE (ops[0]);
> +  machine_mode mask_mode = GET_MODE (ops[1]);
> +  insn_expander e (RVV_CPOP,
> +   /* HAS_DEST_P */ true,
> +   /* FULLY_UNMASKED_P */ true,
> +   /* USE_REAL_MERGE_P */ true,
> +   /* HAS_AVL_P */ true,
> +   /* VLMAX_P */ len ? false : true,
> +   dest_mode, mask_mode);
> +
> +  e.set_vl (len);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}

The use_real_merge just appeared odd to me here because there is
nothing to merge.  But in the end it's just to omit the vundef operand
so good for now.  There is an increasing number of opportunities to
refactor in riscv-v.cc, though ;)

The rest looks good to me.  Note that my machine crashed when
compiling the extract_last-14.c because it used up all my RAM.
The vsetvl "refactor" phase 3 patch helped, though.
We'd need to have this patch depend on the other one then.

The rest looks good to me.  At first I was a bit wary about the
branching zero check after popcount but as we're outside of a loop
anyway, that's fine.  Might want to use a conditional select in the
future but actually not that important. 

Regards
 Robin


[PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

2023-08-23 Thread Juzhe-Zhong
Consider this following case:
int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < 4; i++)
if (a[i] < min_v)
  last = i;

  return last;
}

--param=riscv-autovec-preference=fixed-vlmax --param=riscv-autovec-lmul=m8

condition_reduction:
vsetvli a4,zero,e32,m8,ta,ma
li  a5,32
vmv.v.x v8,a1
vl8re32.v   v0,0(a0)
vid.v   v16
vmslt.vvv0,v0,v8
vsetvli zero,a5,e8,m2,ta,ma
vcpop.m a5,v0
beq a5,zero,.L2
addia5,a5,-1
vsetvli a4,zero,e32,m8,ta,ma
vcompress.vmv8,v16,v0
vslidedown.vx   v8,v8,a5
vmv.x.s a0,v8
ret
.L2:
li  a0,66
ret

--param=riscv-autovec-preference=scalable

condition_reduction:
csrra6,vlenb
mv  a2,a0
li  a3,32
li  a0,66
srlia6,a6,2
vsetvli a4,zero,e32,m1,ta,ma
vmv.v.x v4,a1
vid.v   v1
.L4:
vsetvli a5,a3,e8,mf4,tu,mu
vsetvli zero,a5,e32,m1,ta,ma> redundant vsetvl
vle32.v v0,0(a2)
vsetvli a4,zero,e32,m1,ta,ma
sllia1,a5,2
vmv.v.x v2,a6
vmslt.vvv0,v0,v4
sub a3,a3,a5
vmv1r.v v3,v1
vadd.vv v1,v1,v2
vsetvli zero,a5,e8,mf4,ta,ma
vcpop.m a5,v0
beq a5,zero,.L3
addia5,a5,-1
vsetvli a4,zero,e32,m1,ta,ma
vcompress.vmv2,v3,v0
vslidedown.vx   v2,v2,a5
vmv.x.s a0,v2
.L3:
sext.w  a0,a0
add a2,a2,a1
bne a3,zero,.L4
ret

There is a redundant vsetvli instruction in VLA vectorized codes which is the 
VSETVL PASS issue.

vsetvl issue is not included in this patch but will be fixed soon.

gcc/ChangeLog:

* config/riscv/autovec.md (len_fold_extract_last_): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_fold_extract_last): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_slide_insn): Ditto.
(emit_cpop_insn): Ditto.
(emit_nonvlmax_compress_insn): Ditto.
(expand_fold_extract_last): Ditto.
* config/riscv/vector.md: Fix vcpop.m ratio demand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c: New test.

---
 gcc/config/riscv/autovec.md   |  24 
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-v.cc   | 115 +-
 gcc/config/riscv/vector.md|   2 +-
 .../riscv/rvv/autovec/reduc/extract_last-1.c  |  20 +++
 .../riscv/rvv/autovec/reduc/extract_last-10.c |   6 +
 .../riscv/rvv/autovec/reduc/extract_last-11.c |  24 
 .../riscv/rvv/autovec/reduc/extract_last-12.c |   6 +
 .../riscv/rvv/autovec/reduc/extract_last-13.c |   7 ++