Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-03 Thread Richard Biener
On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> Is this following reasonable ?
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index fab2513105a..8d1cdad1607 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5094,6 +5094,20 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
> operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.
> +For each element index i load address is operand 1 + @var{i} * operand 2.
> +Similar to mask_len_load, the instruction loads at most (operand 4 + operand 
> 5) elements from memory.
> +Bit @var{i} of the mask (operand 3) is set if element @var{i} of the result 
> should

Element @var{i} of the mask

> +be loaded from memory and clear if element @var{i} of the result should be 
> zero.
> +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5131,6 +5145,19 @@ at most (operand 6 + operand 7) elements of (operand 
> 4) to memory.
>  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> stored.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_store@var{m}} instruction pattern
> +@item @samp{mask_len_strided_store@var{m}}
> +Store a vector of mode m into several distinct memory locations.
> +Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
> +Operand 2 is the vector of values that should be stored, which is of mode 
> @var{m}.
> +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
> operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_scatter_store@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 1 as step.
> +For each element index i store address is operand 0 + @var{i} * operand 1.
> +Similar to mask_len_store, the instruction stores at most (operand 4 + 
> operand 5) elements of mask (operand 3) to memory.
> +Bit @var{i} of the mask is set if element @var{i} of (operand 3) should be 
> stored.

Element @var{i}

Otherwise looks OK

> +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> 
> Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-03 15:40
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; jeffreyalaw; richard.sandiford; Robin Dapp
> Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > 
> > The following is strided load/store doc:
> > 
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index fab2513105a..4f0821a291d 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5094,6 +5094,22 @@ Bit @var{i} of the mask is set if element @var{i} of 
> > the result should
> >  be loaded from memory and clear if element @var{i} of the result should be 
> > undefined.
> >  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> > 
> > +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> > +@item @samp{mask_len_strided_load@var{m}}
> > +Load several separate memory locations into a destination vector of mode 
> > @var{m}.
> > +Operand 0 is a destination vector of mode @var{m}.
> > +Operand 1 is a scalar base address and operand 2 is a scalar stride of 
> > Pmode.
> > +operand 3 indicates stride operand is signed or unsigned number and 
> > operand 4 is mask operand.
>  
> we don't need

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-03 Thread Richard Biener
On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:

> Hi?Richi.
> 
> >> Element @var{i}.  I think we don't want to document 'undefined' but
> >> rather match mask_gather_load and say the result should be zero.
> 
> I am confused here,  will vectorizer depend on zero inactive value ?

I don't think it does, but why be inconsistent with gather and at
the same time document it behaves like gather?

> Since RVV load/store doesn't touch the inactive(tail or unmasked) elements, I 
> am wondering whether it will cause issues.

For load not zeroing will cause a false dependence on the previous value
of the register, so I doubt this is optimal behavior.  For stores the
inactive elements are of course not stored.

> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-03 15:40
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; jeffreyalaw; richard.sandiford; Robin Dapp
> Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > 
> > The following is strided load/store doc:
> > 
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index fab2513105a..4f0821a291d 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5094,6 +5094,22 @@ Bit @var{i} of the mask is set if element @var{i} of 
> > the result should
> >  be loaded from memory and clear if element @var{i} of the result should be 
> > undefined.
> >  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> > 
> > +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> > +@item @samp{mask_len_strided_load@var{m}}
> > +Load several separate memory locations into a destination vector of mode 
> > @var{m}.
> > +Operand 0 is a destination vector of mode @var{m}.
> > +Operand 1 is a scalar base address and operand 2 is a scalar stride of 
> > Pmode.
> > +operand 3 indicates stride operand is signed or unsigned number and 
> > operand 4 is mask operand.
>  
> we don't need operand 3
>  
> > +operand 5 is length operand and operand 6 is bias operand.
> > +The instruction can be seen as a special case of 
> > @code{mask_len_gather_load@var{m}@var{n}}
> > +with an offset vector that is a @code{vec_series} with operand 1 as base 
> > and operand 2 as step.
> > +For each element index i load address is operand 1 + @var{i} * operand 2.
> > +operand 2 can be negative stride if operand 3 is 0.
> > +Similar to mask_len_load, the instruction loads at most (operand 5 + 
> > operand 6) elements from memory.
> > +Bit @var{i} of the mask (operand 4) is set if element @var{i} of the 
> > result should
>  
> Element @var{i}.  I think we don't want to document 'undefined' but
> rather match mask_gather_load and say the result should be zero.
>  
> Similar adjustments below.
>  
> > +be loaded from memory and clear if element @var{i} of the result should be 
> > undefined.
> > +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> > +
> >  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
> >  @item @samp{scatter_store@var{m}@var{n}}
> >  Store a vector of mode @var{m} into several distinct memory locations.
> > @@ -5131,6 +5147,21 @@ at most (operand 6 + operand 7) elements of (operand 
> > 4) to memory.
> >  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> > stored.
> >  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> > 
> > +@cindex @code{mask_len_strided_store@var{m}} instruction pattern
> > +@item @samp{mask_len_strided_store@var{m}}
> > +Store a vector of mode m into several distinct memory locations.
> > +Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
> > +operand 2 indicates stride operand is signed or unsigned number.
> > +Operand 3 is the vector of values that should be stored, which is of mode 
> > @var{m}.
> > +operand 4 is mask operand, operand 5 is length operand and operand 6 is 
> > bias operand.
> > +The instruction can be seen as a special case of 
> > @code{mask_len_scatter_store@var{m}@var{n}}
> > +with an offset vector that is a @code{vec_series} with operand 1 as base 
> > and operand 1 as step.
> > +For each element index i store address is operand 0 + @var{i} * operand 1.
> > +operand 1 can be negative stride if operand 2 is 0.
> > +Similar to mask_len_store, the instruction stores at most (operand 5 + 
> > operand 6) elements of mask (operand 4) to memory.
> > +Bit @var{i} of the mask is

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-03 Thread juzhe.zh...@rivai.ai
Hi, Richi.

Is this following reasonable ?

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index fab2513105a..8d1cdad1607 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,20 @@ Bit @var{i} of the mask is set if element @var{i} of the 
result should
 be loaded from memory and clear if element @var{i} of the result should be 
undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.

+@cindex @code{mask_len_strided_load@var{m}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}}
+Load several separate memory locations into a destination vector of mode 
@var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
operand.
+The instruction can be seen as a special case of 
@code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 2 as step.
+For each element index i load address is operand 1 + @var{i} * operand 2.
+Similar to mask_len_load, the instruction loads at most (operand 4 + operand 
5) elements from memory.
+Bit @var{i} of the mask (operand 3) is set if element @var{i} of the result 
should
+be loaded from memory and clear if element @var{i} of the result should be 
zero.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5145,19 @@ at most (operand 6 + operand 7) elements of (operand 4) 
to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.

+@cindex @code{mask_len_strided_store@var{m}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}}
+Store a vector of mode m into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
+Operand 2 is the vector of values that should be stored, which is of mode 
@var{m}.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
operand.
+The instruction can be seen as a special case of 
@code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 1 as step.
+For each element index i store address is operand 0 + @var{i} * operand 1.
+Similar to mask_len_store, the instruction stores at most (operand 4 + operand 
5) elements of mask (operand 3) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 3) should be 
stored.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,

Thanks.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-03 15:40
To: juzhe.zh...@rivai.ai
CC: gcc-patches; jeffreyalaw; richard.sandiford; Robin Dapp
Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
mask_len_strided_load/mask_len_strided_store OPTABS/IFN
On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.
> 
> The following is strided load/store doc:
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index fab2513105a..4f0821a291d 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5094,6 +5094,22 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> +operand 3 indicates stride operand is signed or unsigned number and operand 
> 4 is mask operand.
 
we don't need operand 3
 
> +operand 5 is length operand and operand 6 is bias operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.
> +For each element index i load address is operand 1 + @var{i} * operand 2.
> +operand 2 can be negative stride if operand 3 is 0.
> +Similar to mask_len_load, the instruction loads at most (operand 5 + operand 
> 6) elements from memory.
> +Bit @var{i} of the mask (operand 4) is set if element @var{i} of the result 
> should
 
Element @var{i}. 

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-03 Thread juzhe.zh...@rivai.ai
Hi,Richi.

>> Element @var{i}.  I think we don't want to document 'undefined' but
>> rather match mask_gather_load and say the result should be zero.

I am confused here,  will vectorizer depend on zero inactive value ?
Since RVV load/store doesn't touch the inactive(tail or unmasked) elements, I 
am wondering whether it will cause issues.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-03 15:40
To: juzhe.zh...@rivai.ai
CC: gcc-patches; jeffreyalaw; richard.sandiford; Robin Dapp
Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
mask_len_strided_load/mask_len_strided_store OPTABS/IFN
On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.
> 
> The following is strided load/store doc:
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index fab2513105a..4f0821a291d 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5094,6 +5094,22 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> +operand 3 indicates stride operand is signed or unsigned number and operand 
> 4 is mask operand.
 
we don't need operand 3
 
> +operand 5 is length operand and operand 6 is bias operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.
> +For each element index i load address is operand 1 + @var{i} * operand 2.
> +operand 2 can be negative stride if operand 3 is 0.
> +Similar to mask_len_load, the instruction loads at most (operand 5 + operand 
> 6) elements from memory.
> +Bit @var{i} of the mask (operand 4) is set if element @var{i} of the result 
> should
 
Element @var{i}.  I think we don't want to document 'undefined' but
rather match mask_gather_load and say the result should be zero.
 
Similar adjustments below.
 
> +be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
> +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5131,6 +5147,21 @@ at most (operand 6 + operand 7) elements of (operand 
> 4) to memory.
>  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> stored.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_store@var{m}} instruction pattern
> +@item @samp{mask_len_strided_store@var{m}}
> +Store a vector of mode m into several distinct memory locations.
> +Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
> +operand 2 indicates stride operand is signed or unsigned number.
> +Operand 3 is the vector of values that should be stored, which is of mode 
> @var{m}.
> +operand 4 is mask operand, operand 5 is length operand and operand 6 is bias 
> operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_scatter_store@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 1 as step.
> +For each element index i store address is operand 0 + @var{i} * operand 1.
> +operand 1 can be negative stride if operand 2 is 0.
> +Similar to mask_len_store, the instruction stores at most (operand 5 + 
> operand 6) elements of mask (operand 4) to memory.
> +Bit @var{i} of the mask is set if element @var{i} of (operand 3) should be 
> stored.
> +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> 
> Does it look reasonable ? Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 22:34
> To: ???
> CC: gcc-patches; Jeff Law; richard.sandiford; rdapp.gcc
> Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Thu, 2 Nov 2023, ??? wrote:
>  
> > Ok. So drop 'scale' and keep signed/unsigned argument, is that right?
>  
> I don't think we need signed/unsig

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-03 Thread Richard Biener
On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> The following is strided load/store doc:
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index fab2513105a..4f0821a291d 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5094,6 +5094,22 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> +operand 3 indicates stride operand is signed or unsigned number and operand 
> 4 is mask operand.

we don't need operand 3

> +operand 5 is length operand and operand 6 is bias operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.
> +For each element index i load address is operand 1 + @var{i} * operand 2.
> +operand 2 can be negative stride if operand 3 is 0.
> +Similar to mask_len_load, the instruction loads at most (operand 5 + operand 
> 6) elements from memory.
> +Bit @var{i} of the mask (operand 4) is set if element @var{i} of the result 
> should

Element @var{i}.  I think we don't want to document 'undefined' but
rather match mask_gather_load and say the result should be zero.

Similar adjustments below.

> +be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
> +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5131,6 +5147,21 @@ at most (operand 6 + operand 7) elements of (operand 
> 4) to memory.
>  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> stored.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_store@var{m}} instruction pattern
> +@item @samp{mask_len_strided_store@var{m}}
> +Store a vector of mode m into several distinct memory locations.
> +Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
> +operand 2 indicates stride operand is signed or unsigned number.
> +Operand 3 is the vector of values that should be stored, which is of mode 
> @var{m}.
> +operand 4 is mask operand, operand 5 is length operand and operand 6 is bias 
> operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_scatter_store@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 1 as step.
> +For each element index i store address is operand 0 + @var{i} * operand 1.
> +operand 1 can be negative stride if operand 2 is 0.
> +Similar to mask_len_store, the instruction stores at most (operand 5 + 
> operand 6) elements of mask (operand 4) to memory.
> +Bit @var{i} of the mask is set if element @var{i} of (operand 3) should be 
> stored.
> +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> 
> Does it look reasonable ? Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 22:34
> To: ???
> CC: gcc-patches; Jeff Law; richard.sandiford; rdapp.gcc
> Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Thu, 2 Nov 2023, ??? wrote:
>  
> > Ok. So drop 'scale' and keep signed/unsigned argument, is that right?
>  
> I don't think we need signed/unsigned.  RTL expansion has the signedness
> of the offset argument there and can just extend to the appropriate mode
> to offset a pointer.
>  
> > And I wonder I should create the stride_type using size_type_node or 
> > ptrdiff_type_node ?
> > Which is preferrable ?
>  
> 'sizetype' - that's the type we require to be used for 
> the POINTER_PLUS_EXPR offset operand.
>  
>  
> > Thanks.
> > 
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-11-02 22:27
> > To: ???
> > C

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-03 Thread juzhe.zh...@rivai.ai
Hi, Richi.

The following is strided load/store doc:

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index fab2513105a..4f0821a291d 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,22 @@ Bit @var{i} of the mask is set if element @var{i} of the 
result should
 be loaded from memory and clear if element @var{i} of the result should be 
undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.

+@cindex @code{mask_len_strided_load@var{m}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}}
+Load several separate memory locations into a destination vector of mode 
@var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
+operand 3 indicates stride operand is signed or unsigned number and operand 4 
is mask operand.
+operand 5 is length operand and operand 6 is bias operand.
+The instruction can be seen as a special case of 
@code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 2 as step.
+For each element index i load address is operand 1 + @var{i} * operand 2.
+operand 2 can be negative stride if operand 3 is 0.
+Similar to mask_len_load, the instruction loads at most (operand 5 + operand 
6) elements from memory.
+Bit @var{i} of the mask (operand 4) is set if element @var{i} of the result 
should
+be loaded from memory and clear if element @var{i} of the result should be 
undefined.
+Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5147,21 @@ at most (operand 6 + operand 7) elements of (operand 4) 
to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.

+@cindex @code{mask_len_strided_store@var{m}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}}
+Store a vector of mode m into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
+operand 2 indicates stride operand is signed or unsigned number.
+Operand 3 is the vector of values that should be stored, which is of mode 
@var{m}.
+operand 4 is mask operand, operand 5 is length operand and operand 6 is bias 
operand.
+The instruction can be seen as a special case of 
@code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 1 as step.
+For each element index i store address is operand 0 + @var{i} * operand 1.
+operand 1 can be negative stride if operand 2 is 0.
+Similar to mask_len_store, the instruction stores at most (operand 5 + operand 
6) elements of mask (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 3) should be 
stored.
+Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,

Does it look reasonable ? Thanks.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-02 22:34
To: 钟居哲
CC: gcc-patches; Jeff Law; richard.sandiford; rdapp.gcc
Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
mask_len_strided_load/mask_len_strided_store OPTABS/IFN
On Thu, 2 Nov 2023, ??? wrote:
 
> Ok. So drop 'scale' and keep signed/unsigned argument, is that right?
 
I don't think we need signed/unsigned.  RTL expansion has the signedness
of the offset argument there and can just extend to the appropriate mode
to offset a pointer.
 
> And I wonder I should create the stride_type using size_type_node or 
> ptrdiff_type_node ?
> Which is preferrable ?
 
'sizetype' - that's the type we require to be used for 
the POINTER_PLUS_EXPR offset operand.
 
 
> Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 22:27
> To: ???
> CC: gcc-patches; Jeff Law; richard.sandiford; rdapp.gcc
> Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Thu, 2 Nov 2023, ??? wrote:
>  
> > Hi, Richi.
> > 
> > >> Do we really need to have two modes for the optab though or could we
> > >> simply require the target to support arbitrary offset modes (give it
> > >> is implicitly constrained to ptr_mode for the base already)?  Or
> > >> properly extend/truncate the offset at expansion time, say to ptr_mode
> > >> or to the mode of sizetype.
> > 
> > For RVV, it's ok by default set stride type as ptr_mode/size_type by 
> > default.
> > Is it ok that I define strided loa

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-02 Thread Richard Biener
On Thu, 2 Nov 2023, ??? wrote:

> Hi, Richi.
> 
> >> Do we really need to have two modes for the optab though or could we
> >> simply require the target to support arbitrary offset modes (give it
> >> is implicitly constrained to ptr_mode for the base already)?  Or
> >> properly extend/truncate the offset at expansion time, say to ptr_mode
> >> or to the mode of sizetype.
> 
> For RVV, it's ok by default set stride type as ptr_mode/size_type by default.
> Is it ok that I define strided load/store as single mode optab and default 
> Pmode as stride operand?
> How about scale and signed/unsigned operand ?
> It seems scale operand can be removed ? Since we can pass DR_STEP directly to 
> the stride arguments.
> But I think we can't remove signed/unsigned operand since for strided mode = 
> SI mode, the unsigned
> maximum stride = 2^31 wheras signed is 2 ^ 30.

On the GIMPLE side I think we want to have a sizetype operand and
indeed drop 'scale', the sizetype operand should be readily available.

> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 21:52
> To: Juzhe-Zhong
> CC: gcc-patches; jeffreyalaw; richard.sandiford; rdapp.gcc
> Subject: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Tue, 31 Oct 2023, Juzhe-Zhong wrote:
>  
> > As previous Richard's suggested, we should support strided load/store in
> > loop vectorizer instead hacking RISC-V backend.
> > 
> > This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.
> > 
> > The GIMPLE IR is same as mask_len_gather_load/mask_len_scatter_store but 
> > with
> > changing vector offset into scalar stride.
>  
> I see that it follows gather/scatter.  I'll note that when introducing
> those we failed to add a specifier for TBAA and alignment info for the
> data access.  That means we have to use alias-set zero for the accesses
> (I see existing targets use UNSPECs with some not elaborate MEM anyway,
> but TBAA info might have been the "easy" and obvious property to 
> preserve).  For alignment we either have to assume unaligned or reject
> vectorization of accesses that do not have their original scalar accesses
> naturally aligned (aligned according to their mode).  We don't seem
> to check that though.
>  
> It might be fine to go forward with this since gather/scatter are broken
> in a similar way.
>  
> Do we really need to have two modes for the optab though or could we
> simply require the target to support arbitrary offset modes (give it
> is implicitly constrained to ptr_mode for the base already)?  Or
> properly extend/truncate the offset at expansion time, say to ptr_mode
> or to the mode of sizetype.
>  
> Thanks,
> Richard.
> > We don't have strided_load/strided_store and 
> > mask_strided_load/mask_strided_store since
> > it't unlikely RVV will have such optabs and we can't add the patterns that 
> > we can't test them.
> >
> > 
> > gcc/ChangeLog:
> > 
> > * doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
> > * internal-fn.cc (internal_load_fn_p): Ditto.
> > (internal_strided_fn_p): Ditto.
> > (internal_fn_len_index): Ditto.
> > (internal_fn_mask_index): Ditto.
> > (internal_fn_stored_value_index): Ditto.
> > (internal_strided_fn_supported_p): Ditto.
> > * internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
> > (MASK_LEN_STRIDED_STORE): Ditto.
> > * internal-fn.h (internal_strided_fn_p): Ditto.
> > (internal_strided_fn_supported_p): Ditto.
> > * optabs.def (OPTAB_CD): Ditto.
> > 
> > ---
> >  gcc/doc/md.texi | 51 +
> >  gcc/internal-fn.cc  | 44 ++
> >  gcc/internal-fn.def |  4 
> >  gcc/internal-fn.h   |  2 ++
> >  gcc/optabs.def  |  2 ++
> >  5 files changed, 103 insertions(+)
> > 
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index fab2513105a..5bac713a0dd 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} of 
> > the result should
> >  be loaded from memory and clear if element @var{i} of the result should be 
> > undefined.
> >  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> >  
> > +@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
> > +@item @samp{mask_len_strided_load@var{m}@var{n}}
> > +Load several separate memory locations into a destination vector of mode 
> > @var{m}.
> > +Operand 0 is a destination vector of mode @var{m}.
> > +Operand 1 is a scalar base address and operand 2 is a scalar stride of 
> > mode @var{n}.
> > +The instruction can be seen as a special case of 
> > @code{mask_len_gather_load@var{m}@var{n}}
> > +with an offset vector that is a @code{vec_series} with operand 1 as base 
> > and operand 2 as step.
> > +For each element index i:
> > +
> > +@itemize @bullet
> > +@item
> > +extend the stride to address width, using zero
> > +extension if operand 3 is 1 and sign extension if operand 3 is zero;
> > +@item

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-02 Thread Richard Biener
On Thu, 2 Nov 2023, ??? wrote:

> Ok. So drop 'scale' and keep signed/unsigned argument, is that right?

I don't think we need signed/unsigned.  RTL expansion has the signedness
of the offset argument there and can just extend to the appropriate mode
to offset a pointer.

> And I wonder I should create the stride_type using size_type_node or 
> ptrdiff_type_node ?
> Which is preferrable ?

'sizetype' - that's the type we require to be used for 
the POINTER_PLUS_EXPR offset operand.


> Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 22:27
> To: ???
> CC: gcc-patches; Jeff Law; richard.sandiford; rdapp.gcc
> Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Thu, 2 Nov 2023, ??? wrote:
>  
> > Hi, Richi.
> > 
> > >> Do we really need to have two modes for the optab though or could we
> > >> simply require the target to support arbitrary offset modes (give it
> > >> is implicitly constrained to ptr_mode for the base already)?  Or
> > >> properly extend/truncate the offset at expansion time, say to ptr_mode
> > >> or to the mode of sizetype.
> > 
> > For RVV, it's ok by default set stride type as ptr_mode/size_type by 
> > default.
> > Is it ok that I define strided load/store as single mode optab and default 
> > Pmode as stride operand?
> > How about scale and signed/unsigned operand ?
> > It seems scale operand can be removed ? Since we can pass DR_STEP directly 
> > to the stride arguments.
> > But I think we can't remove signed/unsigned operand since for strided mode 
> > = SI mode, the unsigned
> > maximum stride = 2^31 wheras signed is 2 ^ 30.
>  
> On the GIMPLE side I think we want to have a sizetype operand and
> indeed drop 'scale', the sizetype operand should be readily available.
>  
> > 
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-11-02 21:52
> > To: Juzhe-Zhong
> > CC: gcc-patches; jeffreyalaw; richard.sandiford; rdapp.gcc
> > Subject: Re: [PATCH V2] OPTABS/IFN: Add 
> > mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> > On Tue, 31 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > As previous Richard's suggested, we should support strided load/store in
> > > loop vectorizer instead hacking RISC-V backend.
> > > 
> > > This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.
> > > 
> > > The GIMPLE IR is same as mask_len_gather_load/mask_len_scatter_store but 
> > > with
> > > changing vector offset into scalar stride.
> >  
> > I see that it follows gather/scatter.  I'll note that when introducing
> > those we failed to add a specifier for TBAA and alignment info for the
> > data access.  That means we have to use alias-set zero for the accesses
> > (I see existing targets use UNSPECs with some not elaborate MEM anyway,
> > but TBAA info might have been the "easy" and obvious property to 
> > preserve).  For alignment we either have to assume unaligned or reject
> > vectorization of accesses that do not have their original scalar accesses
> > naturally aligned (aligned according to their mode).  We don't seem
> > to check that though.
> >  
> > It might be fine to go forward with this since gather/scatter are broken
> > in a similar way.
> >  
> > Do we really need to have two modes for the optab though or could we
> > simply require the target to support arbitrary offset modes (give it
> > is implicitly constrained to ptr_mode for the base already)?  Or
> > properly extend/truncate the offset at expansion time, say to ptr_mode
> > or to the mode of sizetype.
> >  
> > Thanks,
> > Richard.
> > > We don't have strided_load/strided_store and 
> > > mask_strided_load/mask_strided_store since
> > > it't unlikely RVV will have such optabs and we can't add the patterns 
> > > that we can't test them.
> > >
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
> > > * internal-fn.cc (internal_load_fn_p): Ditto.
> > > (internal_strided_fn_p): Ditto.
> > > (internal_fn_len_index): Ditto.
> > > (internal_fn_mask_index): Ditto.
> > > (internal_fn_stored_value_index): Ditto.
> > > (internal_strided_fn_supported_p): Ditto.
> > > * internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
> > > (MASK_LEN_STRIDED_STORE): Ditto.
> > > * internal-fn.h (internal_strided_fn_p): Ditto.
> >

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-02 Thread 钟居哲
Hi, Richi.

>> Do we really need to have two modes for the optab though or could we
>> simply require the target to support arbitrary offset modes (give it
>> is implicitly constrained to ptr_mode for the base already)?  Or
>> properly extend/truncate the offset at expansion time, say to ptr_mode
>> or to the mode of sizetype.

For RVV, it's ok by default set stride type as ptr_mode/size_type by default.
Is it ok that I define strided load/store as single mode optab and default 
Pmode as stride operand?
How about scale and signed/unsigned operand ?
It seems scale operand can be removed ? Since we can pass DR_STEP directly to 
the stride arguments.
But I think we can't remove signed/unsigned operand since for strided mode = SI 
mode, the unsigned
maximum stride = 2^31 wheras signed is 2 ^ 30.




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-02 21:52
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw; richard.sandiford; rdapp.gcc
Subject: Re: [PATCH V2] OPTABS/IFN: Add 
mask_len_strided_load/mask_len_strided_store OPTABS/IFN
On Tue, 31 Oct 2023, Juzhe-Zhong wrote:
 
> As previous Richard's suggested, we should support strided load/store in
> loop vectorizer instead hacking RISC-V backend.
> 
> This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.
> 
> The GIMPLE IR is same as mask_len_gather_load/mask_len_scatter_store but with
> changing vector offset into scalar stride.
 
I see that it follows gather/scatter.  I'll note that when introducing
those we failed to add a specifier for TBAA and alignment info for the
data access.  That means we have to use alias-set zero for the accesses
(I see existing targets use UNSPECs with some not elaborate MEM anyway,
but TBAA info might have been the "easy" and obvious property to 
preserve).  For alignment we either have to assume unaligned or reject
vectorization of accesses that do not have their original scalar accesses
naturally aligned (aligned according to their mode).  We don't seem
to check that though.
 
It might be fine to go forward with this since gather/scatter are broken
in a similar way.
 
Do we really need to have two modes for the optab though or could we
simply require the target to support arbitrary offset modes (give it
is implicitly constrained to ptr_mode for the base already)?  Or
properly extend/truncate the offset at expansion time, say to ptr_mode
or to the mode of sizetype.
 
Thanks,
Richard.
> We don't have strided_load/strided_store and 
> mask_strided_load/mask_strided_store since
> it't unlikely RVV will have such optabs and we can't add the patterns that we 
> can't test them.
>
> 
> gcc/ChangeLog:
> 
> * doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
> * internal-fn.cc (internal_load_fn_p): Ditto.
> (internal_strided_fn_p): Ditto.
> (internal_fn_len_index): Ditto.
> (internal_fn_mask_index): Ditto.
> (internal_fn_stored_value_index): Ditto.
> (internal_strided_fn_supported_p): Ditto.
> * internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
> (MASK_LEN_STRIDED_STORE): Ditto.
> * internal-fn.h (internal_strided_fn_p): Ditto.
> (internal_strided_fn_supported_p): Ditto.
> * optabs.def (OPTAB_CD): Ditto.
> 
> ---
>  gcc/doc/md.texi | 51 +
>  gcc/internal-fn.cc  | 44 ++
>  gcc/internal-fn.def |  4 
>  gcc/internal-fn.h   |  2 ++
>  gcc/optabs.def  |  2 ++
>  5 files changed, 103 insertions(+)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index fab2513105a..5bac713a0dd 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>  
> +@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}@var{n}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of mode 
> @var{n}.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.
> +For each element index i:
> +
> +@itemize @bullet
> +@item
> +extend the stride to address width, using zero
> +extension if operand 3 is 1 and sign extension if operand 3 is zero;
> +@item
> +multiply the extended stride by operand 4;
> +@item
> +add the result to the base; and
> +@item
> +load the value at that address (operand 1 + @var{i} * multiplied and 
> extended stride) into element @var{i} of operand 0.
> +@end itemize
> +
> +Similar to mask_len_load, the instruction loads at most (operand 6 + operand 
> 7) elements from memory.
> +Bit @var{i} of the mask is set if element @var{i} 

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-02 Thread 钟居哲
Ok. So drop 'scale' and keep signed/unsigned argument, is that right?
And I wonder I should create the stride_type using size_type_node or 
ptrdiff_type_node ?
Which is preferrable ?

Thanks.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-02 22:27
To: 钟居哲
CC: gcc-patches; Jeff Law; richard.sandiford; rdapp.gcc
Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
mask_len_strided_load/mask_len_strided_store OPTABS/IFN
On Thu, 2 Nov 2023, ??? wrote:
 
> Hi, Richi.
> 
> >> Do we really need to have two modes for the optab though or could we
> >> simply require the target to support arbitrary offset modes (give it
> >> is implicitly constrained to ptr_mode for the base already)?  Or
> >> properly extend/truncate the offset at expansion time, say to ptr_mode
> >> or to the mode of sizetype.
> 
> For RVV, it's ok by default set stride type as ptr_mode/size_type by default.
> Is it ok that I define strided load/store as single mode optab and default 
> Pmode as stride operand?
> How about scale and signed/unsigned operand ?
> It seems scale operand can be removed ? Since we can pass DR_STEP directly to 
> the stride arguments.
> But I think we can't remove signed/unsigned operand since for strided mode = 
> SI mode, the unsigned
> maximum stride = 2^31 wheras signed is 2 ^ 30.
 
On the GIMPLE side I think we want to have a sizetype operand and
indeed drop 'scale', the sizetype operand should be readily available.
 
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 21:52
> To: Juzhe-Zhong
> CC: gcc-patches; jeffreyalaw; richard.sandiford; rdapp.gcc
> Subject: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Tue, 31 Oct 2023, Juzhe-Zhong wrote:
>  
> > As previous Richard's suggested, we should support strided load/store in
> > loop vectorizer instead hacking RISC-V backend.
> > 
> > This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.
> > 
> > The GIMPLE IR is same as mask_len_gather_load/mask_len_scatter_store but 
> > with
> > changing vector offset into scalar stride.
>  
> I see that it follows gather/scatter.  I'll note that when introducing
> those we failed to add a specifier for TBAA and alignment info for the
> data access.  That means we have to use alias-set zero for the accesses
> (I see existing targets use UNSPECs with some not elaborate MEM anyway,
> but TBAA info might have been the "easy" and obvious property to 
> preserve).  For alignment we either have to assume unaligned or reject
> vectorization of accesses that do not have their original scalar accesses
> naturally aligned (aligned according to their mode).  We don't seem
> to check that though.
>  
> It might be fine to go forward with this since gather/scatter are broken
> in a similar way.
>  
> Do we really need to have two modes for the optab though or could we
> simply require the target to support arbitrary offset modes (give it
> is implicitly constrained to ptr_mode for the base already)?  Or
> properly extend/truncate the offset at expansion time, say to ptr_mode
> or to the mode of sizetype.
>  
> Thanks,
> Richard.
> > We don't have strided_load/strided_store and 
> > mask_strided_load/mask_strided_store since
> > it't unlikely RVV will have such optabs and we can't add the patterns that 
> > we can't test them.
> >
> > 
> > gcc/ChangeLog:
> > 
> > * doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
> > * internal-fn.cc (internal_load_fn_p): Ditto.
> > (internal_strided_fn_p): Ditto.
> > (internal_fn_len_index): Ditto.
> > (internal_fn_mask_index): Ditto.
> > (internal_fn_stored_value_index): Ditto.
> > (internal_strided_fn_supported_p): Ditto.
> > * internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
> > (MASK_LEN_STRIDED_STORE): Ditto.
> > * internal-fn.h (internal_strided_fn_p): Ditto.
> > (internal_strided_fn_supported_p): Ditto.
> > * optabs.def (OPTAB_CD): Ditto.
> > 
> > ---
> >  gcc/doc/md.texi | 51 +
> >  gcc/internal-fn.cc  | 44 ++
> >  gcc/internal-fn.def |  4 
> >  gcc/internal-fn.h   |  2 ++
> >  gcc/optabs.def  |  2 ++
> >  5 files changed, 103 insertions(+)
> > 
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index fab2513105a..5bac713a0dd 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} of 
> > the result should
> >  be loaded from memory and clear if element @var{i} of the result s