Re: [PATCH] RISC-V: Fix RVV mask mode size

2022-12-16 Thread Jeff Law via Gcc-patches




On 12/13/22 23:48, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

This patch is to fix RVV mask modes size. Since mask mode size are adjust
as a whole RVV register size LMUL = 1 which not only make each mask type for
example vbool32_t tied to vint8m1_t but also increase memory consuming.

I notice this issue during development of VSETVL PASS. Since it is not part of
VSETVL support, I seperate it into a single fix patch now.

gcc/ChangeLog:

 * config/riscv/riscv-modes.def (ADJUST_BYTESIZE): Reduce RVV mask mode 
size.
 * config/riscv/riscv.cc (riscv_v_adjust_bytesize): New function.
 (riscv_modes_tieable_p): Don't tie mask modes which will create issue.
 * config/riscv/riscv.h (riscv_v_adjust_bytesize): New function.
So I haven't really studied the masking model for RVV (yet).  But 
there's two models that I'm generally aware of.


One model has a bit per element in the vector we're operating on.  So a 
V4DF will have 4 bits in the mask.  I generally call this the dense or 
packed model.


The other model has a bit for every element for the maximal number of 
elements that can ever appear in a vector.  So if we support an element 
length of 8bits and a 1kbit vector, then the sparse model would have 128 
bits regardless of the size of the object being operated on.  So we'd 
still have 128 bits for V4DF, but the vast majority would be don't cares.



ISTM that you're trying to set the mode size to the smallest possible 
which would seem to argue that you want the dense/packed mask model. 
Does that actually match what the hardware does?  If not, then don't we 
need to convert back and forth?



Or maybe I'm missing something here?!?

Jeff



Re: [PATCH] RISC-V: Fix RVV mask mode size

2022-12-16 Thread Jeff Law via Gcc-patches




On 12/16/22 18:44, 钟居哲 wrote:

Yes, VNx4DF only has 4 bit in mask mode in case of load and store.
For example vlm or vsm we will load store 8-bit ??? (I am not sure 
hardward can load store 4bit,but I am sure it definetly not load store 
the whole register size)
Most likely than not you end up loading a larger quantity with the high 
bits zero'd.  Interesting that we're using a packed model.  I'd been 
told it was fairly expensive to implement in hardware relative to teh 
cost of implementing the sparse model.


So ideally it should be model more accurate. However, since GCC assumes 
that 1 BOOL is 1-byte, the only thing I do is to model mask mode as 
smallest as possible.
Maybe in the future, I can support 1BOOL for 1-bit?? I am not sure since 
it will need to change GCC framework.
I'm a bit confused by this.  GCC can support single bit bools, though 
ports often extend them to 8 bits or more for computational efficiency 
purposes.  At least that's the case in general.  Is there something 
particularly special about masks & bools that's causing problems?


Jeff


Re: [PATCH] RISC-V: Fix RVV mask mode size

2022-12-19 Thread Richard Biener via Gcc-patches
On Sat, Dec 17, 2022 at 2:54 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 12/16/22 18:44, 钟居哲 wrote:
> > Yes, VNx4DF only has 4 bit in mask mode in case of load and store.
> > For example vlm or vsm we will load store 8-bit ??? (I am not sure
> > hardward can load store 4bit,but I am sure it definetly not load store
> > the whole register size)
> Most likely than not you end up loading a larger quantity with the high
> bits zero'd.  Interesting that we're using a packed model.  I'd been
> told it was fairly expensive to implement in hardware relative to teh
> cost of implementing the sparse model.

Since the masks are extra inputs if you use a packed model you need
to wire less bits into the execution units for the masks which I guess
is actually cheaper.  Yes, producing the masks might be more complicated.

> > So ideally it should be model more accurate. However, since GCC assumes
> > that 1 BOOL is 1-byte, the only thing I do is to model mask mode as
> > smallest as possible.
> > Maybe in the future, I can support 1BOOL for 1-bit?? I am not sure since
> > it will need to change GCC framework.
> I'm a bit confused by this.  GCC can support single bit bools, though
> ports often extend them to 8 bits or more for computational efficiency
> purposes.  At least that's the case in general.  Is there something
> particularly special about masks & bools that's causing problems?

The only "issue" might be with 4, 2 and 1 bit masks which would
have a size of 8 bits but a precision of less that endianess might
play a role.

Btw, this is all similar to AVX512 where we even don't use
vector BI modes but integer modes for the mask which
then becomes QImode for 1, 2, 4 and 8 bit masks and
HImode for 16, SImode for 32 and DImode for 64 bit masks.

Richard.

> Jeff


Re: [PATCH] RISC-V: Fix RVV mask mode size

2022-12-27 Thread Jeff Law via Gcc-patches




On 12/19/22 00:44, Richard Biener wrote:

On Sat, Dec 17, 2022 at 2:54 AM Jeff Law via Gcc-patches
 wrote:




On 12/16/22 18:44, 钟居哲 wrote:

Yes, VNx4DF only has 4 bit in mask mode in case of load and store.
For example vlm or vsm we will load store 8-bit ??? (I am not sure
hardward can load store 4bit,but I am sure it definetly not load store
the whole register size)

Most likely than not you end up loading a larger quantity with the high
bits zero'd.  Interesting that we're using a packed model.  I'd been
told it was fairly expensive to implement in hardware relative to teh
cost of implementing the sparse model.


Since the masks are extra inputs if you use a packed model you need
to wire less bits into the execution units for the masks which I guess
is actually cheaper.  Yes, producing the masks might be more complicated.
We went through this at a prior employer and the hardware guys argued 
strongly that a packed model for mask registers was just too expensive 
to implement.  I don't think it was the # of wires, but the muxes.  The 
number of wires into the unit was an issue when we started talking about 
sub-byte masking :-)


Conceptually on the hardware side each bit in the mask corresponds to a 
byte in a vector register.  When the element size is 8 bits, then 
obviously there is a 1:1 correspondence between potentially masked 
elements and bits the mask register.


When the element size is 32 bits, then there are 3 don't care bits in 
the mask register, then a single bit that is queried for masked 
operations.  So if you had a 128bit vector with 32 bits per element, a 
mask register might have a value like:


0xxx 1xxx 1xxx 0xxx

A 128 bit vector with 64 bits per element might be:

0xxx  1xxx 

Where the xxxs are don't cares and the 0/1 are the masks.





The only "issue" might be with 4, 2 and 1 bit masks which would
have a size of 8 bits but a precision of less that endianess might
play a role.

Btw, this is all similar to AVX512 where we even don't use
vector BI modes but integer modes for the mask which
then becomes QImode for 1, 2, 4 and 8 bit masks and
HImode for 16, SImode for 32 and DImode for 64 bit masks.

Right.  I think in hindsight that might have been a mistake.

jeff


Re: [PATCH] RISC-V: Fix RVV mask mode size

2023-01-08 Thread Richard Biener via Gcc-patches
On Tue, Dec 27, 2022 at 9:46 PM Jeff Law  wrote:
>
>
>
> On 12/19/22 00:44, Richard Biener wrote:
> > On Sat, Dec 17, 2022 at 2:54 AM Jeff Law via Gcc-patches
> >  wrote:
> >>
> >>
> >>
> >> On 12/16/22 18:44, 钟居哲 wrote:
> >>> Yes, VNx4DF only has 4 bit in mask mode in case of load and store.
> >>> For example vlm or vsm we will load store 8-bit ??? (I am not sure
> >>> hardward can load store 4bit,but I am sure it definetly not load store
> >>> the whole register size)
> >> Most likely than not you end up loading a larger quantity with the high
> >> bits zero'd.  Interesting that we're using a packed model.  I'd been
> >> told it was fairly expensive to implement in hardware relative to teh
> >> cost of implementing the sparse model.
> >
> > Since the masks are extra inputs if you use a packed model you need
> > to wire less bits into the execution units for the masks which I guess
> > is actually cheaper.  Yes, producing the masks might be more complicated.
> We went through this at a prior employer and the hardware guys argued
> strongly that a packed model for mask registers was just too expensive
> to implement.  I don't think it was the # of wires, but the muxes.  The
> number of wires into the unit was an issue when we started talking about
> sub-byte masking :-)
>
> Conceptually on the hardware side each bit in the mask corresponds to a
> byte in a vector register.  When the element size is 8 bits, then
> obviously there is a 1:1 correspondence between potentially masked
> elements and bits the mask register.
>
> When the element size is 32 bits, then there are 3 don't care bits in
> the mask register, then a single bit that is queried for masked
> operations.  So if you had a 128bit vector with 32 bits per element, a
> mask register might have a value like:
>
> 0xxx 1xxx 1xxx 0xxx
>
> A 128 bit vector with 64 bits per element might be:
>
> 0xxx  1xxx 
>
> Where the xxxs are don't cares and the 0/1 are the masks.
>
>
>
> >
> > The only "issue" might be with 4, 2 and 1 bit masks which would
> > have a size of 8 bits but a precision of less that endianess might
> > play a role.
> >
> > Btw, this is all similar to AVX512 where we even don't use
> > vector BI modes but integer modes for the mask which
> > then becomes QImode for 1, 2, 4 and 8 bit masks and
> > HImode for 16, SImode for 32 and DImode for 64 bit masks.
> Right.  I think in hindsight that might have been a mistake.

Yes, vector BI modes would have been better here.
On GCN the mask is always DImode, that would have been the
other (better) alternative here I think.

Richard.

>
> jeff


Re: Re: [PATCH] RISC-V: Fix RVV mask mode size

2022-12-16 Thread 钟居哲
Yes, VNx4DF only has 4 bit in mask mode in case of load and store.
For example vlm or vsm we will load store 8-bit ??? (I am not sure hardward can 
load store 4bit,but I am sure it definetly not load store the whole register 
size)
So ideally it should be model more accurate. However, since GCC assumes that 1 
BOOL is 1-byte, the only thing I do is to model mask mode as smallest as 
possible.
Maybe in the future, I can support 1BOOL for 1-bit?? I am not sure since it 
will need to change GCC framework.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2022-12-17 04:22
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Fix RVV mask mode size
 
 
On 12/13/22 23:48, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> This patch is to fix RVV mask modes size. Since mask mode size are adjust
> as a whole RVV register size LMUL = 1 which not only make each mask type for
> example vbool32_t tied to vint8m1_t but also increase memory consuming.
> 
> I notice this issue during development of VSETVL PASS. Since it is not part of
> VSETVL support, I seperate it into a single fix patch now.
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-modes.def (ADJUST_BYTESIZE): Reduce RVV mask 
> mode size.
>  * config/riscv/riscv.cc (riscv_v_adjust_bytesize): New function.
>  (riscv_modes_tieable_p): Don't tie mask modes which will create 
> issue.
>  * config/riscv/riscv.h (riscv_v_adjust_bytesize): New function.
So I haven't really studied the masking model for RVV (yet).  But 
there's two models that I'm generally aware of.
 
One model has a bit per element in the vector we're operating on.  So a 
V4DF will have 4 bits in the mask.  I generally call this the dense or 
packed model.
 
The other model has a bit for every element for the maximal number of 
elements that can ever appear in a vector.  So if we support an element 
length of 8bits and a 1kbit vector, then the sparse model would have 128 
bits regardless of the size of the object being operated on.  So we'd 
still have 128 bits for V4DF, but the vast majority would be don't cares.
 
 
ISTM that you're trying to set the mode size to the smallest possible 
which would seem to argue that you want the dense/packed mask model. 
Does that actually match what the hardware does?  If not, then don't we 
need to convert back and forth?
 
 
Or maybe I'm missing something here?!?
 
Jeff
 
 


Re: Re: [PATCH] RISC-V: Fix RVV mask mode size

2022-12-16 Thread 钟居哲
>> Most likely than not you end up loading a larger quantity with the high
>> bits zero'd.  Interesting that we're using a packed model.  I'd been
>> told it was fairly expensive to implement in hardware relative to teh
>> cost of implementing the sparse model.

>> I'm a bit confused by this.  GCC can support single bit bools, though
>> ports often extend them to 8 bits or more for computational efficiency
>> purposes.  At least that's the case in general.  Is there something
>> particularly special about masks & bools that's causing problems?
I am not sure I am on the same page with you. I don't understand what is the
sparse model you said. The only thing I do in this patch is that we change the 
BYTESIZE VNx1BI for example
as the BYTESIZE of VNx1BI (Original I adjust all mask modes same size as 
VNx8QImode like LLVM). 
And I print the GET_MODE_SIZE (VNx1BI) the value is the same as VNx1QImode so I 
assume because GCC model 1-bool same as 1-QI???
Actually I not sure but I am sure after this patch, VNx1BI is adjusted smaller 
size.

Adjusting mask modes as smaller size always beneficial, since we can use vlm && 
vsm in register spilling, it can reduce the memory consuming and
load store hardware bandwidth.

Unlike LLVM, LLVM make each fractional vector and mask vector same size as LMUL 
=1 so they use vl1r/vs1r to do the register spilling which is not
optimal.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2022-12-17 09:53
To: 钟居哲; gcc-patches
CC: kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Fix RVV mask mode size
 
 
On 12/16/22 18:44, 钟居哲 wrote:
> Yes, VNx4DF only has 4 bit in mask mode in case of load and store.
> For example vlm or vsm we will load store 8-bit ??? (I am not sure 
> hardward can load store 4bit,but I am sure it definetly not load store 
> the whole register size)
Most likely than not you end up loading a larger quantity with the high 
bits zero'd.  Interesting that we're using a packed model.  I'd been 
told it was fairly expensive to implement in hardware relative to teh 
cost of implementing the sparse model.
 
> So ideally it should be model more accurate. However, since GCC assumes 
> that 1 BOOL is 1-byte, the only thing I do is to model mask mode as 
> smallest as possible.
> Maybe in the future, I can support 1BOOL for 1-bit?? I am not sure since 
> it will need to change GCC framework.
I'm a bit confused by this.  GCC can support single bit bools, though 
ports often extend them to 8 bits or more for computational efficiency 
purposes.  At least that's the case in general.  Is there something 
particularly special about masks & bools that's causing problems?
 
Jeff