On Wed, 1 Mar 2023, Richard Biener wrote:

> On Wed, 1 Mar 2023, juzhe.zh...@rivai.ai wrote:
> 
> > Let me first introduce the basics of RVV loads/stores and stack allocation.
> > For scalable vector memory allocation, we allocate memory according to the
> > machine's vector length.
> > To get this vector length (runtime invariant but unknown at compile time),
> > we use the instruction csrr vlenb.
> > For example, csrr a5,vlenb stores the size of a single vector register, in
> > bytes, in register a5.
> > The size in bytes of a single register (GET_MODE_SIZE) is the poly value
> > (8,8), so after csrr a5,vlenb, a5 holds poly (8,8) bytes.
> > 
> > Now, our problem is that VNx1BI, VNx2BI, VNx4BI and VNx8BI all have the
> > same byte size, poly (1,1), so their storage consumes the same size.
> > This means that when we allocate memory or stack slots for register
> > spilling, we first do csrr a5,vlenb and then srli a5,a5,3 (i.e. a5 =
> > a5/8).
> > Then a5 holds the byte size poly (1,1). VNx1BI, VNx2BI, VNx4BI and
> > VNx8BI all go through this same process. They all consume the same
> > memory storage size, since we can't model them accurately according to
> > their precision or bitsize.
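A minimal Python sketch of the size arithmetic above. It assumes GCC's usual reading of a poly_int (a, b) as a + b*x, with x the runtime indeterminate, and it assumes VLEN = 64 * (1 + x), so that vlenb = 8 + 8x. The helper names (`poly`, `vlenb`, `mask_storage_bytes`) are hypothetical and exist only for illustration.

```python
def poly(a, b, x):
    """Evaluate the poly_int (a, b) at runtime indeterminate x: a + b*x."""
    return a + b * x

def vlenb(vlen_bits):
    """What `csrr a5, vlenb` yields: VLEN expressed in bytes."""
    return vlen_bits // 8

def mask_storage_bytes(vlen_bits):
    """Storage shared by VNx1BI..VNx8BI: vlenb / 8.

    Dividing by 8 is a logical right shift by 3 (srli in RISC-V).
    """
    return vlenb(vlen_bits) >> 3

for vlen in (64, 128, 256, 512):
    x = vlen // 64 - 1                 # assumption: VLEN = 64 * (1 + x)
    assert vlenb(vlen) == poly(8, 8, x)
    assert mask_storage_bytes(vlen) == poly(1, 1, x)
    print(vlen, vlenb(vlen), mask_storage_bytes(vlen))
```

Every VLEN in the loop satisfies vlenb = poly (8,8) and storage = poly (1,1).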
> > 
> > They consume the same storage (I agree it would be better to model them
> > more accurately as far as storage consumption is concerned).
> > 
> > Well, even though they consume the same amount of memory storage, I can
> > make their memory access behavior (load/store) accurate by emitting the
> > correct RVV instructions for them according to the RVV ISA.
> > 
> > VNx1BI, VNx2BI, VNx4BI and VNx8BI consume the same memory storage, of
> > size poly (1,1).
> > The instructions for these modes are as follows:
> > VNx1BI: vsetvl e8mf8 + vlm, loading 1/8 of the poly (1,1) storage.
> > VNx2BI: vsetvl e8mf4 + vlm, loading 1/4 of the poly (1,1) storage.
> > VNx4BI: vsetvl e8mf2 + vlm, loading 1/2 of the poly (1,1) storage.
> > VNx8BI: vsetvl e8m1 + vlm, loading all of the poly (1,1) storage.
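The fractions above can be checked with a small model. This sketch rests on three assumptions:

* VNx<n>BI holds n * (VLEN/64) one-bit mask elements, so VNx8BI exactly fills the poly (1,1) bytes.
* vlm.v transfers ceil(vl/8) bytes when vl equals the mode's element count.
* The helper names are hypothetical.

```python
import math

# Hypothetical table: mode name -> n in VNx<n>BI.
MODES = {"VNx1BI": 1, "VNx2BI": 2, "VNx4BI": 4, "VNx8BI": 8}

def mask_bits(mode, vlen_bits):
    # Assumption: VNx<n>BI has n * (VLEN / 64) elements, one bit each.
    return MODES[mode] * (vlen_bits // 64)

def vlm_bytes(mode, vlen_bits):
    # vlm.v reads ceil(vl / 8) bytes for vl = the mode's element count.
    return math.ceil(mask_bits(mode, vlen_bits) / 8)

def storage_bytes(vlen_bits):
    # All four modes share vlenb / 8 bytes of storage.
    return (vlen_bits // 8) // 8

vlen = 512
for mode in MODES:
    print(mode, vlm_bytes(mode, vlen), "of", storage_bytes(vlen), "bytes")
```

At VLEN = 512 the loop reports 1, 2, 4 and 8 of 8 bytes. At small VLEN the ceil rounds up, so the exact 1/8, 1/4 and 1/2 ratios only hold when VLEN/64 is a multiple of 8. For example, at VLEN = 64 every mode reads one byte.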
> > 
> > So based on this, it's fine that we don't model VNx1BI, VNx2BI, VNx4BI
> > and VNx8BI accurately according to precision or bitsize.
> > This implementation is fine even though their memory storage size is
> > not accurate.
> > 
> > However, the problem is that since they have the same byte size, GCC
> > thinks they are the same and incorrectly eliminates some statements:
> > 
> > (Note: all four loads use the same memory base.)
> > load v0 VNx1BI from base0
> > load v1 VNx2BI from base0
> > load v2 VNx4BI from base0
> > load v3 VNx8BI from base0
> > 
> > store v0 base1
> > store v1 base2
> > store v2 base3
> > store v3 base4
> > 
> > In this sequence, GCC eliminates the last three loads.
> > 
> > Then it will become:
> > 
> > load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (loads only 1/8 of
> > the poly (1,1) bytes of memory data)
> > 
> > store v0 base1
> > store v0 base2
> > store v0 base3
> > store v0 base4
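Here is why equal byte sizes are enough to trigger this. The following is a deliberately simplified Python model of redundant-load elimination. Like the problem described above, it treats two loads as equivalent when they read the same base with the same byte size. It is not GCC's actual CSE/FRE logic, and the `Load` record and `eliminate` function are hypothetical.

```python
from collections import namedtuple

# Hypothetical IR: a load of `mode` from `base` into `dest`. All four mask
# modes report the same byte size, poly (1,1), represented here as 1.
Load = namedtuple("Load", "dest mode base")
BYTE_SIZE = {"VNx1BI": 1, "VNx2BI": 1, "VNx4BI": 1, "VNx8BI": 1}

def eliminate(loads, key):
    """Map each load's dest to the earliest dest with the same key."""
    seen, replacement = {}, {}
    for ld in loads:
        k = key(ld)
        if k in seen:
            replacement[ld.dest] = seen[k]   # treated as a redundant load
        else:
            seen[k] = ld.dest
            replacement[ld.dest] = ld.dest
    return replacement

loads = [Load("v0", "VNx1BI", "base0"), Load("v1", "VNx2BI", "base0"),
         Load("v2", "VNx4BI", "base0"), Load("v3", "VNx8BI", "base0")]

# Keyed on (base, byte size): every later load is replaced by v0.
by_size = eliminate(loads, key=lambda ld: (ld.base, BYTE_SIZE[ld.mode]))
# Keyed on (base, mode): the four loads stay distinct.
by_mode = eliminate(loads, key=lambda ld: (ld.base, ld.mode))
print(by_size)
print(by_mode)
```

The second key shows the point made in the reply: once the modes can be told apart, the loads are no longer merged.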
> > 
> > This is what we want to fix. I think that as long as we have a way to
> > differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI, GCC will not do this
> > incorrect elimination for RVV.
> > 
> > I think this can work fine: these four modes consume an inaccurate
> > amount of memory storage, but their load/store behavior is accurate.
> 
> So given the above, I think that modeling the size as being the same
> but with accurate precision would work.  It's then only the size of the
> padding in bytes that we cannot represent with poly-int, which should be
> fine.
> 
> Correct?
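A simplified model of the suggestion above: each mode gets the same byte size but a distinct precision, and equivalence depends on both. This is not GCC's real data structure. The `ModeInfo` record is hypothetical. The precisions, poly (n,n) bits for VNx<n>BI, are an assumption matching one bit per mask element.

```python
from collections import namedtuple

# Hypothetical mode descriptor; poly_int values are (a, b) tuples,
# meaning a + b*x for the runtime indeterminate x.
ModeInfo = namedtuple("ModeInfo", "size_bytes precision_bits")

MODES = {
    "VNx1BI": ModeInfo((1, 1), (1, 1)),
    "VNx2BI": ModeInfo((1, 1), (2, 2)),
    "VNx4BI": ModeInfo((1, 1), (4, 4)),
    "VNx8BI": ModeInfo((1, 1), (8, 8)),
}

def same_value(mode_a, mode_b):
    """Loads from one address are interchangeable only if both the size
    and the precision of their modes agree."""
    return MODES[mode_a] == MODES[mode_b]

# All four modes still share a single storage size...
assert len({m.size_bytes for m in MODES.values()}) == 1
# ...but distinct modes no longer compare equal.
print(same_value("VNx1BI", "VNx2BI"), same_value("VNx8BI", "VNx8BI"))
```

The printed line is `False True`.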

Btw, is storing a VNx1BI and then loading a VNx2BI from the same
memory address well-defined?  That is, how is the padding handled
by the machine load/store instructions?

Richard.
