On Wed, 1 Mar 2023, Richard Biener wrote:

> On Wed, 1 Mar 2023, juzhe.zh...@rivai.ai wrote:
>
> > Let me first introduce RVV load/store basics and stack allocation.
> > For scalable vector memory allocation, we allocate memory according to
> > the machine vector length.
> > To get this CPU vector-length value (runtime invariant but unknown at
> > compile time), we have an instruction called csrr vlenb.
> > For example, csrr a5,vlenb stores the size of a single vector register
> > (in bytes) in the a5 register.
> > The size of a single register in bytes (GET_MODE_SIZE) is the poly value
> > (8,8) bytes.
> > That means that after csrr a5,vlenb, a5 holds the size poly (8,8) bytes.
> >
> > Now, our problem is that VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same
> > bytesize, poly (1,1). So their storage consumes the same size.
> > Meaning when we want to allocate memory storage or stack for register
> > spills, we first do csrr a5,vlenb, then srli a5,a5,3 (meaning a5 =
> > a5/8).
> > Then a5 has the bytesize value of poly (1,1). VNx1BI, VNx2BI, VNx4BI and
> > VNx8BI all go through the same process as described above. They all
> > consume the same memory storage size since we can't model them accurately
> > according to precision or bitsize.
> >
> > They consume the same storage (I agree it's better to model them more
> > accurately with respect to memory storage consumption).
> >
> > Well, even though they consume the same amount of memory storage, I can
> > make their memory access behavior (load/store) accurate by
> > emitting the accurate RVV instruction for them according to the RVV ISA.
> >
> > VNx1BI, VNx2BI, VNx4BI and VNx8BI consume the same memory storage with
> > size poly (1,1).
> > The instructions for these modes are as follows:
> > VNx1BI: vsetvl e8mf8 + vlm, loading 1/8 of poly (1,1) storage.
> > VNx2BI: vsetvl e8mf8 + vlm, loading 1/4 of poly (1,1) storage.
> > VNx4BI: vsetvl e8mf8 + vlm, loading 1/2 of poly (1,1) storage.
> > VNx8BI: vsetvl e8mf8 + vlm, loading 1 of poly (1,1) storage.
> >
> > So based on this, it's fine that we don't model VNx1BI, VNx2BI, VNx4BI and
> > VNx8BI accurately according to precision or bitsize.
> > This implementation is fine even though their memory storage is not
> > accurate.
> >
> > However, the problem is that since they have the same bytesize, GCC will
> > think they are the same and do some incorrect statement elimination:
> >
> > (Note: the loads use the same memory base)
> > load v0 VNx1BI from base0
> > load v1 VNx2BI from base0
> > load v2 VNx4BI from base0
> > load v3 VNx8BI from base0
> >
> > store v0 base1
> > store v1 base2
> > store v2 base3
> > store v3 base4
> >
> > For this program sequence, GCC will eliminate the last 3 load
> > instructions.
> >
> > Then it will become:
> >
> > load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only loads 1/8 of the
> > poly size (1,1) memory data)
> >
> > store v0 base1
> > store v0 base2
> > store v0 base3
> > store v0 base4
> >
> > This is what we want to fix. I think as long as we have a way to
> > differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI,
> > GCC will not do the incorrect elimination for RVV.
> >
> > I think it can work fine even though these 4 modes have an inaccurate
> > memory storage size
> > but accurate memory access (load/store) behavior.
>
> So given the above I think that modeling the size as being the same
> but with accurate precision would work.  It's then only the size of the
> padding in bytes we cannot represent with poly-int which should be fine.
>
> Correct?

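To make the quoted suggestion concrete, here is a toy sketch in Python. It is not GCC code and does not reflect how any GCC pass actually decides redundancy. The `Poly`, `Mode` and `surviving_loads` names are hypothetical, and the precision values (poly (1,1), (2,2), (4,4), (8,8) bits) are my assumption of what "accurate precision" would mean for these four modes. It only shows that keying on size alone collapses the four loads into one, while keying on size plus precision keeps them apart.

```python
from typing import NamedTuple


class Poly(NamedTuple):
    """Toy stand-in for a two-coefficient poly-int: value = c0 + c1 * x."""
    c0: int
    c1: int


class Mode(NamedTuple):
    name: str
    size_bytes: Poly      # same for all four mask modes, per the thread
    precision_bits: Poly  # assumed values, for illustration only


ONE_BYTE = Poly(1, 1)
MODES = [
    Mode("VNx1BI", ONE_BYTE, Poly(1, 1)),
    Mode("VNx2BI", ONE_BYTE, Poly(2, 2)),
    Mode("VNx4BI", ONE_BYTE, Poly(4, 4)),
    Mode("VNx8BI", ONE_BYTE, Poly(8, 8)),
]


def surviving_loads(modes, key):
    """Keep the first load from base0 for each distinct key.

    Later loads whose key matches an earlier one are treated as
    redundant, mimicking (very loosely) a load elimination.
    """
    seen = set()
    kept = []
    for mode in modes:
        k = key(mode)
        if k not in seen:
            seen.add(k)
            kept.append(mode.name)
    return kept


# Distinguish modes by size only: the four loads look identical.
by_size = surviving_loads(MODES, key=lambda m: m.size_bytes)

# Distinguish modes by size and precision: all four loads survive.
by_size_and_precision = surviving_loads(
    MODES, key=lambda m: (m.size_bytes, m.precision_bits)
)

print(by_size)
print(by_size_and_precision)
```

With size as the only key, just the VNx1BI load survives, which is the wrong elimination described in the quoted text. With precision added to the key, all four loads are kept.
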
Btw, is storing a VNx1BI and then loading a VNx2BI from the same
memory address well-defined?  That is, how is the padding handled
by the machine load/store instructions?

Richard.