钟居哲 <juzhe.zh...@rivai.ai> writes:
> Hi, Richard.
>
>>> Maybe dead lanes are better tracked at the gimple level though, not sure.
>>> (But AArch64 might need to lower lane operations more than it does now if
>>> we want gimple to handle it.)
>
> We initially tried to address this issue at the GIMPLE level.
> Tracking the subreg lanes of tuple types may be enough for AArch64, since
> AArch64 only has tuple types.
> However, for RVV, that is not enough to address all of the issues.
> Consider the following example:
> https://godbolt.org/z/fhTvEjvr8 
>
> You can see that, compared with LLVM, GCC emits many redundant "vmv1r.v"
> move instructions, because GCC is not able to track subreg liveness,
> whereas LLVM can.
>
> The reasons why tracking sub-lanes in GIMPLE cannot fix these redundant
> moves for RVV are:
>
> 1. RVV has tuple types like "vint8m1x2_t", which is exactly the same as the
>     AArch64 "svint8x2_t".
>     It is used by segment loads/stores, which are similar to the "ld2r"
>     instruction in ARM SVE (vec_load_lanes/vec_store_lanes).
>     Supporting sub-lane tracking in GIMPLE could fix this case for both RVV
>     and ARM SVE.
>
> 2. However, we don't only have "vint8m1x2_t"; we also have "vint8m2_t"
>     (LMUL = 2), which also occupies 2 registers but is not a tuple type:
>     it is a simple vector type. Such types are used by all ordinary
>     operations.
>     For example, "vadd" with vint8m1_t performs a PLUS operation on a
>     single vector register, whereas the same "vadd" instruction with
>     vint8m2_t performs a PLUS operation on 2 vector registers. We can't
>     define such types as tuple types, for the following reasons:
>     1) We also have tuple types for LMUL > 1; for example, "vint8m2x2_t"
>          is a tuple type. If we defined "vint8m2_t" as a tuple type, what
>          would "vint8m2x2_t" be? A tuple of tuples, or an array of arrays?
>          That would make the types very strange.
>     2) The RVV intrinsic documentation defines vint8m2x2_t as a tuple type,
>          but not vint8m2_t. We are not able to change the documentation.
>     3) Clang added support for the RVV intrinsics 3 years ago. In all that
>          time vint8m2_t has not been a tuple type, and it is widely used,
>          so changing the type definition would break the ecosystem. For
>          compatibility, we are not able to define LMUL > 1 types as tuple
>          types.
>
> For these reasons, we need to be able to access both the high part and the
> low part of vint8m2_t; we provide vget, which generates a subreg access of
> the vector mode.
>
> So, at the discussion stage, we decided to address subpart accesses of
> vector modes in a more generic way: supporting subreg liveness tracking at
> the RTL level. That way, it can address not only the issues that happen on
> ARM SVE but also the issues for LMUL > 1.
>
> 3. After deciding to support subreg liveness tracking in RTL, we studied
>     LLVM. LLVM has a standalone pass, called the register coalescer, that
>     runs right before its greedy register allocator.
>     So the first draft of our solution, modelled on LLVM's, supported
>     register coalescing before RA. It is open source:
>     riscv-gcc/gcc/ira-coalesce.cc at riscv-gcc-rvv-next ·
>     riscv-collab/riscv-gcc (github.com)
>     However, we didn't think that solution was elegant, so we consulted
>     Vlad. Vlad suggested that we enhance IRA/LRA with subreg liveness
>     tracking, which turned out to be a more reasonable and elegant
>     approach.
>
> So, after several experiments and investigations, Lehua dedicated himself
> to producing this patch series.
> We think Lehua's approach should be the generic and optimal solution to
> these generic subreg problems.

Ah, sorry, I caused a misunderstanding.  In the message quoted above,
I'd moved on from talking about tracking liveness of vectors in a tuple.
I was instead talking about tracking the liveness of individual lanes
in a single vector.

I was responding to Jeff's description of the bit-level liveness tracking
pass.  That pass solves a generic issue: redundant sign and zero extensions.
But it sounded like it could also be reused for tracking lanes of a vector
(by using different bit ranges from the ones that Jeff listed).
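To make that idea concrete, here is a toy model of backward live-bit tracking over straight-line code. It is not Jeff's pass (whose code isn't part of this thread); the instruction forms ("zext", "and", "use") and register names are hypothetical. A zero extension is flagged as redundant when no live bit of its result lies above the extended width, and the same live-bit mask can be read as vector lanes by giving each lane its own bit range.

```python
def low_bits(n):
    """Mask with the low n bits set."""
    return (1 << n) - 1


def analyze(insns):
    """Backward live-bit analysis over a straight-line sequence.

    Toy instruction forms (hypothetical, for illustration only):
      ("zext", dst, src, width)  dst = zero-extension of the low `width` bits of src
      ("and",  dst, src, imm)    dst = src & imm
      ("use",  src, mask)        the bits of src selected by `mask` are observed
    Returns (live bits of each register at entry, indices of redundant zexts).
    """
    live = {}        # register name -> mask of bits whose value matters
    redundant = []
    for idx in range(len(insns) - 1, -1, -1):
        op = insns[idx]
        if op[0] == "use":
            _, src, mask = op
            live[src] = live.get(src, 0) | mask
        elif op[0] == "and":
            _, dst, src, imm = op
            need = live.pop(dst, 0)           # the definition kills dst
            live[src] = live.get(src, 0) | (need & imm)
        elif op[0] == "zext":
            _, dst, src, width = op
            need = live.pop(dst, 0)
            if (need >> width) == 0:
                # No live bit depends on the zeroed upper bits, so the
                # extension could become a plain move (or be deleted
                # entirely if nothing at all is live).
                redundant.append(idx)
            live[src] = live.get(src, 0) | (need & low_bits(width))
    return live, redundant


def live_lanes(mask, lane_bits, nlanes):
    """Read a live-bit mask as vector lanes of `lane_bits` bits each."""
    return [i for i in range(nlanes)
            if (mask >> (i * lane_bits)) & low_bits(lane_bits)]


# Scalar case: only the low 8 bits of r2 are ever observed.
prog = [
    ("zext", "r2", "r1", 8),
    ("and", "r3", "r2", 0xFF),
    ("use", "r3", low_bits(64)),
]
live, redundant = analyze(prog)
print(redundant)                         # [0]

# Vector case: only lane 2 of a 4 x 32-bit vector is read.
vlive, _ = analyze([("use", "v1", low_bits(32) << 64)])
print(live_lanes(vlive["v1"], 32, 4))    # [2]
```

The only difference between the two uses is how the resulting mask is interpreted, which is why the same dataflow could plausibly serve both purposes.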

The thing that I was saying might be better done on gimple was tracking
lanes of an individual vector.  In other words, I was arguing against
my own question.

I should have changed the subject line when responding, sorry.

I wasn't suggesting that we should avoid subreg tracking in the RA.
That's definitely needed for AArch64, and in general.
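As a rough illustration of why subreg granularity matters to the RA, here is a toy backward liveness computation. It assumes a hypothetical pseudo "v100" that occupies two hard registers (as a vint8m2_t value does with LMUL = 2), and the "(subreg v100 N)" strings are schematic, with N meaning "part N" rather than a real RTL byte offset. It is not GCC's implementation. With whole-register tracking, a write to only one part cannot kill the pseudo, so it appears live on entry; with per-part tracking it does not.

```python
from dataclasses import dataclass, field


@dataclass
class Insn:
    """A toy instruction listing the (register, part) pairs it writes and reads."""
    text: str
    defs: set = field(default_factory=set)
    uses: set = field(default_factory=set)


def live_in_per_part(insns):
    """Backward liveness where each part of a multi-register pseudo is tracked."""
    live = set()
    for insn in reversed(insns):
        live = (live - insn.defs) | insn.uses
    return live


def live_in_whole_reg(insns, nparts):
    """Backward liveness where the whole pseudo is the unit of tracking.

    A write to only some parts cannot kill the pseudo, so it is treated
    as a read-modify-write of the whole register.
    """
    live = set()
    for insn in reversed(insns):
        for reg in {r for r, _ in insn.defs}:
            written = {p for r, p in insn.defs if r == reg}
            if len(written) == nparts[reg]:
                live.discard(reg)      # full definition: kills the pseudo
            else:
                live.add(reg)          # partial definition: old value still "needed"
        live |= {r for r, _ in insn.uses}
    return live


# Hypothetical pseudo v100 spanning two hard registers (like LMUL = 2).
nparts = {"v100": 2}
prog = [
    Insn("(set (subreg v100 0) ...)", defs={("v100", 0)}),
    Insn("(set (subreg v100 1) ...)", defs={("v100", 1)}),
    Insn("(use (subreg v100 0))", uses={("v100", 0)}),
]
print(live_in_per_part(prog))            # set()
print(live_in_whole_reg(prog, nparts))   # {'v100'}
```

The extra live range reported by the whole-register model is the sort of spurious liveness that can create unnecessary conflicts, and hence copies, during allocation.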

Thanks,
Richard
