juzhe.zh...@rivai.ai writes:
> +  /* If we're using decrement IV approach in loop control, we can use output of
> +     SELECT_VL to adjust IV of loop control and data reference when it satisfies
> +     the following checks:
> +
> +     (a) SELECT_VL is supported by the target.
> +     (b) LOOP_VINFO is single-rgroup control.
> +     (c) non-SLP.
> +     (d) LOOP can not be unrolled.
> +
> +     Otherwise, we use MIN_EXPR approach.
> +
> +     1. We only apply SELECT_VL on single-rgroup since:
> +
> +     (1). Multiple-rgroup controls N vector loads/stores would need N pointer
> +          updates by variable amounts.
> +     (2). SELECT_VL allows flexible length (<=VF) in each iteration.
> +     (3). For decrement IV approach, we calculate the MAX length of the loop
> +          and then deduce the length of each control from this MAX length.
> +
> +     Base on (1), (2) and (3) situations, if we try to use SELECT_VL on
> +     multiple-rgroup control, we need to generate multiple SELECT_VL to
> +     carefully adjust length of each control.

If we use SELECT_VL to refer only to the target-independent ifn, I don't
see why this last bit is true.  Like I said in the previous message,
when it comes to determining the length of each control, the approach we
take for MIN_EXPR IVs should work for SELECT_VL IVs.  The point is that,
in both cases, any inactive lanes are always the last lanes.

E.g. suppose that, for one particular iteration, SELECT_VL decides that
6 lanes should be active in a loop with VF==8.  If there is a 2-control
rgroup with 4 lanes each, the first control must be 4 and the second
control must be 2, just as if a MIN_EXPR had decided that 6 lanes of the
final iteration are active.

I'm not saying the decision itself is wrong.  But I think the
explanation could be clearer.

> +     Such approach is very inefficient
> +     and unprofitable for targets that are using a standalone instruction
> +     to configure the length of each operation.
> +     E.g. RISC-V vector use 'vsetvl' to configure the length of each operation.

What I don't understand is why this isn't also a problem with the
fallback MIN_EXPR approach.  That is, with the same example as above,
but using MIN_EXPR IVs, I would have expected:

  VF == 8

  1-control rgroup "A":
    A set by MIN_EXPR IV

  2-control rgroup "B1", "B2":
    B1 = MIN (A, 4)
    B2 = A - B1

and so the vectors controlled by A, B1 and B2 would all have different
lengths.

Is the point that, when using MIN_EXPR, this only happens in the
final iteration?  And that you use a tail/epilogue loop for that,
so that the main loop body operates on full vectors only?

Thanks,
Richard
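
For reference, the arithmetic in the 6-lane example can be written out as
a small standalone C sketch.  This is only an illustration, not code from
the patch or from the vectorizer: the function name split_active_lanes
and its parameters are hypothetical.  It assumes that inactive lanes are
always the last lanes, so each control takes as many of the remaining
active lanes as it can hold, regardless of whether the total came from
SELECT_VL or from a MIN_EXPR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.  Split TOTAL active lanes
   across NCTRLS controls that each cover LANES_PER_CTRL lanes, storing
   the length of control I in LEN[I].  Because inactive lanes are always
   the last ones, earlier controls are filled before later ones.  */
void
split_active_lanes (unsigned total, unsigned nctrls,
                    unsigned lanes_per_ctrl, unsigned *len)
{
  assert (len != NULL);
  /* The rgroup cannot have more active lanes than it has lanes.  */
  assert (total <= nctrls * lanes_per_ctrl);

  for (unsigned i = 0; i < nctrls; i++)
    {
      /* len[i] = MIN (remaining active lanes, lanes per control).  */
      unsigned n = total < lanes_per_ctrl ? total : lanes_per_ctrl;
      len[i] = n;
      total -= n;
    }
}
```

Calling split_active_lanes (6, 2, 4, len) gives len[0] == 4 and
len[1] == 2, which matches B1 = MIN (A, 4) and B2 = A - B1 with A == 6.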