>> But it would make the pointer IV updates more complex.  So let's
>> say that that's the reason for preferring solution 3.
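The pointer-IV cost mentioned here can be made concrete with a scalar simulation. This is only a sketch under stated assumptions. The names `select_vl`, `vcopy`, `copy_min_expr` and `copy_select_vl`, and the parameters `NUNITS` and `NCTRL`, are hypothetical. `select_vl` is an assumed model of a target length instruction, loosely based on the latitude the RISC-V V specification gives `vsetvli` when fewer than 2*VLMAX elements remain. It is not GCC's SELECT_VL internal function. With MIN_EXPR only the last iteration can be partial, so the offset that stands in for the pointer IV always advances by the invariant VF. With one SELECT_VL per control, each of the `NCTRL` controls must bump it by its own variable length.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical parameters, for illustration only.  */
enum
{
  NUNITS = 4,          /* elements per vector (one length control) */
  NCTRL = 2,           /* length controls in the rgroup */
  VF = NUNITS * NCTRL  /* elements processed per vector iteration */
};

static int min_int (int a, int b) { return a < b ? a : b; }

/* Assumed model of a target "select length" instruction (NOT GCC's
   SELECT_VL): it may return fewer than MAX elements even when more than
   MAX remain, here by splitting the tail evenly once fewer than 2*MAX
   elements are left.  */
static int select_vl (int remaining, int max)
{
  if (remaining <= max)
    return remaining;
  if (remaining < 2 * max)
    return (remaining + 1) / 2;
  return max;
}

/* Stand-in for one length-controlled vector load/store pair.  */
static void vcopy (int *dst, const int *src, int len)
{
  assert (len > 0 && len <= NUNITS);
  memcpy (dst, src, (size_t) len * sizeof *src);
}

/* MIN_EXPR approach: one length per iteration, split across the NCTRL
   controls.  Only the final iteration can be partial, so the pointer IV
   (modelled by the offset POS) always advances by the invariant step VF.
   Returns the number of pointer-IV updates by a variable amount.  */
static int copy_min_expr (int *dst, const int *src, int n)
{
  for (int pos = 0, remaining = n; remaining > 0; pos += VF, remaining -= VF)
    {
      int len = min_int (remaining, VF);          /* MIN_EXPR <N, VF> */
      for (int k = 0; k < NCTRL; k++)
        {
          /* Control K covers elements [K*NUNITS, (K+1)*NUNITS) of this
             iteration, at an invariant offset from POS.  */
          int len_k = min_int (len - k * NUNITS, NUNITS);
          if (len_k > 0)
            vcopy (dst + pos + k * NUNITS, src + pos + k * NUNITS, len_k);
        }
    }
  return 0;  /* every step was the invariant VF */
}

/* SELECT_VL approach with one length per control: any control in any
   iteration may be partial, so the pointer IV is bumped by each control's
   own length: NCTRL-1 updates within the iteration plus 1 to move on.  */
static int copy_select_vl (int *dst, const int *src, int n)
{
  int variable_updates = 0;
  int pos = 0;
  int remaining = n;
  while (remaining > 0)
    {
      for (int k = 0; k < NCTRL; k++)
        {
          int len_k = select_vl (remaining, NUNITS);
          if (len_k > 0)
            vcopy (dst + pos, src + pos, len_k);
          pos += len_k;                  /* variable-amount update */
          remaining -= len_k;
          variable_updates++;
        }
    }
  return variable_updates;
}
```

For a 19-element copy both functions copy every element, but `copy_select_vl` performs 6 variable-amount pointer updates (3 iterations × 2 controls) while `copy_min_expr` performs none.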
Yes, I prefer solution 3, to avoid complex pointer IV updates; there is no benefit in solution 2 (unlike the single-rgroup case).
I have read your comments; they are more comprehensive than what I wrote. I will send a V3 patch with your comments appended.
Thank you so much!

juzhe.zh...@rivai.ai

From: Richard Sandiford
Date: 2023-06-05 18:09
To: juzhe.zhong@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] VECT: Add SELECT_VL support

"juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes:
> Hi, Richard.
>
>>> No, I meant that the comment I quoted seemed to be saying that solution
>>> 3 wasn't possible.  The comment seemed to say that we would need to do
>>> solution 1.
> I am so sorry that I didn't write the comments accurately.
> Could you help me with the comments?  Based on what we have discussed above
> (I think we are on the same page now).

Yeah, agree we seem to be on the same page.

>>> When comparing solutions 2 and 3 for case (b), is solution 3 still better?
>>> E.g. is "vsetvli zero" cheaper than "vsetvli <gpr>"?
>
> "vsetvli zero" is the same cost as "vsetvli gpr".
> I think for (b), solution 2 and solution 3 should be almost the same.

OK, thanks.  If we wanted to use solution 2 for (b), the condition
would be just:

  LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1

dropping the:

  LOOP_VINFO_LENS (loop_vinfo).length () == 1

But it would make the pointer IV updates more complex.  So let's
say that that's the reason for preferring solution 3.

So rather than:

+  /* If we're using decrement IV approach in loop control, we can use output of
+     SELECT_VL to adjust IV of loop control and data reference when it satisfies
+     the following checks:
+
+     (a) SELECT_VL is supported by the target.
+     (b) LOOP_VINFO is single-rgroup control.
+     (c) non-SLP.
+     (d) LOOP can not be unrolled.
+
+     Otherwise, we use MIN_EXPR approach.
+
+     1. We only apply SELECT_VL on single-rgroup since:
+
+     (1). Multiple-rgroup controls N vector loads/stores would need N pointer
+          updates by variable amounts.
+     (2). SELECT_VL allows flexible length (<=VF) in each iteration.
+     (3). For decrement IV approach, we calculate the MAX length of the loop
+          and then deduce the length of each control from this MAX length.
+
+     Base on (1), (2) and (3) situations, if we try to use SELECT_VL on
+     multiple-rgroup control, we need to generate multiple SELECT_VL to
+     carefully adjust length of each control. Such approach is very inefficient
+     and unprofitable for targets that are using a standalone instruction
+     to configure the length of each operation.
+     E.g. RISC-V vector use 'vsetvl' to configure the length of each operation.

how about:

  /* If a loop uses length controls and has a decrementing loop control IV,
     we will normally pass that IV through a MIN_EXPR to calculate the
     basis for the length controls.  E.g. in a loop that processes one
     element per scalar iteration, the number of elements would be
     MIN_EXPR <N, VF>, where N is the number of scalar iterations left.

     This MIN_EXPR approach allows us to use pointer IVs with an invariant
     step, since only the final iteration of the vector loop can have
     inactive lanes.

     However, some targets have a dedicated instruction for calculating the
     preferred length, given the total number of elements that still need to
     be processed.  This is encapsulated in the SELECT_VL internal function.

     If the target supports SELECT_VL, we can use it instead of MIN_EXPR
     to determine the basis for the length controls.  However, unlike the
     MIN_EXPR calculation, the SELECT_VL calculation can decide to make
     lanes inactive in any iteration of the vector loop, not just the last
     iteration.  This SELECT_VL approach therefore requires us to use pointer
     IVs with variable steps.

     Once we've decided how many elements should be processed by one
     iteration of the vector loop, we need to populate the rgroup controls.
     If a loop has multiple rgroups, we need to make sure that those rgroups
     "line up" (that is, they must be consistent about which elements are
     active and which aren't).
     This is done by vect_adjust_loop_lens_control.

     In principle, it would be possible to use vect_adjust_loop_lens_control
     on either the result of a MIN_EXPR or the result of a SELECT_VL.
     However:

     (1) In practice, it only makes sense to use SELECT_VL when a vector
         operation will be controlled directly by the result.  It is not
         worth using SELECT_VL if it would only be the input to other
         calculations.

     (2) If we use SELECT_VL for an rgroup that has N controls, each
         associated pointer IV will need N updates by a variable amount
         (N-1 updates within the iteration and 1 update to move to the
         next iteration).

     Because of this, we prefer to use the MIN_EXPR approach whenever there
     is more than one length control.

     In addition, SELECT_VL always operates to a granularity of 1 unit.
     If we wanted to use it to control an SLP operation on N consecutive
     elements, we would need to make the SELECT_VL inputs measure scalar
     iterations (rather than elements) and then multiply the SELECT_VL
     result by N.  But using SELECT_VL this way is inefficient because of
     (1) above.

Thanks,
Richard
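The SLP granularity point can be illustrated the same way. Again this is a hypothetical sketch. `GROUP` stands for the N consecutive elements of an SLP group (for example real/imaginary pairs), `NUNITS` is an assumed vector width, and `select_vl` is the same assumed model of a target length instruction, not GCC's implementation. Feeding it element counts can return an odd length that splits a pair. Feeding it scalar-iteration counts and multiplying by `GROUP` keeps every length a whole number of groups, at the cost of that extra multiply.

```c
#include <assert.h>

/* Hypothetical parameters: an SLP group of GROUP consecutive elements
   per scalar iteration and NUNITS elements per vector.  */
enum { GROUP = 2, NUNITS = 8 };

/* Assumed model of a target "select length" instruction with a
   granularity of 1 unit (NOT GCC's SELECT_VL implementation).  */
static int select_vl (int remaining, int max)
{
  int vl;
  if (remaining <= max)
    vl = remaining;
  else if (remaining < 2 * max)
    vl = (remaining + 1) / 2;
  else
    vl = max;
  assert (vl >= 0 && vl <= max);
  return vl;
}

/* Feed SELECT_VL the remaining element count directly.  Returns how many
   vector iterations got a length that splits an SLP group.  */
static int split_groups_by_elements (int niters)
{
  int splits = 0;
  for (int remaining = niters * GROUP; remaining > 0;)
    {
      int len = select_vl (remaining, NUNITS);        /* elements */
      splits += (len % GROUP != 0);
      remaining -= len;
    }
  return splits;
}

/* Feed SELECT_VL the remaining scalar-iteration count, then multiply the
   result by GROUP to get an element count.  */
static int split_groups_by_iterations (int niters)
{
  int splits = 0;
  for (int remaining = niters; remaining > 0;)
    {
      int vl = select_vl (remaining, NUNITS / GROUP); /* scalar iterations */
      int len = vl * GROUP;                           /* extra multiply */
      splits += (len % GROUP != 0);
      remaining -= vl;
    }
  return splits;
}
```

With 5 scalar iterations (10 elements), `split_groups_by_elements (5)` returns 2 (lengths 5 and 5), while `split_groups_by_iterations (5)` returns 0 (lengths 6 and 4).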