>> But it would make the pointer IV updates more complex.  So let's
>> say that that's the reason for preferring solution 3.

Yes, I prefer solution 3 to avoid complex pointer IV updates, and there is
no benefit to solution 2 here (unlike in the single-rgroup case).

I have read your comments; they are more comprehensive than what I wrote.

I will send a V3 patch that incorporates your comments.

Thank you so much!


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-05 18:09
To: juzhe.zhong@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] VECT: Add SELECT_VL support
"juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes:
> Hi, Richard.
>
>>> No, I meant that the comment I quoted seemed to be saying that solution
>>> 3 wasn't possible.  The comment seemed to say that we would need to do
>>> solution 1.
> I am so sorry that I didn't write the comments accurately.
> Could you help me with the comments? Based on what we have discussed above
> (I think we are on the same page now).
 
Yeah, agreed, we seem to be on the same page.
 
>>> When comparing solutions 2 and 3 for case (b), is solution 3 still better?
>>> E.g. is "vsetvli zero" cheaper than "vsetvli <gpr>"?
>
>
> "vsetvli zero" has the same cost as "vsetvli gpr".
>
> I think for (b), solution 2 and solution 3 should be almost the same.
 
OK, thanks.  If we wanted to use solution 2 for (b), the condition
would be just:
 
  LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1
 
dropping the:
 
  LOOP_VINFO_LENS (loop_vinfo).length () == 1
 
But it would make the pointer IV updates more complex.  So let's
say that that's the reason for preferring solution 3.
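To make the two candidate conditions concrete, here is a small C sketch. The struct `lens_rgroup` and both predicate functions are hypothetical simplifications, not GCC internals: only the `factor` field and the `length () == 1` check come from the discussion above, and what `factor` means inside GCC is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of LOOP_VINFO_LENS.  Only the
   'factor' field named in the thread is modelled.  */
struct lens_rgroup
{
  unsigned int factor;
};

/* Solution 3: use SELECT_VL only when LOOP_VINFO_LENS has a single
   entry (length () == 1) and that entry has factor == 1.  */
static bool
use_select_vl_sol3 (const struct lens_rgroup *lens, size_t num_entries)
{
  assert (num_entries > 0);
  return num_entries == 1 && lens[0].factor == 1;
}

/* Solution 2: drop the length () == 1 check.  Per the thread, accepting
   the extra cases makes the pointer IV updates more complex.  */
static bool
use_select_vl_sol2 (const struct lens_rgroup *lens, size_t num_entries)
{
  assert (num_entries > 0);
  return lens[0].factor == 1;
}
```

With two entries whose factors are both 1, solution 2 accepts the loop and solution 3 rejects it. That extra case is exactly where the more complex pointer IV updates would come from.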
 
So rather than:
 
+  /* If we're using decrement IV approach in loop control, we can use output of
+     SELECT_VL to adjust IV of loop control and data reference when it satisfies
+     the following checks:
+
+     (a) SELECT_VL is supported by the target.
+     (b) LOOP_VINFO is single-rgroup control.
+     (c) non-SLP.
+     (d) LOOP can not be unrolled.
+
+     Otherwise, we use MIN_EXPR approach.
+
+     1. We only apply SELECT_VL on single-rgroup since:
+
+     (1). Multiple-rgroup controls N vector loads/stores would need N pointer
+   updates by variable amounts.
+     (2). SELECT_VL allows flexible length (<=VF) in each iteration.
+     (3). For decrement IV approach, we calculate the MAX length of the loop
+   and then deduce the length of each control from this MAX length.
+
+     Based on (1), (2) and (3) situations, if we try to use SELECT_VL on
+     multiple-rgroup control, we need to generate multiple SELECT_VL to
+     carefully adjust length of each control. Such approach is very inefficient
+     and unprofitable for targets that are using a standalone instruction
+     to configure the length of each operation.
+     E.g. RISC-V vector uses 'vsetvl' to configure the length of each operation.
 
how about:
 
  /* If a loop uses length controls and has a decrementing loop control IV,
     we will normally pass that IV through a MIN_EXPR to calculate the
     basis for the length controls.  E.g. in a loop that processes one
     element per scalar iteration, the number of elements would be
     MIN_EXPR <N, VF>, where N is the number of scalar iterations left.
 
     This MIN_EXPR approach allows us to use pointer IVs with an invariant
     step, since only the final iteration of the vector loop can have
     inactive lanes.
 
     However, some targets have a dedicated instruction for calculating the
     preferred length, given the total number of elements that still need to
     be processed.  This is encapsulated in the SELECT_VL internal function.
 
     If the target supports SELECT_VL, we can use it instead of MIN_EXPR
     to determine the basis for the length controls.  However, unlike the
     MIN_EXPR calculation, the SELECT_VL calculation can decide to make
     lanes inactive in any iteration of the vector loop, not just the last
     iteration.  This SELECT_VL approach therefore requires us to use pointer
     IVs with variable steps.
 
     Once we've decided how many elements should be processed by one
     iteration of the vector loop, we need to populate the rgroup controls.
     If a loop has multiple rgroups, we need to make sure that those rgroups
     "line up" (that is, they must be consistent about which elements are
     active and which aren't).  This is done by vect_adjust_loop_lens_control.
 
     In principle, it would be possible to use vect_adjust_loop_lens_control
     on either the result of a MIN_EXPR or the result of a SELECT_VL.
     However:
 
     (1) In practice, it only makes sense to use SELECT_VL when a vector
         operation will be controlled directly by the result.  It is not
         worth using SELECT_VL if it would only be the input to other
         calculations.
 
     (2) If we use SELECT_VL for an rgroup that has N controls, each associated
         pointer IV will need N updates by a variable amount (N-1 updates
         within the iteration and 1 update to move to the next iteration).
         
     Because of this, we prefer to use the MIN_EXPR approach whenever there
     is more than one length control.
 
     In addition, SELECT_VL always operates to a granularity of 1 unit.
     If we wanted to use it to control an SLP operation on N consecutive
     elements, we would need to make the SELECT_VL inputs measure scalar
     iterations (rather than elements) and then multiply the SELECT_VL
     result by N.  But using SELECT_VL this way is inefficient because
     of (1) above.
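The difference between the MIN_EXPR and SELECT_VL approaches described in the comment can be simulated in a few lines of C. This is a sketch under stated assumptions, and none of these functions exist in GCC. `len_select_vl` is a hypothetical model of SELECT_VL. It mimics one behaviour that the RVV 1.0 specification permits for `vsetvli`: returning ceil(AVL/2) when VLMAX < AVL < 2*VLMAX. Real hardware may choose a different length.

```c
#include <assert.h>
#include <stddef.h>

/* MIN_EXPR approach: process MIN (remaining, vf) elements.  */
static size_t
len_min_expr (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Hypothetical SELECT_VL model (assumption, not a real API).  RVV 1.0
   allows any vl with ceil(AVL/2) <= vl <= VLMAX when AVL < 2*VLMAX;
   this model picks the lower bound to balance the last two
   iterations.  */
static size_t
len_select_vl (size_t remaining, size_t vf)
{
  if (remaining <= vf)
    return remaining;
  if (remaining < 2 * vf)
    return (remaining + 1) / 2;
  return vf;
}

/* Run a vector loop over N elements with a decrementing control IV,
   storing each iteration's length in LENS.  Returns the number of
   iterations.  The pointer IV advances by the chosen length.  With
   MIN_EXPR every non-final length equals VF, so the step is effectively
   invariant.  With SELECT_VL the step can vary on any iteration.  */
static size_t
run_loop (size_t n, size_t vf, size_t (*choose_len) (size_t, size_t),
          size_t *lens, size_t max_iters)
{
  size_t remaining = n;  /* Decrementing loop-control IV.  */
  size_t offset = 0;     /* Pointer IV, measured in elements.  */
  size_t iters = 0;

  assert (vf > 0);
  while (remaining > 0)
    {
      size_t len = choose_len (remaining, vf);
      assert (iters < max_iters);
      lens[iters++] = len;
      offset += len;     /* Pointer IV update by the active length.  */
      remaining -= len;
    }
  assert (offset == n);
  return iters;
}
```

For N = 10 and VF = 8, the MIN_EXPR model gives lengths 8, 2. Only the final iteration is partial, so a pointer IV can step by VF. The SELECT_VL model gives 5, 5, so the pointer IV has to step by the variable length.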
 
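The SLP granularity point in the comment can be illustrated in the same way. The SELECT_VL model and all function names below are hypothetical. The model uses an RVV-style policy that may return ceil(remaining/2). Applying it directly to an element count can split a group of N consecutive elements. Applying it to scalar iterations and multiplying by N keeps groups whole, but then the multiplied value, not the SELECT_VL result itself, controls the vector operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SELECT_VL model (assumption, not a real API): returns
   ceil(remaining/2) when max_units < remaining < 2*max_units.  */
static size_t
select_vl_model (size_t remaining, size_t max_units)
{
  if (remaining <= max_units)
    return remaining;
  if (remaining < 2 * max_units)
    return (remaining + 1) / 2;
  return max_units;
}

/* Naive: feed SELECT_VL the element count.  Because it works to a
   granularity of 1 unit, the result may split an SLP group.  */
static size_t
slp_len_on_elements (size_t iters_left, size_t group_size, size_t vf_elems)
{
  return select_vl_model (iters_left * group_size, vf_elems);
}

/* Keep groups whole: feed SELECT_VL scalar iterations, then multiply
   the result by the group size.  */
static size_t
slp_len_on_iterations (size_t iters_left, size_t group_size, size_t vf_elems)
{
  assert (group_size > 0 && vf_elems % group_size == 0);
  return select_vl_model (iters_left, vf_elems / group_size) * group_size;
}
```

Consider 5 scalar iterations left, N = 2 and VF = 8 elements. The element-based call returns 5, which splits a group. The iteration-based call returns 3 iterations, which is 6 elements.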
Thanks,
Richard
 
