On Fri, 14 Apr 2023, juzhe.zh...@rivai.ai wrote:

> Also, I have already decided to remove the WHILE_LEN pattern since it seems to
> be unnecessary.
> As Richard said, it's just simple arithmetic, so it's not worthwhile to
> do that.
> 
> So, I plan to replace WHILE_LEN with MIN_EXPR and handle everything
> RVV-specific in the RISC-V port.
> I think that's more reasonable for IBM and for other targets in the future.
> 
> So, this patch will need to be changed to "introduce a new flow to do
> vectorization loop control", i.e. a new loop control flow
> with saturating subtraction of n down to zero, plus a target hook for it so
> that we can switch to this flow?
> 
> Is that more reasonable?

I think we want to change the various IVs the vectorizer uses to
control the exit condition of prologue/vect/epilogue loops to a single
one counting the remaining _scalar_ iterations to zero.  Currently
it's somewhat of a mess, which also leads to difficult-to-CSE expressions
based on derived values of such an IV.

But yes, whether for example the vector loop control stmt should
be a test for zero mask (while-ult) or zero scalar iterations
(or (signed) <= zero) could be subject to a new target hook if it
isn't an obvious choice based on HW capability checks we can already
do.
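
To make the "single IV counting the remaining scalar iterations to zero"
idea concrete, here is a minimal scalar C sketch. It is not vectorizer
output: VF, min_size and add_arrays are hypothetical names chosen for
illustration, and the inner scalar loop merely stands in for a
len_load/add/len_store sequence operating on len elements.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vectorization factor: elements handled per vector iteration. */
#define VF 16

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Elementwise dst = a + b over n elements.
   `remaining` is the only loop-control IV: it counts the scalar
   iterations still to do and the loop exits when it reaches zero.
   Returns the number of "vector" iterations executed.  */
size_t add_arrays(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    size_t remaining = n;
    size_t vector_iters = 0;

    while (remaining != 0) {
        /* Plain MIN: number of active elements in this iteration.  */
        size_t len = min_size(remaining, VF);

        /* Stand-in for len_load / vector add / len_store on len elements.  */
        for (size_t i = 0; i < len; i++)
            dst[i] = a[i] + b[i];

        /* Addresses advance by the processed element count here; a target
           that bumps by the full vector size would add VF instead.  */
        dst += len;
        a += len;
        b += len;

        /* len <= remaining, so this subtraction cannot wrap below zero.  */
        remaining -= len;
        vector_iters++;
    }
    return vector_iters;
}
```

With n = 37 and VF = 16 the sketch runs three iterations with len = 16,
16 and 5.  The exit test is the "zero scalar iterations" variant; a
target preferring a mask test would derive the mask from the same IV.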

Richard.

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Kewen.Lin
> Date: 2023-04-14 10:54
> To: ???
> CC: gcc-patches; Jeff Law; rdapp; richard.sandiford; rguenther
> Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
> auto-vectorization
> Hi Juzhe,
>  
> on 2023/4/13 21:44, ??? wrote:
> > Thanks Kewen.
> > 
> > The current flow in this patch is, as you said:
> > ....
> > len = WHILE_LEN (n,vf);
> > ...
> > v = len_load (addr,len);
> > ..
> > addr = addr + vf (in byte align);
> > ....
> > 
> > This patch just keeps advancing the address by the vectorization factor
> > (scaled to bytes).
> > For example, if your vector length is 512 bits, then this patch just
> > updates the address as
> > addr = addr + 64;
> > 
> > However, after reading the RVV ISA more closely today, I think it would be
> > more appropriate
> > for the address to be updated as: addr = addr + (len * 4) if len is the
> > number of INT32 elements.
> > Here len is the result computed by WHILE_LEN.
>  
> I just read your detailed explanation of the usage of the vsetvli insn (I really
> appreciate that).
> It looks like this WHILE_LEN wants some more semantics than MIN, so I assume
> you still want
> to introduce this WHILE_LEN.
>  
> > 
> > I assume that for the IBM target it's better to update the address by directly
> > adding the whole register size in bytes
> > to the address IV, since I think the second way (address = addr + (len * 4)) is
> > too RVV-specific and won't be suitable for IBM. Is that right?
>  
> Yes, we just want to add the whole vector register length in bytes.
>  
> > If that is true, I will keep this patch's flow (and won't change to address = addr
> > + (len * 4)) and see what else I need to do for IBM.
> > I would rather do that in the RISC-V backend port.
>  
> IMHO, you don't need to push this down to the RV backend; just query the ports
> having len_{load,store}
> support with a target hook or a special operand in optab while_len (see
> internal_len_load_store_bias)
> for this need, and generate different code accordingly.  IIUC, for
> WHILE_LEN, you want it to have
> the same semantics as what vsetvli performs, but for IBM ports, it would be just
> like MIN_EXPR, so maybe we
> can also generate MIN or WHILE_LEN based on this kind of target information.
>  
> If the above assumption holds, I wonder if you also want WHILE_LEN to have
> the implicit effect
> of updating the vector length register?  If yes, the code with multiple rgroups
> looks unexpected:
>  
> + _76 = .WHILE_LEN (ivtmp_74, vf * nitems_per_ctrl);
> + _79 = .WHILE_LEN (ivtmp_77, vf * nitems_per_ctrl);
>  
> as the latter one seems to override the former.  Besides, even if the given
> operands are known constants,
> it can't be folded directly into a constant and propagated further.   From
> this perspective, Richi's
> suggestion on "tieing the scalar result with the uses" looks better IMHO.
>  
> > 
> >>> I tried
> >>>to compile the above source files on Power, the former can adopt doloop
> >>>optimization but the latter fails to. 
> > You mean GCC cannot do hardware loop optimization when the loop-control IV is
> > variable?
>  
> No, in both cases the IV is variable.  The dump at loop2_doloop for the
> proposed sequence says
> "Doloop: Possible infinite iteration case.", which seems to show that for the
> proposed sequence the compiler
> isn't able to figure out that the loop is finite.  It may miss the range
> information on n, or it isn't
> able to analyze how the invariant involves, but I didn't look into it; these are
> all my guesses.
>  
> BR,
> Kewen
>  
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
