On Fri, 14 Apr 2023, juzhe.zh...@rivai.ai wrote:

> Also, I have already decided to remove the WHILE_LEN pattern, since it seems
> to be unnecessary.
> As Richard said, it's just simple arithmetic and it's not worthwhile to
> do that.
>
> So, I plan to replace WHILE_LEN with MIN_EXPR and do everything RVV
> specific in the RISC-V port.
> I think that is more reasonable for IBM use and for more targets in the future.
>
> So, this patch will need to be changed to "introduce a new flow to do
> vectorization loop control", which is a new loop control flow
> with saturating subtraction of n down to zero, plus a target hook for it so
> that we can switch to this flow?
>
> Is that more reasonable?
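
For reference, the loop control flow proposed in the quoted text above could be sketched in plain C roughly as follows. This is only an illustration under stated assumptions, not what the patch generates: VF, add_one_len and add_one_decrement_iv are hypothetical names, and the scalar inner loop merely stands in for a len_load/len_store pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vectorization factor: elements handled per vector iteration. */
#define VF 16

/* Hypothetical stand-in for len_load + add + len_store on `len` elements. */
static void add_one_len(int32_t *dst, const int32_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i] + 1;
}

/* A single IV counts the remaining scalar iterations down to zero.
   Returns the number of vector-loop trips, for illustration only. */
size_t add_one_decrement_iv(int32_t *dst, const int32_t *src, size_t n)
{
    size_t remaining = n;
    size_t trips = 0;

    while (remaining != 0) {
        /* len = MIN_EXPR <remaining, VF> */
        size_t len = remaining < VF ? remaining : VF;

        add_one_len(dst, src, len);

        /* With MIN semantics len equals VF on every trip but the last,
           so bumping the address by len or by VF gives the same accesses. */
        dst += len;
        src += len;

        /* len <= remaining, so this subtraction never wraps. */
        remaining -= len;
        trips++;
    }
    return trips;
}
```

With n = 40 and VF = 16 the loop runs with len = 16, 16, 8 and exits when the remaining count reaches zero.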

I think we want to change the various IVs the vectorizer uses to control
the exit condition of prologue/vect/epilogue loops to a single one
counting the remaining _scalar_ iterations to zero. Currently it's
somewhat of a mess, which also leads to difficult to CSE expressions
based on derived values of such an IV.

But yes, whether for example the vector loop control stmt should be
a test for zero mask (while-ult) or zero scalar iterations (or (signed) <= zero)
could be subject to a new target hook if it isn't an obvious
choice based on HW capability checks we can already do.

Richard.

> Thanks.
>
>
> juzhe.zh...@rivai.ai
>
> From: Kewen.Lin
> Date: 2023-04-14 10:54
> To: ???
> CC: gcc-patches; Jeff Law; rdapp; richard.sandiford; rguenther
> Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for
> auto-vectorization
> Hi Juzhe,
>
> on 2023/4/13 21:44, ??? wrote:
> > Thanks Kewen.
> >
> > Current flow in this patch, like you said:
> > ....
> > len = WHILE_LEN (n,vf);
> > ...
> > v = len_load (addr,len);
> > ..
> > addr = addr + vf (in byte align);
> > ....
> >
> > This patch just keeps advancing the address by the vector factor (adjusted
> > to bytes).
> > For example, if your vector length = 512bit, then this patch just
> > updates the address as
> > addr = addr + 64;
> >
> > However, today after I read the RVV ISA more deeply, it seems more
> > appropriate that
> > the address should be updated as: addr = addr + (len * 4) if len is the
> > element count of INT32.
> > The len is the result calculated by WHILE_LEN.
>
> I just read your detailed explanation on the usage of the vsetvli insn (really
> appreciate that),
> it looks like this WHILE_LEN wants some more semantics than MIN, so I assume
> you still want
> to introduce this WHILE_LEN.
>
> >
> > I assume for the IBM target, it's better to just update the address by
> > directly adding
> > the whole register bytesize
> > in the address IV.
> > Since I think the second way (address = addr + (len * 4)) is
> > too RVV specific, and won't be suitable for IBM. Is that right?
>
> Yes, we just want to add the whole vector register length in bytes.
>
> > If it is true, I will keep this patch flow (won't change to address = addr
> > + (len * 4)) to see what else I need to do for IBM.
> > I would rather do that in RISC-V backend port.
>
> IMHO, you don't need to push this down to the RV backend; just query the ports
> having len_{load,store}
> support with a target hook or a special operand in optab while_len (see
> internal_len_load_store_bias)
> for this need, and generate different code accordingly. IIUC, for
> WHILE_LEN, you want it to have
> the same semantics as what vsetvli performs, but for IBM ports it would be just
> like MIN_EXPR; maybe we
> can also generate MIN or WHILE_LEN based on this kind of target information.
>
> If the above assumption holds, I wonder if you also want WHILE_LEN to have
> the implicit effect
> of updating the vector length register? If yes, the code with multiple rgroups
> looks unexpected:
>
> + _76 = .WHILE_LEN (ivtmp_74, vf * nitems_per_ctrl);
> + _79 = .WHILE_LEN (ivtmp_77, vf * nitems_per_ctrl);
>
> as the latter one seems to override the former. Besides, even if the given
> operands are known constants,
> it can't directly be folded into a constant for further propagation. From
> this perspective, Richi's
> suggestion on "tying the scalar result with the uses" looks better IMHO.
>
> >
> >>> I tried
> >>>to compile the above source files on Power, the former can adopt doloop
> >>>optimization but the latter fails to.
> > You mean GCC cannot do hardware loop optimization when the IV loop control is
> > variable?
>
> No, for both cases the IV is variable; the dump at loop2_doloop for the
> proposed sequence says
> "Doloop: Possible infinite iteration case.", which seems to show that for the
> proposed sequence the compiler
> isn't able to figure out that the loop is finite. It may miss the range
> information on n, or it isn't
> able to analyze how the invariant is involved, but I didn't look into it; these
> are all my guesses.
>
> BR,
> Kewen
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)