On Wed, 12 Apr 2023, ??? wrote:

> >> It's not so much that we need to do that. But normally it's only worth
> >> adding internal functions if they do something that is too complicated
> >> to express in simple gimple arithmetic. The UQDEC case I mentioned:
> >
> >>   z = MAX (x, y) - y
> >
> >> fell into the "simple arithmetic" category for me. We could have added
> >> an ifn for unsigned saturating decrement, but it didn't seem complicated
> >> enough to merit its own ifn.
>
> Ah, I know your concern. I should admit that WHILE_LEN is a simple
> arithmetic operation which just takes the result of
>
>   min (remain, vf).
>
> The possible solution is to just use MIN_EXPR (remain, vf), then add
> special handling in the umin_optab pattern to recognize "vf" in the
> backend, and finally generate vsetvl in the RISC-V backend.
>
> "vf" would be recognized by requiring the umin operand to be a
> const_int/const_poly_int operand. Otherwise, just generate a scalar umin
> instruction.
>
> However, there is a case where I can't tell whether umin should generate
> vsetvl or umin. Consider the following case:
>
>   int32_t foo (int32_t a)
>   {
>     return min (a, 4);
>   }
>
> In this case I should generate:
>
>   li a1,4
>   umin a1,a0,a1
>
> instead of generating vsetvl.
>
> However, in this case:
>
>   void foo (int32_t *a...)
>     for (int i = 0; i < n; i++)
>       a[i] = b[i] + c[i];
>
> with -mriscv-vector-bits=128 (which means each vector can handle 4 INT32),
> the VF will be 4 too. If we also use MIN_EXPR instead of WHILE_LEN:
>
>   ...
>   len = MIN_EXPR (n,4)
>   v = len_load (len)
>   ....
>   ...
>
> In this case, MIN_EXPR should emit vsetvl.
>
> It's hard for me to tell the difference between these two cases...
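A scalar C sketch of the length-controlled loop in the message above may help pin down what MIN (remain, vf) computes on each iteration. It assumes VF = 4 (four int32_t lanes, as in the -mriscv-vector-bits=128 example), uses hypothetical helper names (min_len, add_with_len), and emulates len_load, the add and a length-controlled store with a per-lane loop. It does not model vsetvl or how the backend would choose between vsetvl and a scalar umin, which is the open question here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed vectorization factor: four int32_t lanes per 128-bit vector,
   matching the -mriscv-vector-bits=128 example.  */
#define VF 4

/* Scalar model of the per-iteration length MIN (remain, vf), i.e. the
   value that WHILE_LEN or an explicit MIN_EXPR would compute.  */
static size_t
min_len (size_t remain, size_t vf)
{
  return remain < vf ? remain : vf;
}

/* Scalar model of a length-controlled loop for a[i] = b[i] + c[i].
   The inner lane loop stands in for len_load, the vector add and a
   length-controlled store on the first LEN lanes.  Returns the number
   of "vector" iterations executed.  add_with_len is a hypothetical name.  */
static size_t
add_with_len (int32_t *a, const int32_t *b, const int32_t *c, size_t n)
{
  size_t iters = 0;
  /* Decrementing IV: REMAIN counts the elements still to be processed.  */
  size_t remain = n;
  while (remain > 0)
    {
      size_t len = min_len (remain, VF);
      for (size_t lane = 0; lane < len; lane++)
        a[lane] = b[lane] + c[lane];
      /* Variable step: pointers and REMAIN advance by LEN, which is
         smaller than VF only on the final partial iteration.  */
      a += len;
      b += len;
      c += len;
      remain -= len;
      iters++;
    }
  return iters;
}
```

With n = 10 the lengths come out as 4, 4 and 2, so the loop runs three times and no separate epilogue is needed.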
But the issue is the same in reverse with WHILE_LEN, no? WHILE_LEN just
computes a scalar value - you seem to suggest there's a hidden side-effect
of "coalescing" the result with a hardware vector length register? I don't
think that's good design.

IMHO tying the scalar result to its uses has to be done where you emit
the other vsetvl instructions.

One convenient thing we have with WHILE_LEN is that it is a key for the
vectorizer to query target capabilities (and preferences). But of course
collecting whether stmts can be vectorized with length and/or with mask
would be better.

Richard.

> CC RISC-V port backend maintainer: Kito.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Richard Sandiford
> Date: 2023-04-12 20:24
> To: juzhe.zhong@rivai.ai
> CC: rguenther; gcc-patches; jeffreyalaw; rdapp; linkw
> Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
>
> "juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes:
> >>> I think that already works for them (could be misremembering).
> >>> However, IIUC, they have no special instruction to calculate the
> >>> length (unlike for RVV), and so it's open-coded using vect_get_len.
> >
> > Yeah, the current flow using min, sub, and then min in vect_get_len
> > is working for IBM. But I wonder whether switching their current
> > length-based loop control to the WHILE_LEN pattern in this patch could
> > improve their performance.
> >
> >>> (1) How easy would it be to express WHILE_LEN in normal gimple?
> >>> I haven't thought about this at all, so the answer might be
> >>> "very hard". But it reminds me a little of UQDEC on AArch64,
> >>> which we open-code using MAX_EXPR and MINUS_EXPR (see
> >>> vect_set_loop_controls_directly).
> >>>
> >>> I'm not saying WHILE_LEN is the same operation, just that it seems
> >>> like it might be open-codeable in a similar way.
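The UQDEC open-coding referred to in point (1), which the thread writes as z = MAX (x, y) - y, is unsigned subtraction that saturates at zero. A minimal C check, assuming 32-bit unsigned operands and using hypothetical function names, compares the MAX/MINUS form against a plain conditional formulation:

```c
#include <stdint.h>

/* Open-coded form quoted in the thread: z = MAX (x, y) - y.
   If x >= y this is x - y; otherwise it is y - y = 0.  */
static uint32_t
uqdec_open_coded (uint32_t x, uint32_t y)
{
  uint32_t max = x > y ? x : y;
  return max - y;
}

/* Reference semantics: unsigned subtraction saturating at zero.  */
static uint32_t
sat_sub (uint32_t x, uint32_t y)
{
  return x >= y ? x - y : 0;
}
```

Both functions agree for every input; the MAX form is simply arithmetic that gimple can already express with MAX_EXPR and MINUS_EXPR, which is the argument for not giving it a dedicated internal function.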
> >>>
> >>> Even if we can open-code it, we'd still need some way for the
> >>> target to select the "RVV way" from the "s390/PowerPC way".
> >
> > WHILE_LEN as I define it in the documentation is:
> >
> >   operand0 = MIN (operand1, operand2)
> >
> > operand1 is the residual number of scalar elements that need to be
> > updated. operand2 is the vectorization factor (vf) for a single rgroup;
> > for multiple rgroups, operand2 = vf * nitems_per_ctrl.
> >
> > You mean such a pattern is not well expressed, so we need to replace it
> > with normal tree code (MIN or MAX) and let the RISC-V backend optimize
> > it into vsetvl? Sorry, maybe I am not on the same page.
>
> It's not so much that we need to do that. But normally it's only worth
> adding internal functions if they do something that is too complicated
> to express in simple gimple arithmetic. The UQDEC case I mentioned:
>
>   z = MAX (x, y) - y
>
> fell into the "simple arithmetic" category for me. We could have added
> an ifn for unsigned saturating decrement, but it didn't seem complicated
> enough to merit its own ifn.
>
> >>> (2) What effect does using a variable IV step (the result of
> >>> the WHILE_LEN) have on ivopts? I remember experimenting with
> >>> something similar once (can't remember the context) and not
> >>> having a constant step prevented ivopts from making good
> >>> addressing-mode choices.
> >
> > Thank you so much for pointing this out. Currently, a variable IV step
> > and decrementing n down to 0 work fine for RISC-V downstream GCC, and
> > we didn't find issues related to addressing-mode choice.
>
> OK, that's good. Sounds like it isn't a problem then.
>
> > I think I must have missed something. Would you mind giving me some
> > hints so that I can study ivopts and find out which cases may generate
> > inferior code for a variable IV step?
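The two loop-control shapes behind question (2) can be contrasted in scalar C. This is an illustrative sketch, not vectorizer output: VF = 4 and the function names are assumptions, and lane loops stand in for vector operations. In form (A) the pointers advance by the loop-varying len, so the step is not a constant; in form (B) invariant base addresses are indexed by an IV with constant step VF, roughly the shape Sandiford's reply associates with SVE and with an IV that ivopts can treat as a SCEV.

```c
#include <stddef.h>
#include <stdint.h>

#define VF 4  /* assumed vectorization factor */

static size_t
min_size (size_t x, size_t y)
{
  return x < y ? x : y;
}

/* (A) Decrementing control IV with a variable step: the pointers move
   by LEN, which differs from VF on the last iteration, so the step is
   not a loop-invariant constant.  */
static void
sum_variable_step (int32_t *a, const int32_t *b, size_t n)
{
  size_t remain = n;
  while (remain > 0)
    {
      size_t len = min_size (remain, VF);
      for (size_t lane = 0; lane < len; lane++)
        a[lane] += b[lane];
      a += len;
      b += len;
      remain -= len;
    }
}

/* (B) Invariant base addresses A and B indexed by an IV with constant
   step VF; only the per-iteration length uses MIN.  */
static void
sum_constant_step (int32_t *a, const int32_t *b, size_t n)
{
  for (size_t i = 0; i < n; i += VF)
    {
      size_t len = min_size (n - i, VF);
      for (size_t lane = 0; lane < len; lane++)
        a[i + lane] += b[i + lane];
    }
}
```

Both forms compute the same result; the difference that matters for ivopts is only whether the address and control IVs advance by a constant.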
>
> I think AArch64 was sensitive to this because (a) the vectoriser creates
> separate IVs for each base address and (b) for SVE, we instead want
> invariant base addresses that are indexed by the loop control IV.
> Like Richard says, if the loop control IV isn't a SCEV, ivopts isn't
> able to use it and so (b) fails.
>
> Thanks,
> Richard
>
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)