On Wed, Sep 9, 2020 at 3:47 AM luoxhu <luo...@linux.ibm.com> wrote: > > > > On 2020/9/8 16:26, Richard Biener wrote: > >> Seems not only pseudo, for example "v = vec_insert (i, v, n);" > >> the vector variable will be store to stack first, then [r112:DI] is a > >> memory here to be processed. So the patch loads it from stack(insn #10) to > >> temp vector register first, and store to stack again(insn #24) after > >> rs6000_vector_set_var. > > Hmm, yeah - I guess that's what should be addressed first then. > > I'm quite sure that in case 'v' is not on the stack but in memory like > > in my case a SImode store is better than what we get from > > vec_insert - in fact vec_insert will likely introduce a RMW cycle > > which is prone to inserting store-data-races? > > Yes, for your case, there is no stack operation and to_rtx is expanded > with BLKmode instead of V4SImode. Add the to_rtx mode check could workaround > it. ASM doesn't show store hit load issue. > > optimized: > > _1 = i_2(D) % 4; > VIEW_CONVERT_EXPR<int[4]>(x.u)[_1] = a_4(D); > > expand: > 2: r118:DI=%3:DI > 3: r119:DI=%4:DI > 4: NOTE_INSN_FUNCTION_BEG > 7: r120:DI=unspec[`*.LANCHOR0',%2:DI] 47 > REG_EQUAL `*.LANCHOR0' > 8: r122:SI=r118:DI#0 > 9: {r124:SI=r122:SI/0x4;clobber ca:SI;} > 10: r125:SI=r124:SI<<0x2 > 11: r123:SI=r122:SI-r125:SI > REG_EQUAL r122:SI%0x4 > 12: r126:DI=sign_extend(r123:SI) > 13: r127:DI=r126:DI+0x4 > 14: r128:DI=r127:DI<<0x2 > 15: r129:DI=r120:DI+r128:DI > 16: [r129:DI]=r119:DI#0 > > p to_rtx > $319 = (rtx_def *) (mem/c:BLK (reg/f:DI 120) [2 x+0 S32 A128]) > > asm: > addis 2,12,.TOC.-.LCF0@ha > addi 2,2,.TOC.-.LCF0@l > .localentry test,.-test > srawi 9,3,2 > addze 9,9 > addis 10,2,.LANCHOR0@toc@ha > addi 10,10,.LANCHOR0@toc@l > slwi 9,9,2 > subf 9,9,3 > extsw 9,9 > addi 9,9,4 > sldi 9,9,2 > stwx 4,10,9 > blr > > > > > > So - what we need to "fix" is cfgexpand.c marking variably-indexed > > decls as not to be expanded as registers (see > > discover_nonconstant_array_refs). > > > > I guess one way forward would be to perform instruction > > selection on GIMPLE here and transform > > > > VIEW_CONVERT_EXPR<int[4]>(D.3185)[_1] = i_6(D) > > > > to a (direct) internal function based on the vec_set optab. > > I don't quite understand what you mean here. Do you mean: > ALTIVEC_BUILTIN_VEC_INSERT -> VIEW_CONVERT_EXPR -> internal function -> > vec_set
You're writing VIEW_CONVERT_EXPR here but the outermost component is an ARRAY_REF. But yes, this is what I meant. > or ALTIVEC_BUILTIN_VEC_INSERT -> internal function -> vec_set? > And which pass to put the selection and transform is acceptable? Close to RTL expansion. There's gimple-isel.cc which does instruction selection for VEC_COND_EXPRs. > Why call it *based on* vec_set optab? The VIEW_CONVERT_EXPR or internal > function > is expanded to vec_set optab. Based on because we have the convenient capability to represent optabs to be used for RTL expansion as internal function calls on GIMPLE, called "direct internal function". > I guess you suggest adding internal function for VIEW_CONVERT_EXPR in gimple, > and do the transform from internal function to vec_set optab in expander? No, I suggest to "add" an internal function for the vec_set optab, see DEF_INTERNAL_OPTAB_FN in internal-fn.def > I doubt my understanding as this looks really over complicated since we > transform from VIEW_CONVERT_EXPR to vec_set optab directly so far... > IIUC, Internal function seems doesn't help much here as Segher said before. The advantage would be to circumvent GIMPLEs forcing of memory here. But as I said here: > > But then in GIMPLE D.3185 is also still memory (we don't have a variable > > index partial register set operation - BIT_INSERT_EXPR is > > currently specified to receive a constant bit position only). it might not work out so easy. Going down the rathole to avoid forcing memory during RTL expansion for select cases (vector type bases with a supported vector mode) might be something to try. That at least would make the approach of dealing with this in expand_assignment or siblings sensible. > > At which point after your patch is the stack storage elided? > > > > Stack storage is elided by register reload pass in RTL. > > > Thanks, > Xionghu