On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <li...@linux.ibm.com> wrote: > > Hi Richi, > > Thanks for the comments! > > on 2021/7/13 下午5:35, Richard Biener wrote: > > On Tue, Jul 13, 2021 at 10:53 AM Kewen.Lin <li...@linux.ibm.com> wrote: > >> > >> Hi, > >> > >> When I added the support for Power10 newly introduced multiply > >> highpart instrutions, I noticed that currently vectorizer > >> doesn't try to vectorize multiply highpart pattern, I hope > >> this isn't intentional? > >> > >> This patch is to extend the existing pattern mulhs handlings > >> to cover multiply highpart. Another alternative seems to > >> recog mul_highpart operation in a general place applied for > >> scalar code when the target supports the optab for the scalar > >> operation, it's based on the assumption that one target which > >> supports vector version of multiply highpart should have the > >> scalar version. I noticed that the function can_mult_highpart_p > >> can check/handle mult_highpart well even without mul_highpart > >> optab support, I think to recog this pattern in vectorizer > >> is better. Is it on the right track? > > > > I think it's on the right track, using IFN_LAST is a bit awkward > > in case yet another case pops up so maybe you can use > > a code_helper instance instead which unifies tree_code, > > builtin_code and internal_fn? > > > > If there is one new requirement which doesn't have/introduce IFN > stuffs but have one existing tree_code, can we add one more field > with type tree_code, then for the IFN_LAST path we can check the > different requirements under the guard with that tree_code variable? > > > I also notice that can_mult_highpart_p will return true if > > only vec_widen_[us]mult_{even,odd,hi,lo} are available, > > but then the result might be less optimal (or even not > > handled later)? > > > > I think it will be handled always? The expander calls > > rtx > expand_mult_highpart (machine_mode mode, rtx op0, rtx op1, > rtx target, bool uns_p) > > which will further check with can_mult_highpart_p. > > For the below case, > > #define SHT_CNT 16 > > __attribute__ ((noipa)) void > test () > { > for (int i = 0; i < N; i++) > sh_c[i] = ((SI) sh_a[i] * (SI) sh_b[i]) >> 16; > } > > Without this patch, it use widen_mult like below: > > vect__1.5_19 = MEM <vector(8) short int> [(short int *)&sh_a + ivtmp.18_24 > * 1]; > vect__3.8_14 = MEM <vector(8) short int> [(short int *)&sh_b + ivtmp.18_24 > * 1]; > vect_patt_22.9_13 = WIDEN_MULT_LO_EXPR <vect__3.8_14, vect__1.5_19>; > vect_patt_22.9_9 = WIDEN_MULT_HI_EXPR <vect__3.8_14, vect__1.5_19>; > vect__6.10_25 = vect_patt_22.9_13 >> 16; > vect__6.10_26 = vect_patt_22.9_9 >> 16; > vect__7.11_27 = VEC_PACK_TRUNC_EXPR <vect__6.10_25, vect__6.10_26>; > MEM <vector(8) short int> [(short int *)&sh_c + ivtmp.18_24 * 1] = > vect__7.11_27; > > .L2: > lxvx 33,7,9 > lxvx 32,8,9 > vmulosh 13,0,1 // widen mult > vmulesh 0,0,1 > xxmrglw 33,32,45 // merge > xxmrghw 32,32,45 > vsraw 1,1,12 // shift > vsraw 0,0,12 > vpkuwum 0,0,1 // pack > stxvx 32,10,9 > addi 9,9,16 > bdnz .L2 > > > With this patch, it ends up with: > > vect__1.5_14 = MEM <vector(8) short int> [(short int *)&sh_a + ivtmp.17_24 > * 1]; > vect__3.8_8 = MEM <vector(8) short int> [(short int *)&sh_b + ivtmp.17_24 * > 1]; > vect_patt_21.9_25 = vect__3.8_8 h* vect__1.5_14; > MEM <vector(8) short int> [(short int *)&sh_c + ivtmp.17_24 * 1] = > vect_patt_21.9_25;
Yes, so I'm curious what it ends up with/without the patch on x86_64 which can do vec_widen_[us]mult_{even,odd} but not [us]mul_highpart. Richard. > .L2: > lxvx 32,8,9 > lxvx 33,10,9 > vmulosh 13,0,1 // widen mult > vmulesh 0,0,1 > vperm 0,0,13,12 // perm on widen mults > stxvx 32,7,9 > addi 9,9,16 > bdnz .L2 > > > > That is, what about adding optab internal functions > > for [us]mul_highpart instead, much like the existing > > ones for MULH{R,}S? > > > > OK, I was thinking the IFN way at the beginning, but was worried > that it's easy to be blamed saying it's not necessary since there > is one existing tree_code. :-) Will update it with IFN way. > > BR, > Kewen > > > Richard. > > > >> > >> Bootstrapped & regtested on powerpc64le-linux-gnu P9, > >> x86_64-redhat-linux and aarch64-linux-gnu. > >> > >> BR, > >> Kewen > >> ----- > >> gcc/ChangeLog: > >> > >> * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to > >> recog normal multiply highpart. > >> > >