On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> Hi Richi,
>
> Thanks for the comments!
>
> on 2021/7/13 下午5:35, Richard Biener wrote:
> > On Tue, Jul 13, 2021 at 10:53 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> When I added the support for Power10 newly introduced multiply
> >> highpart instrutions, I noticed that currently vectorizer
> >> doesn't try to vectorize multiply highpart pattern, I hope
> >> this isn't intentional?
> >>
> >> This patch is to extend the existing pattern mulhs handlings
> >> to cover multiply highpart.  Another alternative seems to
> >> recog mul_highpart operation in a general place applied for
> >> scalar code when the target supports the optab for the scalar
> >> operation, it's based on the assumption that one target which
> >> supports vector version of multiply highpart should have the
> >> scalar version.  I noticed that the function can_mult_highpart_p
> >> can check/handle mult_highpart well even without mul_highpart
> >> optab support, I think to recog this pattern in vectorizer
> >> is better.  Is it on the right track?
> >
> > I think it's on the right track, using IFN_LAST is a bit awkward
> > in case yet another case pops up so maybe you can use
> > a code_helper instance instead which unifies tree_code,
> > builtin_code and internal_fn?
> >
>
> If there is one new requirement which doesn't have/introduce IFN
> stuffs but have one existing tree_code, can we add one more field
> with type tree_code, then for the IFN_LAST path we can check the
> different requirements under the guard with that tree_code variable?
>
> > I also notice that can_mult_highpart_p will return true if
> > only vec_widen_[us]mult_{even,odd,hi,lo} are available,
> > but then the result might be less optimal (or even not
> > handled later)?
> >
>
> I think it will be handled always?  The expander calls
>
> rtx
> expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
>                       rtx target, bool uns_p)
>
> which will further check with can_mult_highpart_p.
>
> For the below case,
>
> #define SHT_CNT 16
>
> __attribute__ ((noipa)) void
> test ()
> {
>   for (int i = 0; i < N; i++)
>     sh_c[i] = ((SI) sh_a[i] * (SI) sh_b[i]) >> 16;
> }
>
> Without this patch, it use widen_mult like below:
>
>   vect__1.5_19 = MEM <vector(8) short int> [(short int *)&sh_a + ivtmp.18_24 
> * 1];
>   vect__3.8_14 = MEM <vector(8) short int> [(short int *)&sh_b + ivtmp.18_24 
> * 1];
>   vect_patt_22.9_13 = WIDEN_MULT_LO_EXPR <vect__3.8_14, vect__1.5_19>;
>   vect_patt_22.9_9 = WIDEN_MULT_HI_EXPR <vect__3.8_14, vect__1.5_19>;
>   vect__6.10_25 = vect_patt_22.9_13 >> 16;
>   vect__6.10_26 = vect_patt_22.9_9 >> 16;
>   vect__7.11_27 = VEC_PACK_TRUNC_EXPR <vect__6.10_25, vect__6.10_26>;
>   MEM <vector(8) short int> [(short int *)&sh_c + ivtmp.18_24 * 1] = 
> vect__7.11_27;
>
> .L2:
>         lxvx 33,7,9
>         lxvx 32,8,9
>         vmulosh 13,0,1    // widen mult
>         vmulesh 0,0,1
>         xxmrglw 33,32,45  // merge
>         xxmrghw 32,32,45
>         vsraw 1,1,12      // shift
>         vsraw 0,0,12
>         vpkuwum 0,0,1     // pack
>         stxvx 32,10,9
>         addi 9,9,16
>         bdnz .L2
>
>
> With this patch, it ends up with:
>
>   vect__1.5_14 = MEM <vector(8) short int> [(short int *)&sh_a + ivtmp.17_24 
> * 1];
>   vect__3.8_8 = MEM <vector(8) short int> [(short int *)&sh_b + ivtmp.17_24 * 
> 1];
>   vect_patt_21.9_25 = vect__3.8_8 h* vect__1.5_14;
>   MEM <vector(8) short int> [(short int *)&sh_c + ivtmp.17_24 * 1] = 
> vect_patt_21.9_25;

Yes, so I'm curious what it ends up with/without the patch on x86_64 which
can do vec_widen_[us]mult_{even,odd} but not [us]mul_highpart.

Richard.

> .L2:
>         lxvx 32,8,9
>         lxvx 33,10,9
>         vmulosh 13,0,1   // widen mult
>         vmulesh 0,0,1
>         vperm 0,0,13,12  // perm on widen mults
>         stxvx 32,7,9
>         addi 9,9,16
>         bdnz .L2
>
>
> > That is, what about adding optab internal functions
> > for [us]mul_highpart instead, much like the existing
> > ones for MULH{R,}S?
> >
>
> OK, I was thinking the IFN way at the beginning, but was worried
> that it's easy to be blamed saying it's not necessary since there
> is one existing tree_code.  :-)  Will update it with IFN way.
>
> BR,
> Kewen
>
> > Richard.
> >
> >>
> >> Bootstrapped & regtested on powerpc64le-linux-gnu P9,
> >> x86_64-redhat-linux and aarch64-linux-gnu.
> >>
> >> BR,
> >> Kewen
> >> -----
> >> gcc/ChangeLog:
> >>
> >>         * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
> >>         recog normal multiply highpart.
> >>
>
>

Reply via email to