On Thu, Jun 28, 2012 at 08:57:23AM -0700, Richard Henderson wrote:
> On 2012-06-28 07:05, Jakub Jelinek wrote:
> > Unfortunately the addition of the builtin_mul_widen_* hooks on i?86 seems
> > to pessimize the generated code for the gcc.dg/vect/pr51581-3.c
> > testcase (at least with -O3 -mavx) compared to when the hooks aren't
> > present, because i?86 has more natural support for widen mult lo/hi
> > compared to widen mult even/odd, but I assume that on powerpc it is the
> > other way around.  So, how should I find out, if both VEC_WIDEN_MULT_*_EXPR
> > and builtin_mul_widen_* are possible for the particular vectype, which one
> > will be cheaper?
>
> I would assume that if the builtin exists, then it is cheaper.
>
> I disagree about "x86 has more natural support for hi/lo".  The basic sse2
> multiplication is even.  One shift per input is needed to generate odd.
> On the other hand, one interleave per input is required for both hi/lo.
> So 4 setup insns for hi/lo, and 2 setup insns for even/odd.  And on top of
> all that, XOP includes multiply odd at least for signed V4SI.
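To make those setup insn counts concrete, the two lowerings for unsigned
V4SI could be written with SSE2 intrinsics roughly like this (the
mul_widen_* helpers are just for illustration, not what the vectorizer
actually emits):

#include <emmintrin.h>

/* Even elements (0 and 2): pmuludq already multiplies the low 32 bits
   of each 64-bit lane, so no setup insns are needed.  */
static inline __m128i
mul_widen_even (__m128i a, __m128i b)
{
  return _mm_mul_epu32 (a, b);
}

/* Odd elements (1 and 3): one shift per input to move them into the
   even positions first, i.e. 2 setup insns for the even/odd pair.  */
static inline __m128i
mul_widen_odd (__m128i a, __m128i b)
{
  return _mm_mul_epu32 (_mm_srli_epi64 (a, 32), _mm_srli_epi64 (b, 32));
}

/* Elements 0 and 1: one interleave per input, and likewise for
   elements 2 and 3 below, i.e. 4 setup insns for the lo/hi pair.  */
static inline __m128i
mul_widen_lo (__m128i a, __m128i b)
{
  return _mm_mul_epu32 (_mm_unpacklo_epi32 (a, a),
                        _mm_unpacklo_epi32 (b, b));
}

/* Elements 2 and 3.  */
static inline __m128i
mul_widen_hi (__m128i a, __m128i b)
{
  return _mm_mul_epu32 (_mm_unpackhi_epi32 (a, a),
                        _mm_unpackhi_epi32 (b, b));
}

The multiplies themselves cost the same either way; the difference is the
setup above and how the partial results are put back together afterwards.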
Perhaps the problem is then that the permutation is much more expensive
for even/odd.  With even/odd the f2 routine is:
        vmovdqa d(%rip), %xmm2
        vmovdqa .LC1(%rip), %xmm0
        vpsrlq  $32, %xmm2, %xmm4
        vmovdqa d+16(%rip), %xmm1
        vpmuludq        %xmm0, %xmm2, %xmm5
        vpsrlq  $32, %xmm0, %xmm3
        vpmuludq        %xmm3, %xmm4, %xmm4
        vpmuludq        %xmm0, %xmm1, %xmm0
        vmovdqa .LC2(%rip), %xmm2
        vpsrlq  $32, %xmm1, %xmm1
        vpmuludq        %xmm3, %xmm1, %xmm3
        vmovdqa .LC3(%rip), %xmm1
        vpshufb %xmm2, %xmm5, %xmm5
        vpshufb %xmm1, %xmm4, %xmm4
        vpshufb %xmm2, %xmm0, %xmm2
        vpshufb %xmm1, %xmm3, %xmm1
        vpor    %xmm4, %xmm5, %xmm4
        vpor    %xmm1, %xmm2, %xmm1
        vpsrld  $1, %xmm4, %xmm4
        vmovdqa %xmm4, c(%rip)
        vpsrld  $1, %xmm1, %xmm1
        vmovdqa %xmm1, c+16(%rip)
        ret
and with lo/hi it is:
        vmovdqa d(%rip), %xmm2
        vpunpckhdq      %xmm2, %xmm2, %xmm3
        vpunpckldq      %xmm2, %xmm2, %xmm2
        vmovdqa .LC1(%rip), %xmm0
        vpmuludq        %xmm0, %xmm3, %xmm3
        vmovdqa d+16(%rip), %xmm1
        vpmuludq        %xmm0, %xmm2, %xmm2
        vshufps $221, %xmm2, %xmm3, %xmm2
        vpsrld  $1, %xmm2, %xmm2
        vmovdqa %xmm2, c(%rip)
        vpunpckhdq      %xmm1, %xmm1, %xmm2
        vpunpckldq      %xmm1, %xmm1, %xmm1
        vpmuludq        %xmm0, %xmm2, %xmm2
        vpmuludq        %xmm0, %xmm1, %xmm0
        vshufps $221, %xmm0, %xmm2, %xmm0
        vpsrld  $1, %xmm0, %xmm0
        vmovdqa %xmm0, c+16(%rip)
        ret

        Jakub