[Bug tree-optimization/109764] V2SI multiply high is not vectorized on x86_64

ubizjak at gmail dot com via Gcc-bugs Mon, 08 May 2023 01:09:52 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109764


--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #2)
> Confirmed.  Pattern recog recognizes the widening multiplication but not a
> highpart multiplication.  That's currently missing.

Please note that the following testcase, which multiplies short -> int:

--cut here--
#define N 2

unsigned short ur[N], ua[N], ub[N];

void mulh (void)
{
  int i;

  for (i = 0; i < N; i++)
    ur[i] = ((unsigned int) ua[i] * ub[i]) >> 16;
}

void mulh_slp (void)
{
  ur[0] = ((unsigned int) ua[0] * ub[0]) >> 16;
  ur[1] = ((unsigned int) ua[1] * ub[1]) >> 16;
}
--cut here--

vectorizes with -O2 -fno-vec-cost-model via .MULH:

  vect__15.6_1 = MEM <vector(2) short unsigned int> [(short unsigned int *)&ua];
  vect__17.9_3 = MEM <vector(2) short unsigned int> [(short unsigned int *)&ub];
  vect_patt_34.10_5 = .MULH (vect__15.6_1, vect__17.9_3);
  MEM <vector(2) short unsigned int> [(short unsigned int *)&ur] = vect_patt_34.10_5;

and generates the expected code:

        movd    ua(%rip), %xmm0
        movd    ub(%rip), %xmm1
        pmulhuw %xmm1, %xmm0
        movd    %xmm0, ur(%rip)

in both cases.
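
For reference, the per-lane operation that .MULH and pmulhuw perform in the testcase above can be written as a scalar sketch. The names mulhi_u16 and mulh_ref are hypothetical helpers introduced here for illustration only; they are not part of the testcase or of GCC.

```c
#include <stdint.h>

/* Hypothetical helper: high 16 bits of the 32-bit unsigned product,
   i.e. the value one 16-bit lane of pmulhuw is expected to hold.  */
static uint16_t
mulhi_u16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (((uint32_t) a * b) >> 16);
}

/* Hypothetical helper: apply the operation to n elements,
   mirroring the loop in mulh () from the testcase.  */
static void
mulh_ref (uint16_t *r, const uint16_t *a, const uint16_t *b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = mulhi_u16 (a[i], b[i]);
}
```

Comparing ur[] against the output of such a reference is one way to check that the vectorized and scalar versions agree.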
