[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

ubizjak at gmail dot com via Gcc-bugs Fri, 17 May 2024 01:48:26 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069


--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #8)
> A better patch:

The real issue is that the following permutation (truncation):

+      for (i = 0; i < d.nelt; ++i)
+       d.perm[i] = i * 2;
+
+      ok = ix86_expand_vec_perm_const_1 (&d);

results in slow code involving VPERMQ. Ideally, ix86_expand_vec_perm_const_1
should emit faster code for truncation, because this will benefit other code as
well.
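
To illustrate why the selector above amounts to a truncation, here is a
minimal scalar sketch. It is not GCC code: apply_perm and truncate_via_perm
are hypothetical helper names, and the sketch assumes the 16-bit input is
viewed as bytes on a little-endian target such as x86, so that byte i * 2 is
the low byte of element i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model, not GCC code: apply a constant permutation
   where output byte i is taken from input byte perm[i].  */
static void
apply_perm (const uint8_t *src, const unsigned char *perm,
            uint8_t *dst, size_t nelt)
{
  for (size_t i = 0; i < nelt; ++i)
    dst[i] = src[perm[i]];
}

/* Truncate NELT 16-bit elements to 8 bits by building the same selector
   as in the quoted patch (perm[i] = i * 2) over the byte view of IN.
   Assumes a little-endian target, where the low byte comes first.  */
static void
truncate_via_perm (const uint16_t *in, uint8_t *out, size_t nelt)
{
  uint8_t bytes[64];
  unsigned char perm[32];

  assert (nelt <= 32);          /* Fits the fixed-size scratch buffers.  */
  memcpy (bytes, in, nelt * sizeof (uint16_t));

  for (size_t i = 0; i < nelt; ++i)
    perm[i] = (unsigned char) (i * 2);

  apply_perm (bytes, perm, out, nelt);
}
```

Under those assumptions, out[i] equals (uint8_t) in[i] for every element,
which is the truncation the comment refers to. The sketch only models the
data movement; it says nothing about which instructions the backend picks.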

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform
