https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
Cory Fields <lists at coryfields dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lists at coryfields dot com
--- Comment #13 from Cory Fields <lists at coryfields dot com> ---
Chiming in to say I'm seeing the exact same thing on trunk.
Here's a minimal reproducer:
using vec256 = unsigned __attribute__((__vector_size__(32)));

void slow_rotate(vec256& x)
{
    x = __builtin_shufflevector(x, x, 3, 0, 1, 2, 7, 4, 5, 6);
}

void fast_rotate(vec256& x)
{
    x = vec256{x[3], x[0], x[1], x[2], x[7], x[4], x[5], x[6]};
}
Godbolt link: https://godbolt.org/z/YY9P7xKbh
fast_rotate generates pshufd as expected on x86_64 with generic compilation
flags, while the code generated for slow_rotate is *much* slower.