https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563

Cory Fields <lists at coryfields dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lists at coryfields dot com

--- Comment #13 from Cory Fields <lists at coryfields dot com> ---
Chiming in to say I'm seeing the exact same thing on trunk.

Here's a minimal reproducer:

using vec256 = unsigned __attribute__((__vector_size__(32)));

void slow_rotate(vec256& x)
{
    x = __builtin_shufflevector(x, x, 3, 0, 1, 2, 7, 4, 5, 6);
}

void fast_rotate(vec256& x)
{
    x = vec256{x[3], x[0], x[1], x[2], x[7], x[4], x[5], x[6]};
}

Godbolt link: https://godbolt.org/z/YY9P7xKbh

fast_rotate compiles to pshufd as expected on x86_64 with generic compilation
flags. slow_rotate, despite being semantically identical, compiles to a *much*
slower scalar extract/insert sequence.
