http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-03-20
     Ever Confirmed|0                           |1

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-20 09:17:41 UTC ---
Testing the 3 patches now (AVX2 improvements, expand_vselect and #c8 with
further comments).

For 3/4 insn sequences, I agree with the proposal to attempt to handle
d->op0 == d->op1 cross-lane shuffles as two operand in-lane shuffles after
vperm2f128 swapping the lanes.  Two insn expanders could be grouped into
expand_vec_perm_2 and three insn expanders into expand_vec_perm_3.

We need to write some further 2 and 3 insn in-lane expanders though, as shown
by:

typedef double V4DF __attribute__((vector_size (4 * sizeof (double))));
typedef long V4DI __attribute__((vector_size (4 * sizeof (long))));

#define A(a, b, c, d) \
__attribute__((noinline, noclone)) V4DF \
f##a##b##c##d (V4DF x, V4DF y) \
{\
  V4DI m = { a, b, c, d }; \
  return __builtin_shuffle (x, y, m); \
}
#define B(b, c, d) A(0, b, c, d) A(1, b, c, d) A(4, b, c, d) A(5, b, c, d)
#define C(c, d) B(0, c, d) B(1, c, d) B(4, c, d) B(5, c, d)
#define D(d) C(2, d) C(3, d) C(6, d) C(7, d)
#define E D(2) D(3) D(6) D(7)
E

int
main ()
{
  V4DF x = { 0.5, 1.5, 2.5, 3.5 }, y = { 4.5, 5.5, 6.5, 7.5 }, z;
#undef A
#define A(a, b, c, d) \
  z = f##a##b##c##d (x, y); \
  if (z[0] != a + .5 || z[1] != b + .5 || z[2] != c + .5 || z[3] != d + .5) \
    __builtin_abort ();
  E
  return 0;
}
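
For reference, the lane-swap proposal for d->op0 == d->op1 shuffles mentioned above can be modeled in scalar C.  This is only a sketch of the idea, not the i386 backend code (which works on RTL): it assumes a 4 x double vector with elements 0-1 in the low 128-bit lane and 2-3 in the high lane, and the helper names swap_lanes, shuffle_via_lane_swap and count_mismatches are hypothetical.  Any element that would have to cross a lane is instead taken from a lane-swapped copy of the input, so the final step only ever reads within the destination lane.

```c
#include <assert.h>

/* Scalar model: NELT doubles per vector, LANE doubles per 128-bit lane.  */
enum { NELT = 4, LANE = 2 };

/* Hypothetical helper: model of swapping the two 128-bit lanes, i.e. the
   step the comment proposes doing with vperm2f128.  */
static void
swap_lanes (const double in[NELT], double out[NELT])
{
  for (int i = 0; i < NELT; i++)
    out[i] = in[i ^ LANE];
}

/* Hypothetical helper: perform the one-operand permutation perm[] (each
   index in 0..NELT-1) as a two-operand in-lane shuffle of X and its
   lane-swapped copy.  */
static void
shuffle_via_lane_swap (const double x[NELT], const int perm[NELT],
                       double out[NELT])
{
  double swapped[NELT];
  swap_lanes (x, swapped);

  for (int i = 0; i < NELT; i++)
    {
      int src = perm[i];
      assert (src >= 0 && src < NELT);
      if (src / LANE == i / LANE)
        /* Source already sits in the destination lane: read X directly.  */
        out[i] = x[src];
      else
        /* Cross-lane request: the same value sits at index src ^ LANE of
           the swapped copy, which is in the destination lane.  */
        out[i] = swapped[src ^ LANE];
    }
}

/* Hypothetical helper: compare the model against a direct permutation for
   all 4^4 one-operand masks and return the number of masks that differ.  */
static int
count_mismatches (void)
{
  const double x[NELT] = { 0.5, 1.5, 2.5, 3.5 };
  int bad = 0;

  for (int m = 0; m < NELT * NELT * NELT * NELT; m++)
    {
      /* Decode mask number M into four 2-bit element indices.  */
      int perm[NELT] = { m & 3, (m >> 2) & 3, (m >> 4) & 3, (m >> 6) & 3 };
      double z[NELT];
      int differs = 0;

      shuffle_via_lane_swap (x, perm, z);
      for (int i = 0; i < NELT; i++)
        if (z[i] != x[perm[i]])
          differs = 1;
      bad += differs;
    }
  return bad;
}
```

Calling count_mismatches () from a test driver checks every one-operand mask.  The model says nothing about how many insns the in-lane step itself needs; that is the part the further 2 and 3 insn expanders would have to cover.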