On Fri, Oct 3, 2014 at 1:11 PM, Jakub Jelinek <ja...@redhat.com> wrote: > On Thu, Oct 02, 2014 at 08:34:40PM +0200, Uros Bizjak wrote: >> Index: i386.c >> =================================================================== >> --- i386.c (revision 215802) >> +++ i386.c (working copy) >> @@ -43407,8 +43407,10 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d >> AVX and AVX2 as they require more than 2 instructions. */ >> if (d->one_operand_p) >> return false; >> - if (TARGET_SSE4_1 && GET_MODE_SIZE (vmode) == 16) >> + if (TARGET_AVX2 && GET_MODE_SIZE (vmode) == 32) >> ; >> + else if (TARGET_SSE4_1 && GET_MODE_SIZE (vmode) == 16) >> + ; >> else >> return false; >> >> --cut here-- >> >> The comment above expand_vec_perm_pblendv claims that: >> >> /* Use the same checks as in expand_vec_perm_blend, but skipping >> AVX and AVX2 as they require more than 2 instructions. */ > > The comment is mostly right though, I'd say "as they sometimes require > more than 2 instructions". > >> BTW: I have no access to avx2 target, so I can't test the patch with a >> runtime tests. OTOH, it doesn't ICE for "GCC_TEST_RUN_EXPENSIVE=1 make >> check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 >> dg-torture.exp=vshuf*.c'". > > Even the expensive testsuite has very limited coverage. > As I wanted to prove your patch will ICE, I wrote following generator: > > #ifndef ITYPE > #define ITYPE TYPE > #endif > #define S2(X) #X > #define S(X) S2(X) > > int > main () > { > int i, j, nelt = 32 / sizeof (TYPE); > printf ( > "typedef " S(TYPE) " V __attribute__ ((vector_size (32)));\n" > "typedef " S(ITYPE) " VI __attribute__ ((vector_size (32)));\n" > "V a, b, c;\n" > "\n" > "#define T(n, m...) void foo##n (void) { c = __builtin_shuffle (a, b, (VI) > m); }\n" > "#define S(n, m...) T(n, m)\n"); > for (i = 0; i < 100000; i++) > { > printf ("S (__LINE__, { "); > for (j = 0; j < nelt; j++) > { > int k = random () & 3; > int v = j; > if (k & 1) > v = ((k & 2) ? nelt : 0) + (random () & (nelt - 1)); > printf ("%d%s", v, j < (nelt - 1) ? ", " : " })\n"); > } > } > } > > which can be compiled e.g. with > -DTYPE=char > -DTYPE=short > -DTYPE=int > -DTYPE=long > -DTYPE=float -DITYPE=int > -DTYPE=double -DITYPE=long > and then in each case generate 100000 tests (sort -u on it plus manual fixup > can decrease that, for the V4DI/V4DF cases substantially). The first one > triggered almost immediately an ICE, added to vshuf-32.inc (non-expensive). > > With the following updated patch all those generated testcases don't ICE > (-mavx2 for the first four, -mavx for the last two). > > Also tested with > GCC_TEST_RUN_EXPENSIVE=1 make check-gcc > RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
I have had some problems testing with TARGET_AVX part of the change and /-mavx tests. I assume that your patch survives these tests. > > The pr61403.c testcase can be simplified into: > typedef float V __attribute__ ((vector_size (32))); > typedef int VI __attribute__ ((vector_size (32))); > V a, b, c; > > #define T(n, m...) void foo##n (void) { c = __builtin_shuffle (a, b, (VI) m); > } > T (0, { 0, 1, 2, 3, 4, 5, 10, 13 }) > T (1, { 0, 1, 2, 3, 4, 8, 11, 14 }) > T (2, { 0, 1, 2, 3, 4, 9, 12, 15 }) > T (3, { 0, 13, 2, 3, 14, 5, 6, 15 }) > T (4, { 0, 1, 8, 3, 4, 9, 6, 7 }) > T (5, { 0, 3, 11, 0, 4, 12, 0, 5 }) > T (6, { 0, 3, 6, 9, 12, 15, 0, 0 }) > T (7, { 0, 8, 0, 1, 9, 0, 2, 10 }) > T (8, { 10, 1, 2, 11, 4, 5, 12, 7 }) > T (9, { 13, 0, 6, 14, 0, 7, 15, 0 }) > T (10, { 1, 4, 7, 10, 13, 0, 0, 0 }) > T (11, { 2, 5, 8, 11, 14, 0, 0, 0 }) > permutations, where both your and my patch optimize > foo{0,1,2,3,4,8}. > > 2014-10-03 Jakub Jelinek <ja...@redhat.com> > Uros Bizjak <ubiz...@gmail.com> > > PR tree-optimization/61403 > * config/i386/i386.c (expand_vec_perm_palignr): Fix a spelling > error in comment. Also optimize 256-bit vectors for AVX2 > or AVX (floating vectors only), provided the first permutation > can be performed in one insn. > > * gcc.dg/torture/vshuf-32.inc: Add a new test 29. OK if the patch bootstraps and regtests without problems. Thanks, Uros.