Hi Kelvin, On Fri, Mar 08, 2019 at 10:59:35AM -0600, Kelvin Nilsen wrote: > This problem report, though initially motivated by differences in behavior > between constant and non-constant selector arguments, uncovered a number of > inconsistencies in the implementation of vec_extract. > > This patch provides several fixes to make handling of constant selector > expressions the same as the handling of non-constant selector expressions. > In the process of testing, it was observed that certain existing regression > tests were looking for the wrong instructions to be emitted and those tests > have been updated. > > This has bootstrapped and tested without regressions on > powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P7 > big-endian, with both -m32 and -m64 target options).
Thanks for the patch. I have lots of comments/questions, mostly about the testcases. Sorry :-/ The actual compiler code part looks good though. > --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c (revision 0) > +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c (revision 0) > @@ -0,0 +1,157 @@ > +/* { dg-do run { target { powerpc*-*-* } } } */ > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ Is there any reason to think this testcase will fail on Darwin? I mean, it requires VSX, and that should skip the testcase already on Darwin? > +/* { dg-require-effective-target dfp_hw } */ Why is that? I don't see any dfp used here? > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } > */ You do not use -mcpu=, so why this dg-skip-if? And please set -mdejagnu-cpu= instead. > --- gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c > (revision 269370) > +++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c > (working copy) > @@ -18,7 +18,7 @@ > /* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target lp64 } } } */ > /* { dg-final { scan-assembler-times {\mvslo\M} 3 { target lp64 } } } */ > /* { dg-final { scan-assembler-times {\mmfvsrd\M} 6 { target lp64 } } } */ > -/* { dg-final { scan-assembler-times {\msradi\M} 3 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times {\msrdi\M} 3 { target lp64 } } } */ Maybe allow both? So {\msra?di\M}? Or does sradi make no sense here? > --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-14a.c (revision 0) > +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-14a.c (revision 0) > @@ -0,0 +1,128 @@ > +/* { dg-do run { target { powerpc*-*-* } } } */ > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ > +/* { dg-require-effective-target dfp_hw } */ > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } > */ > +/* { dg-options "-maltivec" } */ > + > +/* This test should run the same on any target that supports altivec/dfp > + instructions. Intentionally not specifying cpu in order to test > + all code generation paths. */ I don't see where dfp comes in? For server CPUs it is p6 and up, the same as AltiVec, but that is coincidental. > --- gcc/testsuite/gcc.target/powerpc/pr87532-mc.c (revision 0) > +++ gcc/testsuite/gcc.target/powerpc/pr87532-mc.c (revision 0) > @@ -0,0 +1,260 @@ > +/* { dg-do run { target { powerpc*-*-* && lp64 } } } */ Check for int128, instead? Or is there another reason to want lp64? > + __asm__ (" # %x0" : "+wa" (a)); "wa" requires VSX. > --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-14b.c (revision 0) > +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-14b.c (revision 0) > @@ -0,0 +1,128 @@ > +/* { dg-do run { target { powerpc*-*-* } } } */ Btw, you can just say { dg-do run }: everything in gcc.target/powerpc is only run for powerpc*-*-* targets. > --- gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2b.c (revision 0) > +++ gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2b.c (revision 0) > @@ -0,0 +1,16 @@ > +/* { dg-do run { target { powerpc*-*-linux* } } } */ > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ > +/* { dg-require-effective-target vsx_hw } */ > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } > */ > +/* { dg-options "-O2 -mvsx -save-temps -dp -g" } */ I think that is debugging code left over? -dp -g should be harmless, but -save-temps is littering ;-) > --- gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h (revision 0) > +++ gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h (revision 0) > +static vector TYPE > +deoptimize (vector TYPE a) > +{ > + __asm__ (" # %x0" : "+wa" (a)); > + return a; > +} Is this only ever compiled with VSX? If not, use "v" instead? > --- gcc/config/rs6000/rs6000-c.c (revision 269370) > +++ gcc/config/rs6000/rs6000-c.c (working copy) > @@ -6568,9 +6568,15 @@ > /* If the second argument is an integer constant, if the value is in > the expected range, generate the built-in code if we can. We need > 64-bit and direct move to extract the small integer vectors. */ > - if (TREE_CODE (arg2) == INTEGER_CST > - && wi::ltu_p (wi::to_wide (arg2), nunits)) > + > + if (TREE_CODE (arg2) == INTEGER_CST) > { > + if (!wi::ltu_p (wi::to_wide (arg2), nunits)) > + { > + wide_int selector = wi::to_wide (arg2); > + selector = wi::umod_trunc (selector, nunits); > + arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector); > + } You can just always do this, no need for the condition? Makes it easier to read. > --- gcc/config/rs6000/rs6000.c (revision 269370) > +++ gcc/config/rs6000/rs6000.c (working copy) > @@ -6894,7 +6894,6 @@ > default: > break; > case E_V1TImode: > - gcc_assert (INTVAL (elt) == 0 && inner_mode == TImode); You could leave the inner_mode part? Or is the assert here pretty useless anyway? > @@ -7173,7 +7190,9 @@ > > else if (REG_P (src) || SUBREG_P (src)) > { > - int bit_shift = byte_shift + 3; > + int num_elements = GET_MODE_NUNITS (mode); > + int bits_in_element = 128 / num_elements; Use GET_MODE_INNER? Or will that not help here. > @@ -14724,8 +14743,18 @@ > op1 = expand_normal (arg1); > > /* Call get_element_number to validate arg1 if it is a constant. */ > + > if (TREE_CODE (arg1) == INTEGER_CST) > - (void) get_element_number (TREE_TYPE (arg0), arg1); > + { > + unsigned HOST_WIDE_INT elt; > + unsigned HOST_WIDE_INT size = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > + unsigned int truncated_selector; > + if (!tree_fits_uhwi_p (arg1)) > + error ("selector must fit in HOST_WIDE_INT"); "error" is a user-visible error, and HOST_WIDE_INT is not something a user knows. Segher