Hi, This is a revision of my earlier patch on the subject, expanded to catch a few more cases and with some attendant test-case adjustments:
In altivec_expand_vec_perm_const, we look for special masks that match the behavior of specific instructions, so we can use those instructions rather than load a constant control vector and perform a permute. Some of the masks must be treated differently for little endian mode. The masks that represent merge-high and merge-low operations have reversed meanings in little-endian, because of the reversed ordering of the vector elements. The masks that represent vector-pack operations remain correct when the mode of the input operands matches the natural mode of the instruction, but not otherwise. This is because the pack instructions always select the rightmost, low-order bits of the vector element. There are cases where we use this, for example, with a V8SI vector matching a vpkuwum mask in order to select the odd-numbered elements of the vector. In little endian mode, this instruction will get us the even-numbered elements instead. There is no alternative instruction with the desired behavior, so I've just disabled use of those masks for little endian when the mode isn't natural. This requires adjusting the altivec-perm-1.c test case. The vector pack tests are moved to a new altivec-perm-3.c test, which is restricted to big-endian targets. These changes fix 49 failures in the test suite for little endian mode (9 vector failures left to go!). Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new failures. Is this ok for trunk? Thanks, Bill gcc: 2013-10-21 Bill Schmidt <wschm...@vnet.ibm.com> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse meaning of merge-high and merge-low masks for little endian; avoid use of vector-pack masks for little endian for mismatched modes. gcc/testsuite: 2013-10-21 Bill Schmidt <wschm...@vnet.ibm.com> * gcc.target/powerpc/altivec-perm-1.c: Move the two vector pack tests into... * gcc.target/powerpc/altivec-perm-3.c: ...this new test, which is restricted to big-endian targets. Index: gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */ +/* { dg-options "-O -maltivec -mno-vsx" } */ + +typedef unsigned char V __attribute__((vector_size(16))); + +V p2(V x, V y) +{ + return __builtin_shuffle(x, y, + (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 }); + +} + +V p4(V x, V y) +{ + return __builtin_shuffle(x, y, + (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 }); +} + +/* { dg-final { scan-assembler-not "vperm" } } */ +/* { dg-final { scan-assembler "vpkuhum" } } */ +/* { dg-final { scan-assembler "vpkuwum" } } */ Index: gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c (revision 203792) +++ gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c (working copy) @@ -19,19 +19,6 @@ V b4(V x) return __builtin_shuffle(x, (V){ 4,5,6,7, 4,5,6,7, 4,5,6,7, 4,5,6,7, }); } -V p2(V x, V y) -{ - return __builtin_shuffle(x, y, - (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 }); - -} - -V p4(V x, V y) -{ - return __builtin_shuffle(x, y, - (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 }); -} - V h1(V x, V y) { return __builtin_shuffle(x, y, @@ -72,5 +59,3 @@ V l4(V x, V y) /* { dg-final { scan-assembler "vspltb" } } */ /* { dg-final { scan-assembler "vsplth" } } */ /* { dg-final { scan-assembler "vspltw" } } */ -/* { dg-final { scan-assembler "vpkuhum" } } */ -/* { dg-final { scan-assembler "vpkuwum" } } */ Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 203792) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -28837,17 +28838,23 @@ altivec_expand_vec_perm_const (rtx operands[4]) { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } }, { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum, { 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } }, - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb, + { OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb, { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh, + { OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh, { 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23 } }, - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw, + { OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw, { 0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23 } }, - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb, + { OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb, { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } }, - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh, + { OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh, { 8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } }, - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw, + { OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw, { 8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } }, { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew, { 0, 1, 2, 3, 16, 17, 18, 19, 8, 9, 10, 11, 24, 25, 26, 27 } }, @@ -28980,6 +28987,26 @@ altivec_expand_vec_perm_const (rtx operands[4]) enum machine_mode omode = insn_data[icode].operand[0].mode; enum machine_mode imode = insn_data[icode].operand[1].mode; + /* For little-endian, don't use vpkuwum and vpkuhum if the + underlying vector type is not V4SI and V8HI, respectively. + For example, using vpkuwum with a V8HI picks up the even + halfwords (BE numbering) when the even halfwords (LE + numbering) are what we need. */ + if (!BYTES_BIG_ENDIAN + && icode == CODE_FOR_altivec_vpkuwum + && ((GET_CODE (op0) == REG + && GET_MODE (op0) != V4SImode) + || (GET_CODE (op0) == SUBREG + && GET_MODE (XEXP (op0, 0)) != V4SImode))) + continue; + if (!BYTES_BIG_ENDIAN + && icode == CODE_FOR_altivec_vpkuhum + && ((GET_CODE (op0) == REG + && GET_MODE (op0) != V8HImode) + || (GET_CODE (op0) == SUBREG + && GET_MODE (XEXP (op0, 0)) != V8HImode))) + continue; + /* For little-endian, the two input operands must be swapped (or swapped back) to ensure proper right-to-left numbering from 0 to 2N-1. */