I applied the following patches that were applied to the trunk to the GCC 6.2 branch:
[gcc] 2016-06-01 Michael Meissner <meiss...@linux.vnet.ibm.com> Back port from trunk 2016-05-23 Michael Meissner <meiss...@linux.vnet.ibm.com> PR target/71201 * config/rs6000/altivec.md (altivec_vperm_<mode>_internal): Drop ISA 3.0 xxperm fusion alternative. (altivec_vperm_v8hiv16qi): Likewise. (altivec_vperm_<mode>_uns_internal): Likewise. (vperm_v8hiv4si): Likewise. (vperm_v16qiv8hi): Likewise. Back port from trunk 2016-05-23 Michael Meissner <meiss...@linux.vnet.ibm.com> Kelvin Nilsen <kel...@gcc.gnu.org> * config/rs6000/rs6000.c (rs6000_expand_vector_set): Generate vpermr/xxpermr on ISA 3.0. (altivec_expand_vec_perm_le): Likewise. * config/rs6000/altivec.md (UNSPEC_VPERMR): New unspec. (altivec_vpermr_<mode>_internal): Add VPERMR/XXPERMR support for ISA 3.0. [gcc/testsuite] 2016-06-01 Michael Meissner <meiss...@linux.vnet.ibm.com> Back port from trunk 2016-05-23 Michael Meissner <meiss...@linux.vnet.ibm.com> Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/p9-permute.c: Run test on big endian as well as little endian. Back port from trunk 2016-05-23 Michael Meissner <meiss...@linux.vnet.ibm.com> Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/p9-vpermr.c: New test for ISA 3.0 vpermr support. [gcc] 2016-06-01 Michael Meissner <meiss...@linux.vnet.ibm.com> Back port from trunk 2016-05-24 Michael Meissner <meiss...@linux.vnet.ibm.com> * config/rs6000/altivec.md (VParity): New mode iterator for vector parity built-in functions. (p9v_ctz<mode>2): Add support for ISA 3.0 vector count trailing zeros. (p9v_parity<mode>2): Likewise. * config/rs6000/vector.md (VEC_IP): New mode iterator for vector parity. (ctz<mode>2): ISA 3.0 expander for vector count trailing zeros. (parity<mode>2): ISA 3.0 expander for vector parity. * config/rs6000/rs6000-builtin.def (BU_P9_MISC_1): New macros for power9 built-ins. (BU_P9_64BIT_MISC_0): Likewise. (BU_P9_MISC_0): Likewise. (BU_P9V_AV_1): Likewise. (BU_P9V_AV_2): Likewise. (BU_P9V_AV_3): Likewise. (BU_P9V_AV_P): Likewise. (BU_P9V_VSX_1): Likewise. (BU_P9V_OVERLOAD_1): Likewise. (BU_P9V_OVERLOAD_2): Likewise. (BU_P9V_OVERLOAD_3): Likewise. (VCTZB): Add vector count trailing zeros support. (VCTZH): Likewise. (VCTZW): Likewise. (VCTZD): Likewise. (VPRTYBD): Add vector parity support. (VPRTYBQ): Likewise. (VPRTYBW): Likewise. (VCTZ): Add overloaded vector count trailing zeros support. (VPRTYB): Add overloaded vector parity support. * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add overloaded vector count trailing zeros and parity instructions. * config/rs6000/rs6000.md (wd mode attribute): Add V1TI and TI for vector parity support. * config/rs6000/altivec.h (vec_vctz): Add ISA 3.0 vector count trailing zeros support. (vec_cntlz): Likewise. (vec_vctzb): Likewise. (vec_vctzd): Likewise. (vec_vctzh): Likewise. (vec_vctzw): Likewise. (vec_vprtyb): Add ISA 3.0 vector parity support. (vec_vprtybd): Likewise. (vec_vprtybw): Likewise. (vec_vprtybq): Likewise. * doc/extend.texi (PowerPC AltiVec Built-in Functions): Document the ISA 3.0 vector count trailing zeros and vector parity built-in functions. [gcc/testsuite] 2016-06-01 Michael Meissner <meiss...@linux.vnet.ibm.com> Back port from trunk 2016-05-24 Michael Meissner <meiss...@linux.vnet.ibm.com> * gcc.target/powerpc/p9-vparity.c: New file to check ISA 3.0 vector parity built-in functions. * gcc.target/powerpc/ctz-3.c: New file to check ISA 3.0 vector count trailing zeros automatic vectorization. * gcc.target/powerpc/ctz-4.c: New file to check ISA 3.0 vector count trailing zeros built-in functions. [gcc] 2016-06-01 Michael Meissner <meiss...@linux.vnet.ibm.com> Back port from trunk 2016-05-24 Michael Meissner <meiss...@linux.vnet.ibm.com> * config/rs6000/altivec.md (VNEG iterator): New iterator for VNEGW/VNEGD instructions. (p9_neg<mode>2): New insns for ISA 3.0 VNEGW/VNEGD. (neg<mode>2): Add expander for V2DImode added in ISA 2.07, and support for ISA 3.0 VNEGW/VNEGD instructions. [gcc/testsuite] 2016-06-01 Michael Meissner <meiss...@linux.vnet.ibm.com> Back port from trunk 2016-05-24 Michael Meissner <meiss...@linux.vnet.ibm.com> * gcc.target/powerpc/p9-vneg.c: New test for ISA 3.0 VNEGW/VNEGD instructions. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 236958) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -6853,21 +6853,29 @@ rs6000_expand_vector_set (rtx target, rt gen_rtvec (3, target, reg, force_reg (V16QImode, x)), UNSPEC_VPERM); - else + else { - /* Invert selector. We prefer to generate VNAND on P8 so - that future fusion opportunities can kick in, but must - generate VNOR elsewhere. */ - rtx notx = gen_rtx_NOT (V16QImode, force_reg (V16QImode, x)); - rtx iorx = (TARGET_P8_VECTOR - ? gen_rtx_IOR (V16QImode, notx, notx) - : gen_rtx_AND (V16QImode, notx, notx)); - rtx tmp = gen_reg_rtx (V16QImode); - emit_insn (gen_rtx_SET (tmp, iorx)); - - /* Permute with operands reversed and adjusted selector. */ - x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp), - UNSPEC_VPERM); + if (TARGET_P9_VECTOR) + x = gen_rtx_UNSPEC (mode, + gen_rtvec (3, target, reg, + force_reg (V16QImode, x)), + UNSPEC_VPERMR); + else + { + /* Invert selector. We prefer to generate VNAND on P8 so + that future fusion opportunities can kick in, but must + generate VNOR elsewhere. */ + rtx notx = gen_rtx_NOT (V16QImode, force_reg (V16QImode, x)); + rtx iorx = (TARGET_P8_VECTOR + ? gen_rtx_IOR (V16QImode, notx, notx) + : gen_rtx_AND (V16QImode, notx, notx)); + rtx tmp = gen_reg_rtx (V16QImode); + emit_insn (gen_rtx_SET (tmp, iorx)); + + /* Permute with operands reversed and adjusted selector. */ + x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp), + UNSPEC_VPERM); + } } emit_insn (gen_rtx_SET (target, x)); @@ -34128,17 +34136,25 @@ altivec_expand_vec_perm_le (rtx operands if (!REG_P (target)) tmp = gen_reg_rtx (mode); - /* Invert the selector with a VNAND if available, else a VNOR. - The VNAND is preferred for future fusion opportunities. */ - notx = gen_rtx_NOT (V16QImode, sel); - iorx = (TARGET_P8_VECTOR - ? gen_rtx_IOR (V16QImode, notx, notx) - : gen_rtx_AND (V16QImode, notx, notx)); - emit_insn (gen_rtx_SET (norreg, iorx)); + if (TARGET_P9_VECTOR) + { + unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op0, op1, sel), + UNSPEC_VPERMR); + } + else + { + /* Invert the selector with a VNAND if available, else a VNOR. + The VNAND is preferred for future fusion opportunities. */ + notx = gen_rtx_NOT (V16QImode, sel); + iorx = (TARGET_P8_VECTOR + ? gen_rtx_IOR (V16QImode, notx, notx) + : gen_rtx_AND (V16QImode, notx, notx)); + emit_insn (gen_rtx_SET (norreg, iorx)); - /* Permute with operands reversed and adjusted selector. */ - unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op1, op0, norreg), - UNSPEC_VPERM); + /* Permute with operands reversed and adjusted selector. */ + unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op1, op0, norreg), + UNSPEC_VPERM); + } /* Copy into target, possibly by way of a register. */ if (!REG_P (target)) Index: gcc/config/rs6000/altivec.md =================================================================== --- gcc/config/rs6000/altivec.md (revision 236941) +++ gcc/config/rs6000/altivec.md (working copy) @@ -58,6 +58,7 @@ (define_c_enum "unspec" UNSPEC_VSUM2SWS UNSPEC_VSUMSWS UNSPEC_VPERM + UNSPEC_VPERMR UNSPEC_VPERM_UNS UNSPEC_VRFIN UNSPEC_VCFUX @@ -1949,32 +1950,30 @@ (define_expand "altivec_vperm_<mode>" ;; Slightly prefer vperm, since the target does not overlap the source (define_insn "*altivec_vperm_<mode>_internal" - [(set (match_operand:VM 0 "register_operand" "=v,?wo,?&wo") - (unspec:VM [(match_operand:VM 1 "register_operand" "v,0,wo") - (match_operand:VM 2 "register_operand" "v,wo,wo") - (match_operand:V16QI 3 "register_operand" "v,wo,wo")] + [(set (match_operand:VM 0 "register_operand" "=v,?wo") + (unspec:VM [(match_operand:VM 1 "register_operand" "v,0") + (match_operand:VM 2 "register_operand" "v,wo") + (match_operand:V16QI 3 "register_operand" "v,wo")] UNSPEC_VPERM))] "TARGET_ALTIVEC" "@ vperm %0,%1,%2,%3 - xxperm %x0,%x2,%x3 - xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3" + xxperm %x0,%x2,%x3" [(set_attr "type" "vecperm") - (set_attr "length" "4,4,8")]) + (set_attr "length" "4")]) (define_insn "altivec_vperm_v8hiv16qi" - [(set (match_operand:V16QI 0 "register_operand" "=v,?wo,?&wo") - (unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v,0,wo") - (match_operand:V8HI 2 "register_operand" "v,wo,wo") - (match_operand:V16QI 3 "register_operand" "v,wo,wo")] + [(set (match_operand:V16QI 0 "register_operand" "=v,?wo") + (unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v,0") + (match_operand:V8HI 2 "register_operand" "v,wo") + (match_operand:V16QI 3 "register_operand" "v,wo")] UNSPEC_VPERM))] "TARGET_ALTIVEC" "@ vperm %0,%1,%2,%3 - xxperm %x0,%x2,%x3 - xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3" + xxperm %x0,%x2,%x3" [(set_attr "type" "vecperm") - (set_attr "length" "4,4,8")]) + (set_attr "length" "4")]) (define_expand "altivec_vperm_<mode>_uns" [(set (match_operand:VM 0 "register_operand" "") @@ -1992,18 +1991,17 @@ (define_expand "altivec_vperm_<mode>_uns }) (define_insn "*altivec_vperm_<mode>_uns_internal" - [(set (match_operand:VM 0 "register_operand" "=v,?wo,?&wo") - (unspec:VM [(match_operand:VM 1 "register_operand" "v,0,wo") - (match_operand:VM 2 "register_operand" "v,wo,wo") - (match_operand:V16QI 3 "register_operand" "v,wo,wo")] + [(set (match_operand:VM 0 "register_operand" "=v,?wo") + (unspec:VM [(match_operand:VM 1 "register_operand" "v,0") + (match_operand:VM 2 "register_operand" "v,wo") + (match_operand:V16QI 3 "register_operand" "v,wo")] UNSPEC_VPERM_UNS))] "TARGET_ALTIVEC" "@ vperm %0,%1,%2,%3 - xxperm %x0,%x2,%x3 - xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3" + xxperm %x0,%x2,%x3" [(set_attr "type" "vecperm") - (set_attr "length" "4,4,8")]) + (set_attr "length" "4")]) (define_expand "vec_permv16qi" [(set (match_operand:V16QI 0 "register_operand" "") @@ -2032,6 +2030,19 @@ (define_expand "vec_perm_constv16qi" FAIL; }) +(define_insn "*altivec_vpermr_<mode>_internal" + [(set (match_operand:VM 0 "register_operand" "=v,?wo") + (unspec:VM [(match_operand:VM 1 "register_operand" "v,0") + (match_operand:VM 2 "register_operand" "v,wo") + (match_operand:V16QI 3 "register_operand" "v,wo")] + UNSPEC_VPERMR))] + "TARGET_P9_VECTOR" + "@ + vpermr %0,%1,%2,%3 + xxpermr %x0,%x2,%x3" + [(set_attr "type" "vecperm") + (set_attr "length" "4")]) + (define_insn "altivec_vrfip" ; ceil [(set (match_operand:V4SF 0 "register_operand" "=v") (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")] @@ -2791,32 +2802,30 @@ (define_expand "vec_unpacks_lo_<VP_small "") (define_insn "vperm_v8hiv4si" - [(set (match_operand:V4SI 0 "register_operand" "=v,?wo,?&wo") - (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v,0,wo") - (match_operand:V4SI 2 "register_operand" "v,wo,wo") - (match_operand:V16QI 3 "register_operand" "v,wo,wo")] + [(set (match_operand:V4SI 0 "register_operand" "=v,?wo") + (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v,0") + (match_operand:V4SI 2 "register_operand" "v,wo") + (match_operand:V16QI 3 "register_operand" "v,wo")] UNSPEC_VPERMSI))] "TARGET_ALTIVEC" "@ vperm %0,%1,%2,%3 - xxperm %x0,%x2,%x3 - xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3" + xxperm %x0,%x2,%x3" [(set_attr "type" "vecperm") - (set_attr "length" "4,4,8")]) + (set_attr "length" "4")]) (define_insn "vperm_v16qiv8hi" - [(set (match_operand:V8HI 0 "register_operand" "=v,?wo,?&wo") - (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v,0,wo") - (match_operand:V8HI 2 "register_operand" "v,wo,wo") - (match_operand:V16QI 3 "register_operand" "v,wo,wo")] + [(set (match_operand:V8HI 0 "register_operand" "=v,?wo") + (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v,0") + (match_operand:V8HI 2 "register_operand" "v,wo") + (match_operand:V16QI 3 "register_operand" "v,wo")] UNSPEC_VPERMHI))] "TARGET_ALTIVEC" "@ vperm %0,%1,%2,%3 - xxperm %x0,%x2,%x3 - xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3" + xxperm %x0,%x2,%x3" [(set_attr "type" "vecperm") - (set_attr "length" "4,4,8")]) + (set_attr "length" "4")]) (define_expand "vec_unpacku_hi_v16qi" Index: gcc/testsuite/gcc.target/powerpc/p9-permute.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/p9-permute.c (revision 236941) +++ gcc/testsuite/gcc.target/powerpc/p9-permute.c (working copy) @@ -1,4 +1,4 @@ -/* { dg-do compile { target { powerpc64le-*-* } } } */ +/* { dg-do compile { target { powerpc64*-*-* } } } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ /* { dg-options "-mcpu=power9 -O2" } */ /* { dg-require-effective-target powerpc_p9vector_ok } */ @@ -17,5 +17,6 @@ permute (vector long long *p, vector lon return vec_perm (a, b, mask); } +/* expect xxpermr on little-endian, xxperm on big-endian */ /* { dg-final { scan-assembler "xxperm" } } */ /* { dg-final { scan-assembler-not "vperm" } } */ Index: gcc/testsuite/gcc.target/powerpc/p9-vpermr.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/p9-vpermr.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/p9-vpermr.c (revision 0) @@ -0,0 +1,21 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ +/* { dg-options "-mcpu=power9 -O2" } */ + +/* Test generation of VPERMR/XXPERMR on ISA 3.0 in little endian. */ + +#include <altivec.h> + +vector long long +permute (vector long long *p, vector long long *q, vector unsigned char mask) +{ + vector long long a = *p; + vector long long b = *q; + + /* Force a, b to be in altivec registers to select vpermr insn. */ + __asm__ (" # a: %x0, b: %x1" : "+v" (a), "+v" (b)); + + return vec_perm (a, b, mask); +} + +/* { dg-final { scan-assembler "vpermr\|xxpermr" } } */
Index: gcc/config/rs6000/vector.md =================================================================== --- gcc/config/rs6000/vector.md (revision 236941) +++ gcc/config/rs6000/vector.md (working copy) @@ -26,6 +26,13 @@ ;; Vector int modes (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI]) +;; Vector int modes for parity +(define_mode_iterator VEC_IP [V8HI + V4SI + V2DI + V1TI + (TI "TARGET_VSX_TIMODE")]) + ;; Vector float modes (define_mode_iterator VEC_F [V4SF V2DF]) @@ -738,12 +745,24 @@ (define_expand "clz<mode>2" (clz:VEC_I (match_operand:VEC_I 1 "register_operand" "")))] "TARGET_P8_VECTOR") +;; Vector count trailing zeros +(define_expand "ctz<mode>2" + [(set (match_operand:VEC_I 0 "register_operand" "") + (ctz:VEC_I (match_operand:VEC_I 1 "register_operand" "")))] + "TARGET_P9_VECTOR") + ;; Vector population count (define_expand "popcount<mode>2" [(set (match_operand:VEC_I 0 "register_operand" "") (popcount:VEC_I (match_operand:VEC_I 1 "register_operand" "")))] "TARGET_P8_VECTOR") +;; Vector parity +(define_expand "parity<mode>2" + [(set (match_operand:VEC_IP 0 "register_operand" "") + (parity:VEC_IP (match_operand:VEC_IP 1 "register_operand" "")))] + "TARGET_P9_VECTOR") + ;; Same size conversions (define_expand "float<VEC_int><mode>2" Index: gcc/config/rs6000/rs6000-c.c =================================================================== --- gcc/config/rs6000/rs6000-c.c (revision 236943) +++ gcc/config/rs6000/rs6000-c.c (working copy) @@ -4215,6 +4215,43 @@ const struct altivec_builtin_types altiv { P8V_BUILTIN_VEC_VCLZD, P8V_BUILTIN_VCLZD, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 }, + { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZB, + RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 }, + { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZB, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 }, + { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZH, + RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 }, + { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZH, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 }, + { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZW, + RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 }, + { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZW, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 }, + { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZD, + RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 }, + { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZD, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 }, + + { P9V_BUILTIN_VEC_VCTZB, P9V_BUILTIN_VCTZB, + RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 }, + { P9V_BUILTIN_VEC_VCTZB, P9V_BUILTIN_VCTZB, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 }, + + { P9V_BUILTIN_VEC_VCTZH, P9V_BUILTIN_VCTZH, + RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 }, + { P9V_BUILTIN_VEC_VCTZH, P9V_BUILTIN_VCTZH, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 }, + + { P9V_BUILTIN_VEC_VCTZW, P9V_BUILTIN_VCTZW, + RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 }, + { P9V_BUILTIN_VEC_VCTZW, P9V_BUILTIN_VCTZW, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 }, + + { P9V_BUILTIN_VEC_VCTZD, P9V_BUILTIN_VCTZD, + RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 }, + { P9V_BUILTIN_VEC_VCTZD, P9V_BUILTIN_VCTZD, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 }, + { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 }, { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD, @@ -4344,6 +4381,42 @@ const struct altivec_builtin_types altiv { P8V_BUILTIN_VEC_VPOPCNTD, P8V_BUILTIN_VPOPCNTD, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 }, + { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBW, + RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 }, + { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBW, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 }, + { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBD, + RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 }, + { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBD, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 }, + { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBQ, + RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0, 0 }, + { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBQ, + RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0, 0 }, + { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBQ, + RS6000_BTI_INTTI, RS6000_BTI_INTTI, 0, 0 }, + { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBQ, + RS6000_BTI_UINTTI, RS6000_BTI_UINTTI, 0, 0 }, + + { P9V_BUILTIN_VEC_VPRTYBW, P9V_BUILTIN_VPRTYBW, + RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 }, + { P9V_BUILTIN_VEC_VPRTYBW, P9V_BUILTIN_VPRTYBW, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 }, + + { P9V_BUILTIN_VEC_VPRTYBD, P9V_BUILTIN_VPRTYBD, + RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 }, + { P9V_BUILTIN_VEC_VPRTYBD, P9V_BUILTIN_VPRTYBD, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 }, + + { P9V_BUILTIN_VEC_VPRTYBQ, P9V_BUILTIN_VPRTYBQ, + RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0, 0 }, + { P9V_BUILTIN_VEC_VPRTYBQ, P9V_BUILTIN_VPRTYBQ, + RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0, 0 }, + { P9V_BUILTIN_VEC_VPRTYBQ, P9V_BUILTIN_VPRTYBQ, + RS6000_BTI_INTTI, RS6000_BTI_INTTI, 0, 0 }, + { P9V_BUILTIN_VEC_VPRTYBQ, P9V_BUILTIN_VPRTYBQ, + RS6000_BTI_UINTTI, RS6000_BTI_UINTTI, 0, 0 }, + { P8V_BUILTIN_VEC_VPKUDUM, P8V_BUILTIN_VPKUDUM, RS6000_BTI_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 }, { P8V_BUILTIN_VEC_VPKUDUM, P8V_BUILTIN_VPKUDUM, Index: gcc/config/rs6000/rs6000-builtin.def =================================================================== --- gcc/config/rs6000/rs6000-builtin.def (revision 236943) +++ gcc/config/rs6000/rs6000-builtin.def (working copy) @@ -647,8 +647,113 @@ | RS6000_BTC_BINARY), \ CODE_FOR_ ## ICODE) /* ICODE */ + +/* Miscellaneous builtins for instructions added in ISA 3.0. These + instructions don't require either the DFP or VSX options, just the basic + ISA 3.0 enablement since they operate on general purpose registers. */ +#define BU_P9_MISC_1(ENUM, NAME, ATTR, ICODE) \ + RS6000_BUILTIN_1 (MISC_BUILTIN_ ## ENUM, /* ENUM */ \ + "__builtin_" NAME, /* NAME */ \ + RS6000_BTM_MODULO, /* MASK */ \ + (RS6000_BTC_ ## ATTR /* ATTR */ \ + | RS6000_BTC_UNARY), \ + CODE_FOR_ ## ICODE) /* ICODE */ + +/* Miscellaneous builtins for instructions added in ISA 3.0. These + instructions don't require either the DFP or VSX options, just the basic + ISA 3.0 enablement since they operate on general purpose registers, + and they require 64-bit addressing. */ +#define BU_P9_64BIT_MISC_0(ENUM, NAME, ATTR, ICODE) \ + RS6000_BUILTIN_0 (MISC_BUILTIN_ ## ENUM, /* ENUM */ \ + "__builtin_" NAME, /* NAME */ \ + RS6000_BTM_MODULO \ + | RS6000_BTM_64BIT, /* MASK */ \ + (RS6000_BTC_ ## ATTR /* ATTR */ \ + | RS6000_BTC_SPECIAL), \ + CODE_FOR_ ## ICODE) /* ICODE */ + +/* Miscellaneous builtins for instructions added in ISA 3.0. These + instructions don't require either the DFP or VSX options, just the basic + ISA 3.0 enablement since they operate on general purpose registers. */ +#define BU_P9_MISC_0(ENUM, NAME, ATTR, ICODE) \ + RS6000_BUILTIN_0 (MISC_BUILTIN_ ## ENUM, /* ENUM */ \ + "__builtin_" NAME, /* NAME */ \ + RS6000_BTM_MODULO, /* MASK */ \ + (RS6000_BTC_ ## ATTR /* ATTR */ \ + | RS6000_BTC_SPECIAL), \ + CODE_FOR_ ## ICODE) /* ICODE */ + +/* ISA 3.0 (power9) vector convenience macros. */ +/* For the instructions that are encoded as altivec instructions use + __builtin_altivec_ as the builtin name. */ +#define BU_P9V_AV_1(ENUM, NAME, ATTR, ICODE) \ + RS6000_BUILTIN_1 (P9V_BUILTIN_ ## ENUM, /* ENUM */ \ + "__builtin_altivec_" NAME, /* NAME */ \ + RS6000_BTM_P9_VECTOR, /* MASK */ \ + (RS6000_BTC_ ## ATTR /* ATTR */ \ + | RS6000_BTC_UNARY), \ + CODE_FOR_ ## ICODE) /* ICODE */ + +#define BU_P9V_AV_2(ENUM, NAME, ATTR, ICODE) \ + RS6000_BUILTIN_2 (P9V_BUILTIN_ ## ENUM, /* ENUM */ \ + "__builtin_altivec_" NAME, /* NAME */ \ + RS6000_BTM_P9_VECTOR, /* MASK */ \ + (RS6000_BTC_ ## ATTR /* ATTR */ \ + | RS6000_BTC_BINARY), \ + CODE_FOR_ ## ICODE) /* ICODE */ + +#define BU_P9V_AV_3(ENUM, NAME, ATTR, ICODE) \ + RS6000_BUILTIN_3 (P9V_BUILTIN_ ## ENUM, /* ENUM */ \ + "__builtin_altivec_" NAME, /* NAME */ \ + RS6000_BTM_P9_VECTOR, /* MASK */ \ + (RS6000_BTC_ ## ATTR /* ATTR */ \ + | RS6000_BTC_TERNARY), \ + CODE_FOR_ ## ICODE) /* ICODE */ + +#define BU_P9V_AV_P(ENUM, NAME, ATTR, ICODE) \ + RS6000_BUILTIN_P (P9V_BUILTIN_ ## ENUM, /* ENUM */ \ + "__builtin_altivec_" NAME, /* NAME */ \ + RS6000_BTM_P9_VECTOR, /* MASK */ \ + (RS6000_BTC_ ## ATTR /* ATTR */ \ + | RS6000_BTC_PREDICATE), \ + CODE_FOR_ ## ICODE) /* ICODE */ + +/* For the instructions encoded as VSX instructions use __builtin_vsx as the + builtin name. */ +#define BU_P9V_VSX_1(ENUM, NAME, ATTR, ICODE) \ + RS6000_BUILTIN_1 (P9V_BUILTIN_ ## ENUM, /* ENUM */ \ + "__builtin_vsx_" NAME, /* NAME */ \ + RS6000_BTM_P9_VECTOR, /* MASK */ \ + (RS6000_BTC_ ## ATTR /* ATTR */ \ + | RS6000_BTC_UNARY), \ + CODE_FOR_ ## ICODE) /* ICODE */ + +#define BU_P9V_OVERLOAD_1(ENUM, NAME) \ + RS6000_BUILTIN_1 (P9V_BUILTIN_VEC_ ## ENUM, /* ENUM */ \ + "__builtin_vec_" NAME, /* NAME */ \ + RS6000_BTM_P9_VECTOR, /* MASK */ \ + (RS6000_BTC_OVERLOADED /* ATTR */ \ + | RS6000_BTC_UNARY), \ + CODE_FOR_nothing) /* ICODE */ + +#define BU_P9V_OVERLOAD_2(ENUM, NAME) \ + RS6000_BUILTIN_2 (P9V_BUILTIN_VEC_ ## ENUM, /* ENUM */ \ + "__builtin_vec_" NAME, /* NAME */ \ + RS6000_BTM_P9_VECTOR, /* MASK */ \ + (RS6000_BTC_OVERLOADED /* ATTR */ \ + | RS6000_BTC_BINARY), \ + CODE_FOR_nothing) /* ICODE */ + +#define BU_P9V_OVERLOAD_3(ENUM, NAME) \ + RS6000_BUILTIN_3 (P9V_BUILTIN_VEC_ ## ENUM, /* ENUM */ \ + "__builtin_vec_" NAME, /* NAME */ \ + RS6000_BTM_P9_VECTOR, /* MASK */ \ + (RS6000_BTC_OVERLOADED /* ATTR */ \ + | RS6000_BTC_TERNARY), \ + CODE_FOR_nothing) /* ICODE */ #endif + /* Insure 0 is not a legitimate index. */ BU_SPECIAL_X (RS6000_BUILTIN_NONE, NULL, 0, RS6000_BTC_MISC) @@ -1659,6 +1764,26 @@ BU_LDBL128_2 (UNPACK_TF, "unpack_longdou BU_P7_MISC_2 (PACK_V1TI, "pack_vector_int128", CONST, packv1ti) BU_P7_MISC_2 (UNPACK_V1TI, "unpack_vector_int128", CONST, unpackv1ti) +/* 1 argument vector functions added in ISA 3.0 (power9). */ +BU_P9V_AV_1 (VCTZB, "vctzb", CONST, ctzv16qi2) +BU_P9V_AV_1 (VCTZH, "vctzh", CONST, ctzv8hi2) +BU_P9V_AV_1 (VCTZW, "vctzw", CONST, ctzv4si2) +BU_P9V_AV_1 (VCTZD, "vctzd", CONST, ctzv2di2) +BU_P9V_AV_1 (VPRTYBD, "vprtybd", CONST, parityv2di2) +BU_P9V_AV_1 (VPRTYBQ, "vprtybq", CONST, parityv1ti2) +BU_P9V_AV_1 (VPRTYBW, "vprtybw", CONST, parityv4si2) + +/* ISA 3.0 vector overloaded 1 argument functions. */ +BU_P9V_OVERLOAD_1 (VCTZ, "vctz") +BU_P9V_OVERLOAD_1 (VCTZB, "vctzb") +BU_P9V_OVERLOAD_1 (VCTZH, "vctzh") +BU_P9V_OVERLOAD_1 (VCTZW, "vctzw") +BU_P9V_OVERLOAD_1 (VCTZD, "vctzd") +BU_P9V_OVERLOAD_1 (VPRTYB, "vprtyb") +BU_P9V_OVERLOAD_1 (VPRTYBD, "vprtybd") +BU_P9V_OVERLOAD_1 (VPRTYBQ, "vprtybq") +BU_P9V_OVERLOAD_1 (VPRTYBW, "vprtybw") + /* 1 argument crypto functions. */ BU_CRYPTO_1 (VSBOX, "vsbox", CONST, crypto_vsbox) Index: gcc/config/rs6000/altivec.md =================================================================== --- gcc/config/rs6000/altivec.md (revision 236959) +++ gcc/config/rs6000/altivec.md (working copy) @@ -190,6 +190,13 @@ (define_mode_iterator VM2 [V4SI (KF "FLOAT128_VECTOR_P (KFmode)") (TF "FLOAT128_VECTOR_P (TFmode)")]) +;; Specific iterator for parity which does not have a byte/half-word form, but +;; does have a quad word form +(define_mode_iterator VParity [V4SI + V2DI + V1TI + (TI "TARGET_VSX_TIMODE")]) + (define_mode_attr VI_char [(V2DI "d") (V4SI "w") (V8HI "h") (V16QI "b")]) (define_mode_attr VI_scalar [(V2DI "DI") (V4SI "SI") (V8HI "HI") (V16QI "QI")]) (define_mode_attr VI_unit [(V16QI "VECTOR_UNIT_ALTIVEC_P (V16QImode)") @@ -3362,7 +3369,7 @@ (define_expand "vec_unpacku_float_lo_v8h }") -;; Power8 vector instructions encoded as Altivec instructions +;; Power8/power9 vector instructions encoded as Altivec instructions ;; Vector count leading zeros (define_insn "*p8v_clz<mode>2" @@ -3373,6 +3380,15 @@ (define_insn "*p8v_clz<mode>2" [(set_attr "length" "4") (set_attr "type" "vecsimple")]) +;; Vector count trailing zeros +(define_insn "*p9v_ctz<mode>2" + [(set (match_operand:VI2 0 "register_operand" "=v") + (ctz:VI2 (match_operand:VI2 1 "register_operand" "v")))] + "TARGET_P9_VECTOR" + "vctz<wd> %0,%1" + [(set_attr "length" "4") + (set_attr "type" "vecsimple")]) + ;; Vector population count (define_insn "*p8v_popcount<mode>2" [(set (match_operand:VI2 0 "register_operand" "=v") @@ -3382,6 +3398,15 @@ (define_insn "*p8v_popcount<mode>2" [(set_attr "length" "4") (set_attr "type" "vecsimple")]) +;; Vector parity +(define_insn "*p9v_parity<mode>2" + [(set (match_operand:VParity 0 "register_operand" "=v") + (parity:VParity (match_operand:VParity 1 "register_operand" "v")))] + "TARGET_P9_VECTOR" + "vprtyb<wd> %0,%1" + [(set_attr "length" "4") + (set_attr "type" "vecsimple")]) + ;; Vector Gather Bits by Bytes by Doubleword (define_insn "p8v_vgbbd" [(set (match_operand:V16QI 0 "register_operand" "=v") Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 236943) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -577,7 +577,9 @@ (define_mode_attr wd [(QI "b") (V16QI "b") (V8HI "h") (V4SI "w") - (V2DI "d")]) + (V2DI "d") + (V1TI "q") + (TI "q")]) ;; How many bits in this mode? (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")]) Index: gcc/config/rs6000/altivec.h =================================================================== --- gcc/config/rs6000/altivec.h (revision 236943) +++ gcc/config/rs6000/altivec.h (working copy) @@ -384,6 +384,23 @@ #define vec_vupklsw __builtin_vec_vupklsw #endif +#ifdef _ARCH_PWR9 +/* Vector additions added in ISA 3.0. */ +#define vec_vctz __builtin_vec_vctz +#define vec_cntlz __builtin_vec_vctz +#define vec_vctzb __builtin_vec_vctzb +#define vec_vctzd __builtin_vec_vctzd +#define vec_vctzh __builtin_vec_vctzh +#define vec_vctzw __builtin_vec_vctzw +#define vec_vprtyb __builtin_vec_vprtyb +#define vec_vprtybd __builtin_vec_vprtybd +#define vec_vprtybw __builtin_vec_vprtybw + +#ifdef _ARCH_PPC64 +#define vec_vprtybq __builtin_vec_vprtybq +#endif +#endif + /* Predicates. For C++, we use templates in order to allow non-parenthesized arguments. For C, instead, we use macros since non-parenthesized arguments were Index: gcc/doc/extend.texi =================================================================== --- gcc/doc/extend.texi (revision 236943) +++ gcc/doc/extend.texi (working copy) @@ -16452,6 +16452,60 @@ int __builtin_bcdsub_gt (vector __int128 int __builtin_bcdsub_ov (vector __int128_t, vector__int128_t); @end smallexample +If the ISA 3.00 additions to the vector/scalar (power9-vector) +instruction set are available: + +@smallexample +vector long long vec_vctz (vector long long); +vector unsigned long long vec_vctz (vector unsigned long long); +vector int vec_vctz (vector int); +vector unsigned int vec_vctz (vector int); +vector short vec_vctz (vector short); +vector unsigned short vec_vctz (vector unsigned short); +vector signed char vec_vctz (vector signed char); +vector unsigned char vec_vctz (vector unsigned char); + +vector signed char vec_vctzb (vector signed char); +vector unsigned char vec_vctzb (vector unsigned char); + +vector long long vec_vctzd (vector long long); +vector unsigned long long vec_vctzd (vector unsigned long long); + +vector short vec_vctzh (vector short); +vector unsigned short vec_vctzh (vector unsigned short); + +vector int vec_vctzw (vector int); +vector unsigned int vec_vctzw (vector int); + +vector int vec_vprtyb (vector int); +vector unsigned int vec_vprtyb (vector unsigned int); +vector long long vec_vprtyb (vector long long); +vector unsigned long long vec_vprtyb (vector unsigned long long); + +vector int vec_vprtybw (vector int); +vector unsigned int vec_vprtybw (vector unsigned int); + +vector long long vec_vprtybd (vector long long); +vector unsigned long long vec_vprtybd (vector unsigned long long); +@end smallexample + + +If the ISA 3.00 additions to the vector/scalar (power9-vector) +instruction set are available for 64-bit targets: + +@smallexample +vector long vec_vprtyb (vector long); +vector unsigned long vec_vprtyb (vector unsigned long); +vector __int128_t vec_vprtyb (vector __int128_t); +vector __uint128_t vec_vprtyb (vector __uint128_t); + +vector long vec_vprtybd (vector long); +vector unsigned long vec_vprtybd (vector unsigned long); + +vector __int128_t vec_vprtybq (vector __int128_t); +vector __uint128_t vec_vprtybd (vector __uint128_t); +@end smallexample + If the cryptographic instructions are enabled (@option{-mcrypto} or @option{-mcpu=power8}), the following builtins are enabled. Index: gcc/testsuite/gcc.target/powerpc/p9-vparity.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/p9-vparity.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/p9-vparity.c (revision 0) @@ -0,0 +1,107 @@ +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ +/* { dg-options "-mcpu=power9 -O2 -mlra -mvsx-timode" } */ + +#include <altivec.h> + +vector int +parity_v4si_1s (vector int a) +{ + return vec_vprtyb (a); +} + +vector int +parity_v4si_2s (vector int a) +{ + return vec_vprtybw (a); +} + +vector unsigned int +parity_v4si_1u (vector unsigned int a) +{ + return vec_vprtyb (a); +} + +vector unsigned int +parity_v4si_2u (vector unsigned int a) +{ + return vec_vprtybw (a); +} + +vector long long +parity_v2di_1s (vector long long a) +{ + return vec_vprtyb (a); +} + +vector long long +parity_v2di_2s (vector long long a) +{ + return vec_vprtybd (a); +} + +vector unsigned long long +parity_v2di_1u (vector unsigned long long a) +{ + return vec_vprtyb (a); +} + +vector unsigned long long +parity_v2di_2u (vector unsigned long long a) +{ + return vec_vprtybd (a); +} + +vector __int128_t +parity_v1ti_1s (vector __int128_t a) +{ + return vec_vprtyb (a); +} + +vector __int128_t +parity_v1ti_2s (vector __int128_t a) +{ + return vec_vprtybq (a); +} + +__int128_t +parity_ti_3s (__int128_t a) +{ + return vec_vprtyb (a); +} + +__int128_t +parity_ti_4s (__int128_t a) +{ + return vec_vprtybq (a); +} + +vector __uint128_t +parity_v1ti_1u (vector __uint128_t a) +{ + return vec_vprtyb (a); +} + +vector __uint128_t +parity_v1ti_2u (vector __uint128_t a) +{ + return vec_vprtybq (a); +} + +__uint128_t +parity_ti_3u (__uint128_t a) +{ + return vec_vprtyb (a); +} + +__uint128_t +parity_ti_4u (__uint128_t a) +{ + return vec_vprtybq (a); +} + +/* { dg-final { scan-assembler "vprtybd" } } */ +/* { dg-final { scan-assembler "vprtybq" } } */ +/* { dg-final { scan-assembler "vprtybw" } } */ Index: gcc/testsuite/gcc.target/powerpc/ctz-3.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/ctz-3.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/ctz-3.c (revision 0) @@ -0,0 +1,62 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ +/* { dg-options "-mcpu=power9 -O2 -ftree-vectorize -fvect-cost-model=dynamic -fno-unroll-loops -fno-unroll-all-loops" } */ + +#ifndef SIZE +#define SIZE 1024 +#endif + +#ifndef ALIGN +#define ALIGN 32 +#endif + +#define ALIGN_ATTR __attribute__((__aligned__(ALIGN))) + +#define DO_BUILTIN(PREFIX, TYPE, CTZ) \ +TYPE PREFIX ## _a[SIZE] ALIGN_ATTR; \ +TYPE PREFIX ## _b[SIZE] ALIGN_ATTR; \ + \ +void \ +PREFIX ## _ctz (void) \ +{ \ + unsigned long i; \ + \ + for (i = 0; i < SIZE; i++) \ + PREFIX ## _a[i] = CTZ (PREFIX ## _b[i]); \ +} + +#if !defined(DO_LONG_LONG) && !defined(DO_LONG) && !defined(DO_INT) && !defined(DO_SHORT) && !defined(DO_CHAR) +#define DO_INT 1 +#endif + +#if DO_LONG_LONG +/* At the moment, only int is auto vectorized. */ +DO_BUILTIN (sll, long long, __builtin_ctzll) +DO_BUILTIN (ull, unsigned long long, __builtin_ctzll) +#endif + +#if defined(_ARCH_PPC64) && DO_LONG +DO_BUILTIN (sl, long, __builtin_ctzl) +DO_BUILTIN (ul, unsigned long, __builtin_ctzl) +#endif + +#if DO_INT +DO_BUILTIN (si, int, __builtin_ctz) +DO_BUILTIN (ui, unsigned int, __builtin_ctz) +#endif + +#if DO_SHORT +DO_BUILTIN (ss, short, __builtin_ctz) +DO_BUILTIN (us, unsigned short, __builtin_ctz) +#endif + +#if DO_CHAR +DO_BUILTIN (sc, signed char, __builtin_ctz) +DO_BUILTIN (uc, unsigned char, __builtin_ctz) +#endif + +/* { dg-final { scan-assembler-times "vctzw" 2 } } */ +/* { dg-final { scan-assembler-not "cnttzd" } } */ +/* { dg-final { scan-assembler-not "cnttzw" } } */ Index: gcc/testsuite/gcc.target/powerpc/ctz-4.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/ctz-4.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/ctz-4.c (revision 0) @@ -0,0 +1,110 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ +/* { dg-options "-mcpu=power9 -O2" } */ + +#include <altivec.h> + +vector signed char +count_trailing_zeros_v16qi_1s (vector signed char a) +{ + return vec_vctz (a); +} + +vector signed char +count_trailing_zeros_v16qi_2s (vector signed char a) +{ + return vec_vctzb (a); +} + +vector unsigned char +count_trailing_zeros_v16qi_1u (vector unsigned char a) +{ + return vec_vctz (a); +} + +vector unsigned char +count_trailing_zeros_v16qi_2u (vector unsigned char a) +{ + return vec_vctzb (a); +} + +vector short +count_trailing_zeros_v8hi_1s (vector short a) +{ + return vec_vctz (a); +} + +vector short +count_trailing_zeros_v8hi_2s (vector short a) +{ + return vec_vctzh (a); +} + +vector unsigned short +count_trailing_zeros_v8hi_1u (vector unsigned short a) +{ + return vec_vctz (a); +} + +vector unsigned short +count_trailing_zeros_v8hi_2u (vector unsigned short a) +{ + return vec_vctzh (a); +} + +vector int +count_trailing_zeros_v4si_1s (vector int a) +{ + return vec_vctz (a); +} + +vector int +count_trailing_zeros_v4si_2s (vector int a) +{ + return vec_vctzw (a); +} + +vector unsigned int +count_trailing_zeros_v4si_1u (vector unsigned int a) +{ + return vec_vctz (a); +} + +vector unsigned int +count_trailing_zeros_v4si_2u (vector unsigned int a) +{ + return vec_vctzw (a); +} + +vector long long +count_trailing_zeros_v2di_1s (vector long long a) +{ + return vec_vctz (a); +} + +vector long long +count_trailing_zeros_v2di_2s (vector long long a) +{ + return vec_vctzd (a); +} + +vector unsigned long long +count_trailing_zeros_v2di_1u (vector unsigned long long a) +{ + return vec_vctz (a); +} + +vector unsigned long long +count_trailing_zeros_v2di_2u (vector unsigned long long a) +{ + return vec_vctzd (a); +} + +/* { dg-final { scan-assembler "vctzb" } } */ +/* { dg-final { scan-assembler "vctzd" } } */ +/* { dg-final { scan-assembler "vctzh" } } */ +/* { dg-final { scan-assembler "vctzw" } } */ +/* { dg-final { scan-assembler-not "cnttzd" } } */ +/* { dg-final { scan-assembler-not "cnttzw" } } */
Index: gcc/config/rs6000/altivec.md =================================================================== --- gcc/config/rs6000/altivec.md (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 236663) +++ gcc/config/rs6000/altivec.md (.../gcc/config/rs6000) (working copy) @@ -207,6 +207,9 @@ (define_mode_attr VP_small [(V2DI "V4SI" (define_mode_attr VP_small_lc [(V2DI "v4si") (V4SI "v8hi") (V8HI "v16qi")]) (define_mode_attr VU_char [(V2DI "w") (V4SI "h") (V8HI "b")]) +;; Vector negate +(define_mode_iterator VNEG [V4SI V2DI]) + ;; Vector move instructions. (define_insn "*altivec_mov<mode>" [(set (match_operand:VM2 0 "nonimmediate_operand" "=Z,v,v,*Y,*r,*r,v,v,*r") @@ -2754,20 +2757,28 @@ (define_expand "reduc_plus_scal_<mode>" DONE; }) +(define_insn "*p9_neg<mode>2" + [(set (match_operand:VNEG 0 "altivec_register_operand" "=v") + (neg:VNEG (match_operand:VNEG 1 "altivec_register_operand" "v")))] + "TARGET_P9_VECTOR" + "vneg<VI_char> %0,%1" + [(set_attr "type" "vecsimple")]) + (define_expand "neg<mode>2" - [(use (match_operand:VI 0 "register_operand" "")) - (use (match_operand:VI 1 "register_operand" ""))] - "TARGET_ALTIVEC" - " + [(set (match_operand:VI2 0 "register_operand" "") + (neg:VI2 (match_operand:VI2 1 "register_operand" "")))] + "<VI_unit>" { - rtx vzero; + if (!TARGET_P9_VECTOR || (<MODE>mode != V4SImode && <MODE>mode != V2DImode)) + { + rtx vzero; - vzero = gen_reg_rtx (GET_MODE (operands[0])); - emit_insn (gen_altivec_vspltis<VI_char> (vzero, const0_rtx)); - emit_insn (gen_sub<mode>3 (operands[0], vzero, operands[1])); - - DONE; -}") + vzero = gen_reg_rtx (GET_MODE (operands[0])); + emit_move_insn (vzero, CONST0_RTX (<MODE>mode)); + emit_insn (gen_sub<mode>3 (operands[0], vzero, operands[1])); + DONE; + } +}) (define_expand "udot_prod<mode>" [(set (match_operand:V4SI 0 "register_operand" "=v") Index: gcc/testsuite/gcc.target/powerpc/p9-vneg.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/p9-vneg.c (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/p9-vneg.c (.../gcc/testsuite/gcc.target/powerpc) (revision 236665) @@ -0,0 +1,12 @@ +/* { dg-do compile { target { powerpc64*-*-* } } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-mcpu=power9 -O2" } */ + +/* Verify P9 vector negate instructions. */ + +vector long long v2di_neg (vector long long a) { return -a; } +vector int v4si_neg (vector int a) { return -a; } + +/* { dg-final { scan-assembler "vnegd" } } */ +/* { dg-final { scan-assembler "vnegw" } } */