Hi all,

We already have patterns that use the NBSL instruction to implement vector NOR and NAND operations for SVE types and modes.  It is straightforward to have similar patterns for the fixed-width Advanced SIMD modes as well, though they require combine patterns without the predicate operand and with an explicit 'Z' output modifier.  This patch does so.
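For illustration, a minimal sketch of what the unpredicated NOR pattern could look like (assuming the VDQ_I Advanced SIMD mode iterator; the constraints and attributes of the actual pattern in the patch may differ, and the NAND pattern is analogous with ior in place of and):

;; Sketch only: combine canonicalizes ~(a | b) to (and (not a) (not b)),
;; so the pattern matches the De Morgan form.  The 'Z' modifier prints
;; the Advanced SIMD register by its SVE z-register name.
(define_insn "*aarch64_sve2_unpred_nor<mode>"
  [(set (match_operand:VDQ_I 0 "register_operand" "=w,?&w")
	(and:VDQ_I
	  (not:VDQ_I (match_operand:VDQ_I 1 "register_operand" "%0,w"))
	  (not:VDQ_I (match_operand:VDQ_I 2 "register_operand" "w,w"))))]
  "TARGET_SVE2"
  "@
   nbsl\t%Z0.d, %Z0.d, %Z2.d, %Z0.d
   movprfx\t%Z0, %Z1\;nbsl\t%Z0.d, %Z0.d, %Z2.d, %Z0.d"
  [(set_attr "movprfx" "*,yes")]
)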
So now, for example, for:

uint64x2_t nand_q(uint64x2_t a, uint64x2_t b) { return NAND(a, b); }
uint64x2_t nor_q(uint64x2_t a, uint64x2_t b) { return NOR(a, b); }

we generate:

nand_q:
        nbsl    z0.d, z0.d, z1.d, z1.d
        ret

nor_q:
        nbsl    z0.d, z0.d, z1.d, z0.d
        ret

instead of the previous:

nand_q:
        and     v0.16b, v0.16b, v1.16b
        not     v0.16b, v0.16b
        ret

nor_q:
        orr     v0.16b, v0.16b, v1.16b
        not     v0.16b, v0.16b
        ret

The tied operand requirements for NBSL mean that we have to generate a MOVPRFX when the destination cannot be tied to the first source, but I guess a 2-insn MOVPRFX form is at least no worse than the current 2-insn codegen, and the MOVPRFX can be fused by many cores.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?

Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>

gcc/

	* config/aarch64/aarch64-sve2.md (*aarch64_sve2_unpred_nor<mode>):
	New define_insn.
	(*aarch64_sve2_nand_unpred<mode>): Likewise.

gcc/testsuite/

	* gcc.target/aarch64/sve2/nbsl_nor_nand_neon.c: New test.
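For reference, a sketch of what the new nbsl_nor_nand_neon.c test could look like; the NAND/NOR macro definitions are my assumption about how the functions above are written, and the dg directives of the actual test in the patch may differ:

/* { dg-do compile } */
/* { dg-options "-O2 -march=armv8.2-a+sve2" } */

#include <arm_neon.h>

#define NAND(x, y) (~((x) & (y)))
#define NOR(x, y) (~((x) | (y)))

uint64x2_t nand_q(uint64x2_t a, uint64x2_t b) { return NAND(a, b); }
uint64x2_t nor_q(uint64x2_t a, uint64x2_t b) { return NOR(a, b); }

/* Each function should now be a single NBSL, with no separate NOT.  */
/* { dg-final { scan-assembler-times {\tnbsl\t} 2 } } */
/* { dg-final { scan-assembler-not {\tnot\t} } } */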