On 7/15/25 08:57, Kyrylo Tkachov wrote:
>
> Hi all,
>
> We already have patterns to use the NBSL instruction to implement vector
> NOR and NAND operations for SVE types and modes. It is straightforward to
> have similar patterns for the fixed-width Advanced SIMD modes as well, though
> it requires combine patterns without the predicate operand and an explicit 'Z'
> output modifier. This patch does so.
>
> So now, for example, we generate for:
>
> uint64x2_t nand_q(uint64x2_t a, uint64x2_t b) { return NAND(a, b); }
> uint64x2_t nor_q(uint64x2_t a, uint64x2_t b) { return NOR(a, b); }
>
> nand_q:
>          nbsl z0.d, z0.d, z1.d, z1.d
>          ret
>
> nor_q:
>          nbsl z0.d, z0.d, z1.d, z0.d
>          ret
>
> instead of the previous:
> nand_q:
>          and v0.16b, v0.16b, v1.16b
>          not v0.16b, v0.16b
>          ret
>
> nor_q:
>          orr v0.16b, v0.16b, v1.16b
>          not v0.16b, v0.16b
>          ret
>
> The tied operand requirements for NBSL mean that a MOVPRFX may be generated
> when the operands fall that way, but a 2-insn MOVPRFX form is no worse than
> the current 2-insn codegen, and the MOVPRFX can be fused by many cores.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?

Looks good to me.

Remi

> Thanks,
> Kyrill
>
> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
>
> gcc/
>
>          * config/aarch64/aarch64-sve2.md (*aarch64_sve2_unpred_nor<mode>):
>          New define_insn.
>          (*aarch64_sve2_nand_unpred<mode>): Likewise.
>
> gcc/testsuite/
>
>          * gcc.target/aarch64/sve2/nbsl_nor_nand_neon.c: New test.
>
