On 7/15/25 08:57, Kyrylo Tkachov wrote:
> Hi all,
>
> We already have patterns to use the NBSL instruction to implement vector
> NOR and NAND operations for SVE types and modes. It is straightforward to
> have similar patterns for the fixed-width Advanced SIMD modes as well, though
> it requires combine patterns without the predicate operand and an explicit 'Z'
> output modifier. This patch does so.
>
> So now for example we generate for:
>
> uint64x2_t nand_q(uint64x2_t a, uint64x2_t b) { return NAND(a, b); }
> uint64x2_t nor_q(uint64x2_t a, uint64x2_t b) { return NOR(a, b); }
>
> nand_q:
>         nbsl    z0.d, z0.d, z1.d, z1.d
>         ret
>
> nor_q:
>         nbsl    z0.d, z0.d, z1.d, z0.d
>         ret
>
> instead of the previous:
>
> nand_q:
>         and     v0.16b, v0.16b, v1.16b
>         not     v0.16b, v0.16b
>         ret
>
> nor_q:
>         orr     v0.16b, v0.16b, v1.16b
>         not     v0.16b, v0.16b
>         ret
>
> The tied operand requirements for NBSL mean that we can generate the MOVPRFX
> when the operands fall that way, but I guess having a 2-insn MOVPRFX form is
> not worse than the current 2-insn codegen at least, and the MOVPRFX can be
> fused by many cores.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?
Looks good to me.

Remi

> Thanks,
> Kyrill
>
> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
>
> gcc/
>
>         * config/aarch64/aarch64-sve2.md (*aarch64_sve2_unpred_nor<mode>):
>         New define_insn.
>         (*aarch64_sve2_nand_unpred<mode>): Likewise.
>
> gcc/testsuite/
>
>         * gcc.target/aarch64/sve2/nbsl_nor_nand_neon.c: New test.