The aarch64_vector_pcs handling in aarch64_hard_regno_call_part_clobbered checks whether the mode might be bigger than 16 bytes, since on SVE targets the (non-SVE) vector PCS only guarantees that the low 16 bytes are preserved. But for multi-register modes, we should instead test whether each single-register part might be bigger than 16 bytes. (The size is always divided evenly between registers.)
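To illustrate the difference, here is a simplified stand-alone model of the old and new checks. It is only a sketch: it uses plain constant byte sizes rather than GCC's poly_int64 and maybe_gt, and the function names old_part_clobbered and new_part_clobbered are made up for this example rather than taken from aarch64.c.

```c
#include <stdbool.h>

/* Hypothetical model of the old check: compare the size of the whole
   mode against the number of bytes preserved per register
   (16 for a vector PCS call, 8 otherwise).  */
static bool
old_part_clobbered (unsigned int mode_size, bool simd_p)
{
  return mode_size > (simd_p ? 16u : 8u);
}

/* Hypothetical model of the new check: for multi-register modes,
   compare the size of each single-register part instead.  The size
   is assumed to divide evenly between the registers.  */
static bool
new_part_clobbered (unsigned int mode_size, unsigned int nregs, bool simd_p)
{
  unsigned int per_register_size = mode_size;
  if (nregs > 1)
    per_register_size = mode_size / nregs;
  return per_register_size > (simd_p ? 16u : 8u);
}
```

For a 64-byte mode spread over 4 registers (as with XImode) and a vector PCS call, old_part_clobbered (64, true) reports a partial clobber, whereas new_part_clobbered (64, 4, true) does not, since each register holds exactly 16 bytes.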
The testcase uses XImode as an example where this helps.

Tested on aarch64-linux-gnu (with and without SVE).  OK to install?

Richard


2019-09-10  Richard Sandiford  <richard.sandif...@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_hard_regno_call_part_clobbered):
	For multi-register modes, test how big each register part is.

gcc/testsuite/
	* gcc.target/aarch64/torture/simd-abi-8.c: New test.

Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2019-09-10 17:37:37.000000000 +0100
+++ gcc/config/aarch64/aarch64.c	2019-09-10 17:37:37.872154464 +0100
@@ -1890,9 +1890,16 @@ aarch64_remove_extra_call_preserved_regs
 aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
                                         machine_mode mode)
 {
-  bool simd_p = insn && CALL_P (insn) && aarch64_simd_call_p (insn);
-  return FP_REGNUM_P (regno)
-         && maybe_gt (GET_MODE_SIZE (mode), simd_p ? 16 : 8);
+  if (FP_REGNUM_P (regno))
+    {
+      bool simd_p = insn && CALL_P (insn) && aarch64_simd_call_p (insn);
+      poly_int64 per_register_size = GET_MODE_SIZE (mode);
+      unsigned int nregs = hard_regno_nregs (regno, mode);
+      if (nregs > 1)
+        per_register_size = exact_div (per_register_size, nregs);
+      return maybe_gt (per_register_size, simd_p ? 16 : 8);
+    }
+  return false;
 }
 
 /* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
Index: gcc/testsuite/gcc.target/aarch64/torture/simd-abi-8.c
===================================================================
--- /dev/null	2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/torture/simd-abi-8.c	2019-09-10 17:37:37.872154464 +0100
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+
+#include <arm_neon.h>
+
+void __attribute__ ((aarch64_vector_pcs)) f (void);
+
+void
+g (int64x2x4_t *ptr)
+{
+  register int64x2x4_t copy asm ("v8") = *ptr;
+  int64x2x4_t save;
+  asm volatile ("" : "=w" (save) : "0" (copy));
+  f ();
+  *ptr = save;
+}
+
+/* { dg-final { scan-assembler-times {\tld1\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1\t} 1 } } */