Hi All,

Some Neoverse Software Optimization Guides (SWoG) have a clause stating that, for predicated operations that also produce a predicate, codegen should preferably use a different register for the destination than for the input predicate, in order to avoid a performance overhead.
This of course has the problem that it increases register pressure, and so should be done with care. Additionally, not all micro-architectures have this consideration, so it should not be done by default.

The patch series adds support for doing conditional early clobbers through a combination of new alternatives and attributes to control their availability.

Under high register pressure we also use LRA's costing to prefer not to use the early-clobber alternative and instead just use the tie, as this is preferable to a reload.

Concretely this patch series does:

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2

foo:
        mov     z31.h, w0
        ptrue   p3.b, all
        cmplo   p0.h, p3/z, z0.h, z31.h
        b       use

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve

foo:
        mov     z31.h, w0
        ptrue   p0.b, all
        cmplo   p0.h, p0/z, z0.h, z31.h
        b       use

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 -ffixed-p[1-15]

foo:
        mov     z31.h, w0
        ptrue   p0.b, all
        cmplo   p0.h, p0/z, z0.h, z31.h
        b       use

Testcases for the changes are in the last patch of the series.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Thanks,
Tamar