Hi All,

Some Neoverse Software Optimization Guides (SWoG) have a clause that state
that for predicated operations that also produce a predicate it is preferred
that the codegen should use a different register for the destination than that
of the input predicate in order to avoid a performance overhead.

This of course has the problem that it increases register pressure and so should
be done with care.  Additionally not all micro-architectures have this
consideration and so it shouldn't be done as a default thing.

The patch series adds support for doing conditional early clobbers through a
combination of new alternatives and attributes to control their availability.

On high register pressure we also use LRA's costing to prefer not to use the
alternative and instead just use the tie as this is preferable to a reload.

Concretely this patch series does:

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2

foo:
        mov     z31.h, w0
        ptrue   p3.b, all
        cmplo   p0.h, p3/z, z0.h, z31.h
        b       use

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve

foo:
        mov     z31.h, w0
        ptrue   p0.b, all
        cmplo   p0.h, p0/z, z0.h, z31.h
        b       use

> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 
> -ffixed-p[1-15]

foo:
        mov     z31.h, w0
        ptrue   p0.b, all
        cmplo   p0.h, p0/z, z0.h, z31.h
        b       use

Testcases for the changes are in the last patch of the series.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Thanks,
Tamar

---

-- 


Reply via email to