https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96017

--- Comment #4 from Peter Bergner <bergner at gcc dot gnu.org> ---
To be pedantic, "val" is assigned r3, the incoming arg reg.  The compiler
temporary that holds "*val" is assigned r9, which is a volatile register.  The
only non-volatile in use is r31, which was assigned to hold "ret".  Because
"ret" is a single live range that covers both the fast and slow paths, and that
range is live across the slowpath call, it needs a non-volatile register.  The
load of foo occurs early because foo is a global and could be modified by the
slowpath call.
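
To make the last point concrete, here is a minimal sketch.  The function body
is only my assumption of the shape of the testcase under discussion (it is not
quoted in this comment), and the definitions of foo and slowpath, plus the
slowpath_calls counter, are hypothetical stand-ins so the file builds on its
own.  It shows that the value returned must be the one read before the call,
which is why the load cannot simply be sunk below it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: in the real testcase these are extern. */
int foo = 42;
static int slowpath_calls = 0;

void slowpath(int *val)
{
        assert(val != NULL);
        slowpath_calls++;
        *val = 0;
        foo = 7;        /* a callee is allowed to modify the global */
}

/* Assumed shape of the function being discussed. */
int test_sketch(int *val)
{
        int ret = foo;  /* must be read before the call */

        if (__builtin_expect(*val != 0, 0))
                slowpath(val);

        return ret;     /* pre-call value of foo, even on the slow path */
}
```

On the slow path the function still has to return the old value of foo, so
"ret" has to survive the call somewhere, either in a non-volatile register or
in a stack slot.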

What we want is for "ret" to be assigned a volatile register for the fast path
and only use a non-volatile for the slowpath.  That means we have to split the
live range before RA.  Shrink wrapping is too late here to help us.  We
basically want a function that looks like:

-- test.c --
extern int foo;
extern void slowpath(int *);

int test(int *val)
{
        int ret = foo;

        if (__builtin_expect(*val != 0, 0)) {
                int tmp = ret;
                slowpath(val);
                ret = tmp;
        }

        return ret;
}
--

Here "ret" isn't live across the call, so it can be assigned a volatile
register.  "tmp", which is live across the call and needs the non-volatile
register, is only live in the slowpath, so shrink-wrapping can optimize it.

Doing this by hand doesn't help, because the early optimizers (not sure which
ones, didn't look) optimize those copies away and we end up with the same code.
So maybe we'd want something that adds those copies just before RA?

That said, rewriting the code like the following does seem to give good code on
the fast path:

-- test.c --
extern int foo;
extern void slowpath(int *);

int test(int *val)
{
        int ret = foo;

        if (__builtin_expect(*val != 0, 0)) {
                volatile int tmp = ret;
                slowpath(val);
                ret = tmp;
        }

        return ret;
}
--

0:      addis 2,12,.TOC.-.LCF0@ha
        addi 2,2,.TOC.-.LCF0@l
        .localentry     test,.-test
        lwz 10,0(3)
        addis 9,2,.LC0@toc@ha
        ld 9,.LC0@toc@l(9)
        lwa 9,0(9)
        cmpwi 0,10,0
        bne 0,.L11
        mr 3,9
        blr
        .p2align 4,,15
.L11:
        mflr 0
        std 0,16(1)
        stdu 1,-48(1)
        stw 9,32(1)
        bl slowpath
        nop
        lwa 9,32(1)
        addi 1,1,48
        ld 0,16(1)
        mr 3,9
        mtlr 0
        blr
