[Bug target/94145] Longcalls mis-optimize loading the function address

cvs-commit at gcc dot gnu.org Fri, 27 Mar 2020 15:17:09 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94145


--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Alan Modra <amo...@gcc.gnu.org>:

https://gcc.gnu.org/g:19e5389debb03c3623f6a2ce8a8f6f4aa2118901

commit r10-7430-g19e5389debb03c3623f6a2ce8a8f6f4aa2118901
Author: Alan Modra <amo...@gmail.com>
Date:   Wed Mar 11 21:22:37 2020 +1030

    [RS6000] PR94145, make PLT loads volatile

    The PLT is volatile.  On PowerPC it is a bss style section which the
    dynamic loader initialises to point at resolver stubs (called glink on
    PowerPC64) to support lazy resolution of function addresses.  The
    first call to a given function goes via the dynamic loader symbol
    resolver, which updates the PLT entry for that function and calls the
    function.  The second call, if there is one and we don't have a
    multi-threaded race, will use the updated PLT entry and thus avoid
    the relatively slow symbol resolver path.

    Calls via the PLT are like calls via a function pointer, except that
    no initialised function pointer is volatile like the PLT.  All
    initialised function pointers are resolved at program startup to point
    at the function or are left as NULL.  There is no support for lazy
    resolution of any user visible function pointer.

    So why does any of this matter to gcc?  Well, normally the PLT call
    mechanism happens entirely behind gcc's back, but since we implemented
    inline PLT calls (effectively putting the PLT code stub that loads the
    PLT entry inline and making that code sequence scheduled), the load of
    the PLT entry is visible to gcc.  That load then is subject to gcc
    optimization, for example in

    /* -S -mcpu=future -mpcrel -mlongcall -O2.  */
    int foo (int);
    void bar (void)
    {
      while (foo(0))
        foo (99);
    }

    we see the PLT load for foo being hoisted out of the loop and stashed
    in a call-saved register.  If that happens to be the first call to
    foo, then the stashed value is that for the resolver stub, and every
    call to foo in the loop will then go via the slow resolver path.  Not
    a good idea.  Also, if foo turns out to be a local function and the
    linker replaces the PLT calls with direct calls to foo then gcc has
    just wasted a call-saved register.

    This patch teaches gcc that the PLT loads are volatile.  The change
    doesn't affect other loads of function pointers and thus has no effect
    on normal indirect function calls.  Note that because the
    "optimization" this patch prevents can only occur over function calls,
    the only place gcc can stash PLT loads is in call-saved registers or
    in other memory.  I'm reasonably confident that this change will be
    neutral or positive for the "ld -z now" case where the PLT is not
    volatile, in code where there is any register pressure.  Even if gcc
    could be taught to recognise cases where the PLT is resolved, you'd
    need to discount use of registers to cache PLT loads by some factor
    involving the chance that those calls would be converted to direct
    calls.

            PR target/94145
            * config/rs6000/rs6000.c (rs6000_longcall_ref): Use unspec_volatile
            for PLT16_LO and PLT_PCREL.
            * config/rs6000/rs6000.md (UNSPEC_PLT16_LO, UNSPEC_PLT_PCREL):
Remove.
            (UNSPECV_PLT16_LO, UNSPECV_PLT_PCREL): Define.
            (pltseq_plt16_lo_, pltseq_plt_pcrel): Use unspec_volatile.

[Bug target/94145] Longcalls mis-optimize loading the function address

Reply via email to