https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94145
--- Comment #6 from Alan Modra <amodra at gmail dot com> ---
Transformations to indirect calls and hoisting function addresses out of loops are fine. That sort of thing has nothing to do with this problem. The problem is that the PLT really is volatile, and the inline PLT code for powerpc exposes those PLT loads without letting gcc know they are in fact volatile. If gcc decides to cache a PLT load in a register and then use it for multiple calls to the same function, you might end up going via the ld.so symbol resolver for every one of those calls rather than only on the first call. That is very definitely not an optimisation.
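
To illustrate the failure mode, here is a minimal portable C sketch of lazy binding. It is only a simulation, not the powerpc inline PLT sequence or anything gcc emits: plt_slot, lazy_stub, real_func, call_cached, call_reloading and self_check are all hypothetical names, and the "resolver" is just a function that rewrites the slot on first use.

```c
#include <assert.h>

typedef int (*fn_t)(int);

static int resolver_hits;               /* how many times the slow path ran */

static int real_func(int x) { return x + 1; }
static int lazy_stub(int x);

/* Hypothetical stand-in for a PLT entry. It starts out pointing at the
   resolver stub and is rewritten on first use, so it is declared volatile. */
static fn_t volatile plt_slot = lazy_stub;

/* Stand-in for the ld.so resolver: patch the slot, then do the call. */
static int lazy_stub(int x)
{
    resolver_hits++;
    plt_slot = real_func;
    return real_func(x);
}

/* Reloads the slot for every call, as a volatile load requires. */
static int call_reloading(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += plt_slot(i);
    return sum;
}

/* Models the bad case: the slot is loaded once, before it has been
   resolved, and the stale value is reused for every call. */
static int call_cached(int n)
{
    fn_t f = plt_slot;
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += f(i);
    return sum;
}

/* Put the slot back into its unresolved state. */
static void reset(void)
{
    plt_slot = lazy_stub;
    resolver_hits = 0;
}

/* Both variants compute the same sum; only the resolver count differs. */
static void self_check(void)
{
    reset();
    assert(call_reloading(3) == 6 && resolver_hits == 1);
    reset();
    assert(call_cached(3) == 6 && resolver_hits == 3);
}
```

Calling self_check() from a main function exercises both paths. The results are identical, which is why the problem shows up as a slowdown rather than a wrong answer: the cached variant takes the resolver path on every call instead of only the first.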