> On Mon, May 04, 2015 at 11:42:20AM -0600, Jeff Law wrote:
> > On 05/04/2015 11:39 AM, Jakub Jelinek wrote:
> > >On Mon, May 04, 2015 at 11:34:05AM -0600, Jeff Law wrote:
> > >>On 05/04/2015 10:37 AM, Alexander Monakov wrote:
> > >>>This patch introduces option -fno-plt that allows to expand calls that
> > >>>would go via PLT to load the address of the function immediately at
> > >>>call site (which introduces a GOT load).  Cover letter explains the
> > >>>motivation for this patch.
> > >>>
> > >>>New option documentation for invoke.texi is missing from the patch; if
> > >>>this is accepted I'll be happy to send a v2 with documentation added.
> > >>>
> > >>>	* calls.c (prepare_call_address): Transform PLT call to GOT lookup and
> > >>>	indirect call by forcing address into a pseudo with -fno-plt.
> > >>>	* common.opt (flag_plt): New option.
> > >>OK once you cobble together the invoke.texi changes.
> > >
> > >Isn't what Michael/Alan suggested better?  I mean as/ld/compiler changes to
> > >inline the plt slot's first part, then lazy binding will work fine.
> > I must have missed Alan/Michael's message.
> >
> > ISTM the win here is that by going through the GOT, you can CSE the
> > GOT reference and possibly get some more register allocation
> > freedom.  Is that still the case with Alan/Michael's approach?
>
> There are many advantages to 'going through the GOT'.  CSE'ing the
> reference is just one.  The biggest (IMO) is that you can avoid the bad
> PLT ABI that most targets have, where making a call to a PLT slot
> requires the GOT address to be pre-loaded into a fixed, call-saved
> register.  This precludes sibcalls and forces many functions which
> otherwise would not need their own stack frames to create one for
> saving the old value of the GOT register.  See my blog entry on the
> topic here: http://ewontfix.com/18/
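
For anyone following along, here is a rough source-level model of what "forcing address into a pseudo" amounts to. It is only a sketch: atoi stands in for any function defined in a shared library, the helper names call_direct and call_twice_via_pointer are made up for illustration, and an optimizing compiler is free to turn the pointer call back into a direct call, so this shows the shape of the transformation and not guaranteed codegen.

```c
#include <assert.h>
#include <stdlib.h>

/* Ordinary call to a function from a shared library.  In PIC code
   built without -fno-plt this typically goes through a PLT stub. */
int call_direct(const char *text)
{
    assert(text != 0);
    return atoi(text);
}

/* Model of the -fno-plt expansion: the address is loaded once into a
   local (the "pseudo"; in real codegen this is the GOT load) and both
   calls are indirect calls through it, so the load can be shared. */
int call_twice_via_pointer(const char *a, const char *b)
{
    int (*fn)(const char *) = atoi;   /* stands in for the GOT load */

    assert(a != 0 && b != 0);
    return fn(a) + fn(b);
}
```

Both helpers compute the same values as plain atoi calls; only the way the callee address is obtained differs.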

One common pattern I noticed while looking at codegen for speculative
devirtualization is that when we do not inline the virtual call we end
up with

  if (ptr == &foo)
    foo ();

which leads to both a GOT lookup to figure out the address of foo and a
call across the PLT.  It would be nice to handle this gracefully.

Note that one of the improvements I want to make to the devirt machinery
is to change the code sequence to:

  if (vptr == &expected_vtable)
    foo ();
  else
    vptr[token] ();

to save the vtable lookup.  But this is not possible in all cases: it
happens that there are multiple predicted vtables all agreeing on the
particular slot.

Honza
>
> Anyone who really wants lazy binding can use -fplt (which is
> presumably still the default; I didn't check) but lazy binding should
> largely be considered deprecated anyway since effective use of relro
> protection requires -z now too, in which case you're paying all the
> costs (which are considerable!) for lazy binding support even though
> you won't get it.
>
> Rich
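
To make the two devirtualization sequences from my reply concrete, here is a minimal sketch in plain C with a hand-rolled vtable. Everything in it is hypothetical (struct shape, square_vtable, derived_square_vtable, other_vtable, AREA_SLOT and the two call_* helpers are names invented for the example, not GCC internals), and it only models the control flow, not what the compiler actually emits.

```c
#include <assert.h>

struct shape;
typedef int (*method_fn)(const struct shape *);

struct vtable { method_fn slots[1]; };
struct shape  { const struct vtable *vptr; int side; };

enum { AREA_SLOT = 0 };   /* plays the role of "token" */

int square_area(const struct shape *s) { return s->side * s->side; }
int twice_side(const struct shape *s)  { return 2 * s->side; }

/* Two distinct vtables that agree on the slot, plus one that does not. */
const struct vtable square_vtable         = { { square_area } };
const struct vtable derived_square_vtable = { { square_area } };
const struct vtable other_vtable          = { { twice_side } };

/* Current sequence: always load the slot, then compare the loaded
   function address against the predicted target. */
int call_compare_target(const struct shape *s)
{
    method_fn fn;

    assert(s && s->vptr);
    fn = s->vptr->slots[AREA_SLOT];
    if (fn == &square_area)
        return square_area(s);        /* direct call */
    return fn(s);                     /* fallback: indirect call */
}

/* Proposed sequence: compare the vtable pointer itself, so the
   predicted path needs no slot load.  An object whose vtable merely
   agrees on the slot (derived_square_vtable) misses the fast path. */
int call_compare_vptr(const struct shape *s)
{
    assert(s && s->vptr);
    if (s->vptr == &square_vtable)
        return square_area(s);        /* direct call, no slot load */
    return s->vptr->slots[AREA_SLOT](s);
}
```

Both helpers return the same result for every object. The difference is that with derived_square_vtable the first one still takes the direct call, while the second one falls back to the indirect call, which is the "multiple predicted vtables" limitation.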