On Wed, May 6, 2015 at 10:35 AM, Rich Felker <dal...@libc.org> wrote:
> On Wed, May 06, 2015 at 07:43:58PM +0300, Alexander Monakov wrote:
>> On Wed, 6 May 2015, Jakub Jelinek wrote:
>> > The linker would know very well what kind of relocations are used for
>> > particular PLT slot, and for the new relocations which would resolve to the
>> > address of the .got.plt slot it could just tweak corresponding 3rd insn
>> > in the slot, to not jump to first plt slot - 16, but a few bytes before that
>> > that would just load the address of _G_O_T_ into %ebx and then fallthru
>> > into the 0x4c2b7310 snippet above.  The lazy binding would be a few ticks
>> > slower in that case, but no requirement on %ebx to contain _G_O_T_.
>>
>> No, %ebx is callee-saved, so you can't outright overwrite it in the PLT stub.
>
> Indeed. And the situation is the same on almost all targets. The only
> exceptions are those with direct PC-relative addressing (like x86_64)
> and those with reserved inter-procedural linkage registers and
> efficient PC-relative address loading via them (like ARM and AArch64).
> MIPS (o32) is also an interesting exception in that the normal ABI is
> already PLT-free, and while callees need a PIC register loaded, it's a
> call-clobbered register, not a call-saved one, so it doesn't make the
> same kind of trouble.
>
> I really don't see a need to make no-PLT code gen support lazy binding
> when it's necessarily going to be costly to do so, and precludes most
> of the benefits of the no-PLT approach. Anyone still wanting/needing
> lazy binding semantics can use PLT, and can even choose on a per-TU
> basis (or maybe even more fine-grained with pragmas/attributes?).
> Those of us who are suffering the cost of PLT with no benefits
> (because we use -Wl,-z,relro -Wl,-z,now) can just be rid of it (by
> adding -fno-plt) and enjoy something like a 10% performance boost in
> PIC/PIE.
>
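
For anyone following along, here is a rough source-level model of the
difference being discussed. It is only a sketch: the real change happens
in code generation when -fno-plt is given, not in the C source, and the
names got_strlen and length_via_got are made up for illustration. The
pointer stands in for one GOT entry that is already filled in at load
time (as with -z now), so the call is a single indirect call with no PLT
stub in between.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a single GOT entry.  It is initialized
   eagerly, which models binding at load time rather than lazily on
   the first call. */
typedef size_t (*strlen_fn)(const char *);
static strlen_fn got_strlen = strlen;

/* Models a no-PLT call site: one indirect call through the slot.
   No stub is involved, so nothing here depends on a PIC register
   holding the GOT address at the point of the call. */
size_t length_via_got(const char *s)
{
    return got_strlen(s);
}
```

An optimizing compiler is free to turn this particular pointer call back
into a direct call, so this shows the shape of the idea, not the exact
instructions that get emitted.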

There are things the compiler can do for performance and correctness
if it is told which options will be passed to the linker.  -z now is one
such option and -Bsymbolic is another:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65886

I think we should add -fnow and -fsymbolic.  Together with LTO,
we can generate faster executables as well as shared libraries.
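
As a rough sketch of what knowing about -Bsymbolic would buy (the
function names below are hypothetical and not taken from the bug
report): an exported function in a shared library can be interposed by
default, so a compiler that honors interposition, as GCC does by default
with -fPIC, cannot inline it or bind calls to it locally. Today the
usual workaround is to do the local binding by hand:

```c
/* Internal implementation.  Being static, it cannot be interposed,
   so the compiler may inline it or call it directly. */
static int scale_impl(int x)
{
    return 2 * x;
}

/* Exported entry point, kept for users of the library. */
int scale(int x)
{
    return scale_impl(x);
}

/* Calls within the library go to scale_impl rather than scale, which
   is the local binding that -Bsymbolic asks the linker to provide for
   global symbols. */
int scale_twice(int x)
{
    return scale_impl(scale_impl(x));
}
```

If the compiler were told that the link will use -Bsymbolic, it could
presumably treat a direct call to scale() the same way without the
manual split.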

-- 
H.J.
