On Tue, May 19, 2015 at 05:10:11PM -0700, H.J. Lu wrote:
> On Tue, May 19, 2015 at 1:54 PM, Rich Felker <dal...@libc.org> wrote:
> > On Tue, May 19, 2015 at 01:27:06PM -0700, H.J. Lu wrote:
> >> On Tue, May 19, 2015 at 1:15 PM, Rich Felker <dal...@libc.org> wrote:
> >> > On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
> >> >> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson <r...@redhat.com> wrote:
> >> >> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
> >> >> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson <r...@redhat.com> wrote:
> >> >> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
> >> >> >>>> I'm still mildly worried that concerns for supporting
> >> >> >>>> relaxation might lead to decisions not to optimize code in ways
> >> >> >>>> that would be difficult to relax (e.g. certain types of address load
> >> >> >>>> reordering or hoisting) but I don't understand GCC internals
> >> >> >>>> sufficiently to know if this concern is warranted or not.
> >> >> >>>
> >> >> >>> It is. The relaxation that HJ is working on requires that the
> >> >> >>> reads from the got not be hoisted. I'm not especially convinced
> >> >> >>> that what he's working on is a win.
> >> >> >>>
> >> >> >>> With LTO, the compiler can do the same job that he's attempting in
> >> >> >>> the linker, without an extra nop. Without LTO, leaving it to the
> >> >> >>> linker means that you can't hoist the load and hide the memory latency.
> >> >> >>>
> >> >> >>
> >> >> >> My relax approach won't take away any optimization done by compiler.
> >> >> >> It simply turns indirect branch into direct branch with a nop prefix at
> >> >> >> link-time. I am having a hard time to understand why we shouldn't do it.
> >> >> >
> >> >> > I well understand what you're doing.
> >> >> >
> >> >> > But my point is that the only time the compiler should present you with the
> >> >> > form of indirect branch you're looking for is when there's no place to hoist
> >> >> > the load.
> >> >> >
> >> >> > At which point, is it really worth adding a new relocation to the ABI? Is it
> >> >> > really worth adding new code to the linker that won't be exercised often?
> >> >>
> >> >> I believe there are plenty of indirect branches via GOT when compiling
> >> >> PIE/PIC with -fno-plt:
> >> >>
> >> >> [hjl@gnu-6 gcc]$ cat /tmp/x.c
> >> >> extern void foo (void);
> >> >>
> >> >> void
> >> >> bar (void)
> >> >> {
> >> >>   foo ();
> >> >> }
> >> >> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
> >> >> [hjl@gnu-6 gcc]$ cat x.s
> >> >>         .file   "x.c"
> >> >>         .section        .text.unlikely,"ax",@progbits
> >> >> .LCOLDB0:
> >> >>         .text
> >> >> .LHOTB0:
> >> >>         .p2align 4,,15
> >> >>         .globl  bar
> >> >>         .type   bar, @function
> >> >> bar:
> >> >> .LFB0:
> >> >>         .cfi_startproc
> >> >>         jmp     *foo@GOTPCREL(%rip)
> >> >>         .cfi_endproc
> >> >> .LFE0:
> >> >>         .size   bar, .-bar
> >> >
> >> > I agree these exist. What I question is whether the savings from the
> >> > linker being able to relax this to a direct call in the case where the
> >> > programmer failed to let the compiler make it a direct call to begin
> >> > with (by using hidden or protected visibility) are worth the cost of
> >> > not being able to hoist the load out of loops or schedule it earlier
> >> > in cases where relaxation is not possible because the call target is
> >> > not defined in the same DSO.
> >>
> >> Just for fun. I compiled binutils as PIE with -fno-plt -flto:
> >>
> >> [hjl@gnu-mic-2 gas]$ file as-new
> >> as-new: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
> >> dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not
> >> stripped
> >> [hjl@gnu-mic-2 gas]$
> >>
> >> There are 43:
> >>
> >> ff 25 21 93 2d 00        jmpq   *0x2d9321(%rip)        # 3d5f58 <_DYNAMIC+0x1e8>
> >>
> >> and 1983
> >>
> >> ff 15 eb f4 38 00        callq  *0x38f4eb(%rip)        # 3d60e0 <_DYNAMIC+0x370>
> >
> > How many of those would be relaxed? I suspect it depends a lot on
> > whether libbfd is static or shared.
>
> When shared libraries are enabled, there are 177 indirect branches
> to locally defined functions. Call to any locally defined functions,
> which aren't compiled with LTO, is indirect.
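For readers following the relaxation argument: the two indirect forms counted above are `ff 25 disp32` (`jmp *disp32(%rip)`) and `ff 15 disp32` (`call *disp32(%rip)`), 6 bytes each, while a direct `jmp rel32` (`e9`) or `call rel32` (`e8`) is only 5 bytes, so a relaxing linker has one spare byte to fill; that is the "extra nop" mentioned in the quoted text. The C function below is a minimal sketch of that byte rewrite, assuming the spare byte becomes an `addr32` (0x67) prefix in front of `call` and a trailing `nop` (0x90) after `jmp`. That padding choice, the name `relax_got_branch`, and its interface are illustrative assumptions, not the encoding or code from HJ's patch, and the sketch skips everything a real linker must verify first (relocation type, that the target is defined in the same DSO, and so on).

```c
#include <stdint.h>

/* Store a 32-bit value in little-endian order (x86-64 immediates are LE). */
static void put_le32(uint8_t *dst, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper, not binutils code: rewrite one 6-byte RIP-relative
 * indirect branch through the GOT into a direct branch.
 *   insn       the 6 instruction bytes, rewritten in place
 *   insn_addr  address of the first byte of insn
 *   target     address of the (locally defined) branch target
 * Returns 0 on success, -1 if the bytes are not a recognized form or the
 * displacement does not fit in a signed 32-bit field. */
int relax_got_branch(uint8_t insn[6], uint64_t insn_addr, uint64_t target)
{
    int is_call;

    if (insn[0] != 0xff)
        return -1;
    if (insn[1] == 0x15)            /* ff 15 disp32: call *disp32(%rip) */
        is_call = 1;
    else if (insn[1] == 0x25)       /* ff 25 disp32: jmp *disp32(%rip) */
        is_call = 0;
    else
        return -1;

    /* rel32 is relative to the end of the direct branch: 6 bytes for
     * "addr32 call rel32", 5 bytes for "jmp rel32" (the nop comes after). */
    uint64_t next = insn_addr + (is_call ? 6 : 5);
    int64_t rel = (int64_t)(target - next);
    if (rel < INT32_MIN || rel > INT32_MAX)
        return -1;

    if (is_call) {
        insn[0] = 0x67;             /* addr32 prefix, used only as padding */
        insn[1] = 0xe8;             /* call rel32 */
        put_le32(&insn[2], (uint32_t)(int32_t)rel);
    } else {
        insn[0] = 0xe9;             /* jmp rel32 */
        put_le32(&insn[1], (uint32_t)(int32_t)rel);
        insn[5] = 0x90;             /* nop after the jump, never executed */
    }
    return 0;
}
```

The rewrite only works when the final target address is known at link time, which is why only branches to locally defined functions (the 177 mentioned above) are candidates, and why an address load the compiler has already hoisted into a register leaves nothing for the linker to rewrite.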

And are the above indirect calls/jumps (1983+43) candidates for
scheduling/hoisting the address load (that's not being done yet), or
are they the ones the compiler opted not to schedule/hoist?

The win from relaxation seems small here, but as long as you're not
going to block optimizations that would preclude relaxing, I don't see
any disadvantages to doing it.

Rich
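Rich's alternative in the quoted text is for the programmer to let the compiler emit a direct call in the first place by giving locally defined functions hidden (or protected) visibility. A minimal sketch, assuming GCC or Clang (the `visibility` and `noinline` attributes are GCC/Clang extensions); `helper` and `caller` are hypothetical names. Compiled with `-fPIC -O2 -fno-plt -S`, the call from `caller` to `helper` should come out as a direct `call helper` rather than `call *helper@GOTPCREL(%rip)`, because a hidden symbol cannot be preempted from another DSO; the exact output may vary with compiler version.

```c
/* Hidden visibility: helper is not exported from the DSO and cannot be
 * interposed, so calls to it from within the DSO can bind directly,
 * with no GOT load and nothing left for the linker to relax.
 * noinline keeps the call visible in the assembly for this one-file demo. */
__attribute__((visibility("hidden"), noinline))
int helper(int x)
{
    return x + 1;
}

/* Default visibility: exported, and so still callable from other DSOs.
 * A call to caller from another DSO would still go through the GOT. */
int caller(int x)
{
    return helper(x) * 2;
}
```

This only covers calls whose target lives in the same DSO; calls into other shared libraries remain indirect either way, which is the case where hoisting the GOT load matters.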