> On Tue, Feb 27, 2018 at 11:39 AM, H.J. Lu <hongjiu...@intel.com> wrote:
> > For x86 targets, when -fno-plt is used, external functions are called
> > via GOT slot, in 64-bit mode:
> >
> >   [bnd] call/jmp *foo@GOTPCREL(%rip)
> >
> > and in 32-bit mode:
> >
> >   [bnd] call/jmp *foo@GOT[(%reg)]
> >
> > With -mindirect-branch=, they are converted to, in 64-bit mode:
> >
> >   pushq foo@GOTPCREL(%rip)
> >   [bnd] jmp __x86_indirect_thunk[_bnd]
> >
> > and in 32-bit mode:
> >
> >   pushl foo@GOT[(%reg)]
> >   [bnd] jmp __x86_indirect_thunk[_bnd]
> >
> > which were incompatible with CFI.  In 64-bit mode, since R11 is a scratch
> > register, we generate:
> >
> >   movq foo@GOTPCREL(%rip), %r11
> >   [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
> >
> > instead.  We do it in ix86_output_indirect_branch so that we can use
> > the newly proposed R_X86_64_THUNK_GOTPCRELX relocation:
> >
> > https://groups.google.com/forum/#!topic/x86-64-abi/eED5lzn3_Mg
> >
> >   movq foo@GOTPCREL_THUNK(%rip), %r11
> >   [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
> >
> > to load GOT slot into R11.  If foo is defined locally, linker can
> > convert
> >
> >   movq foo@GOTPCREL_THUNK(%rip), %reg
> >   call/jmp __x86_indirect_thunk_reg
> >
> > to
> >
> >   call/jmp foo
> >   nop 0L(%rax)
> >
> > In 32-bit mode, since all caller-saved registers, EAX, EDX and ECX, may
> > be used for function parameters, there is no scratch register available.
> > For -fno-plt -fno-pic -mindirect-branch=, we expand external function
> > call to:
> >
> >   movl foo@GOT, %reg
> >   [bnd] call/jmp *%reg
> >
> > so that it can be converted to
> >
> >   movl foo@GOT, %reg
> >   [bnd] call/jmp __x86_indirect_thunk_[bnd_]reg
> >
> > in ix86_output_indirect_branch.  Since this is performed during RTL
> > expansion, other instructions may be inserted between movl and call/jmp.
> > Linker optimization isn't always possible.
> >
> > Tested on i686 and x86-64.  OK for trunk?
> >
> >
> > H.J.
> > ---
> > gcc/
> >
> >         PR target/83970
> >         * config/i386/constraints.md (Bs): Allow GOT_memory_operand
> >         for TARGET_LP64 with indirect branch conversion.
> >         (Bw): Likewise.
> >         * config/i386/i386.c (ix86_expand_call): Handle -fno-plt with
> >         -mindirect-branch=.
> >         (ix86_nopic_noplt_attribute_p): Likewise.
> >         (ix86_output_indirect_branch): In 64-bit mode, convert function
> >         call via GOT with R11 as a scratch register using
> >         __x86_indirect_thunk_r11.
> >         (ix86_output_call_insn): In 64-bit mode, set xasm to NULL when
> >         calling ix86_output_indirect_branch with function call via GOT.
> >         * config/i386/i386.md (*call_got_thunk): New call pattern for
> >         TARGET_LP64 with indirect branch conversion.
> >         (*call_value_got_thunk): Likewise.
> >
> > gcc/testsuite/
> >
> >         PR target/83970
> >         * gcc.target/i386/indirect-thunk-5.c: Updated.
> >         * gcc.target/i386/indirect-thunk-6.c: Likewise.
> >         * gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
> >         * gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
> >         * gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
> >         * gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
> >         * gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
> >         * gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
> >         * gcc.target/i386/indirect-thunk-13.c: New test.
> >         * gcc.target/i386/indirect-thunk-14.c: Likewise.
> >         * gcc.target/i386/indirect-thunk-bnd-5.c: Likewise.
> >         * gcc.target/i386/indirect-thunk-bnd-6.c: Likewise.
> >         * gcc.target/i386/indirect-thunk-extern-11.c: Likewise.
> >         * gcc.target/i386/indirect-thunk-extern-12.c: Likewise.
> >         * gcc.target/i386/indirect-thunk-inline-8.c: Likewise.
> >         * gcc.target/i386/indirect-thunk-inline-9.c: Likewise.
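
For reference, a minimal sketch of the kind of source the description above applies to. It is not taken from the patch or its testsuite; the wrapper name parse_long and the choice of strtol as the external function are only illustrative assumptions.

```c
#include <stdlib.h>

/* Hypothetical example, not one of the tests in the patch.
   strtol lives in libc, so with -fno-plt the call below goes
   through its GOT slot rather than a PLT entry.  */
long
parse_long (const char *s)
{
  /* External call; per the description above, -mindirect-branch=
     routes this kind of GOT-based call through an indirect thunk.  */
  return strtol (s, NULL, 10);
}
```

Compiling this on x86-64 with something like "gcc -O2 -fno-plt -mindirect-branch=thunk -S" and reading the assembly should show which of the sequences quoted above the compiler picks for the strtol call; the exact output depends on the GCC version and on whether this patch is applied.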
Patch is OK.  I am just a bit worried about how many additional features we
will need relatively late in stage4.  My understanding is that at the moment
there are no direct plans to retpoline userland, but I see that this may
change in the future.  Can you give us a brief overview of any parts that
are still missing?

Thanks,
Honza