https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87414

Bug ID: 87414
Summary: -mindirect-branch=thunk produces thunk with incorrect CFI on x86_64
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: fw at gcc dot gnu.org
CC: hjl.tools at gmail dot com
Target Milestone: ---
Target: x86_64

GCC 9.0.0 (20180924) generates these thunks on x86-64:

__x86_indirect_thunk_rdi:
.LFB1:
        .cfi_startproc
        call    .LIND1
.LIND0:
        pause
        lfence
        jmp     .LIND0
.LIND1:
        mov     %rdi, (%rsp)
        ret
        .cfi_endproc
.LFE1:

I don't think the CFI is correct. At the ret instruction, the CFI indicates that the return address is at the top of the stack. The unwinder will use this return address and subtract one because it's a non-signal handler frame. But the resulting address is located before the start of the function, so it will locate an incorrect FDE based on it.

Indeed I see this when si-stepping through the execution with GDB:

(gdb) disas
Dump of assembler code for function __x86_indirect_thunk_rdi:
   0x00000000004004a5 <+0>:     callq  0x4004b1 <__x86_indirect_thunk_rdi+12>
   0x00000000004004aa <+5>:     pause
   0x00000000004004ac <+7>:     lfence
   0x00000000004004af <+10>:    jmp    0x4004aa <__x86_indirect_thunk_rdi+5>
   0x00000000004004b1 <+12>:    mov    %rdi,(%rsp)
=> 0x00000000004004b5 <+16>:    retq
   0x00000000004004b6 <+17>:    nopw   %cs:0x0(%rax,%rax,1)
(gdb) bt
#0  0x00000000004004b5 in __x86_indirect_thunk_rdi ()
#1  0x0000000000400490 in frame_dummy () at /tmp/cfi.c:16
#2  0x000000000040038e in main () at /tmp/cfi.c:16
End of assembler dump.
(gdb) print f2
$1 = {int (void)} 0x400490 <f2>

Note the “frame_dummy” instead of “f2” in the backtrace.

Test program:

__attribute__ ((weak))
int
f1 (int (*f2) (void))
{
  return f2 ();
}

int
f2 (void)
{
}

int
main (void)
{
  f1 (f2);
}

We had a bit of an internal debate whether it's actually possible to produce correct CFI for this.
I think we can reflect the stack pointer adjustment after the thunk-internal call in the CFI, so that the unwinder continues to see the original caller of the thunk. Due to the address decrement, this needs to happen for the jmp instruction, not after the .LIND1 label. As an alternative, it would be possible to error out when -mindirect-branch=thunk is used with -fasynchronous-unwind-tables, but since the latter is the default, this would be a bit harsh.
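
To illustrate the lookup problem described above, here is a minimal C sketch of how subtracting one from the return address selects the wrong FDE. It is not the actual unwinder code. The names range, table, lookup and caller_of are hypothetical, and so are the start of frame_dummy and the end of f2. Only the f2 entry address (0x400490) and the thunk addresses are taken from the GDB session.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an FDE: a named [start, end) address range. */
struct range {
    const char *name;
    uintptr_t start;
    uintptr_t end;
};

/* Illustrative table.  0x400490 (f2) and 0x4004a5..0x4004b6 (thunk) come
   from the GDB output; the remaining bounds are assumed for this sketch. */
const struct range table[] = {
    { "frame_dummy",              0x400460, 0x400490 },
    { "f2",                       0x400490, 0x400497 },
    { "__x86_indirect_thunk_rdi", 0x4004a5, 0x4004b6 },
};

/* Return the name of the range containing pc, or NULL if none matches. */
const char *lookup(uintptr_t pc)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (pc >= table[i].start && pc < table[i].end)
            return table[i].name;
    return NULL;
}

/* For a non-signal frame, the lookup uses return_address - 1 so that the
   address falls inside the call instruction of the caller. */
const char *caller_of(uintptr_t return_address)
{
    return lookup(return_address - 1);
}
```

With the thunk's "return address" being the entry point of f2, caller_of (0x400490) looks up 0x40048f and yields "frame_dummy", while lookup (0x400490) yields "f2". This matches the backtrace shown above.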