I tried this:
struct C {
  virtual ~C();
  virtual void f();
};

void
f (C *p)
{
  p->f();
  p->f();
}
with r256939 and -mindirect-branch=thunk -O2 on x86-64 GNU/Linux, and
got this:
_Z1fP1C:
.LFB0:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	.cfi_offset 3, -16
	movq	(%rdi), %rax
	movq	%rdi, %rbx
	jmp	.LIND1
.LIND0:
	pushq	16(%rax)
	jmp	__x86_indirect_thunk
.LIND1:
	call	.LIND0
	movq	(%rbx), %rax
	movq	%rbx, %rdi
	popq	%rbx
	.cfi_def_cfa_offset 8
	movq	16(%rax), %rax
	jmp	__x86_indirect_thunk_rax
	.cfi_endproc
This doesn't look quite right. x86-64 is supposed to have asynchronous
unwind tables by default, but there is nothing that reflects the change
in the (relative) frame address after .LIND0. I think that region
really has to be moved outside of the .cfi_startproc/.cfi_endproc bracket.
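To make the mismatch concrete, here is a rough C++ model of the first call
path in the listing above. It is only a sketch: the Step type and the helper
names are made up for illustration, and the model assumes the CFA is always
expressed as %rsp plus an offset, starting at 8 on function entry.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one executed instruction: how many bytes it pushes
// onto the stack, and the CFA offset (CFA = %rsp + offset) that the unwind
// table declares for the address reached after it.
struct Step {
    std::string insn;
    int pushed_bytes;
    int declared_cfa_offset;
};

// Return the instructions after which the declared CFA offset no longer
// matches the actual distance from %rsp to the CFA.
std::vector<std::string> find_cfa_mismatches(const std::vector<Step>& path) {
    int actual = 8;  // on entry, only the return address is on the stack
    std::vector<std::string> mismatches;
    for (const auto& s : path) {
        actual += s.pushed_bytes;
        if (actual != s.declared_cfa_offset)
            mismatches.push_back(s.insn);
    }
    return mismatches;
}

// The first p->f() call in the listing, in execution order. The only
// relevant CFI directive is ".cfi_def_cfa_offset 16" after "pushq %rbx",
// so every later address in this path is declared as offset 16.
std::vector<Step> first_call_path() {
    return {
        {"pushq %rbx", 8, 16},
        {"movq (%rdi), %rax", 0, 16},
        {"movq %rdi, %rbx", 0, 16},
        {"jmp .LIND1", 0, 16},
        {"call .LIND0", 8, 16},     // pushes a return address: really 24
        {"pushq 16(%rax)", 8, 16},  // pushes the target address: really 32
    };
}
```

In this model, find_cfa_mismatches(first_call_path()) reports the call to
.LIND0 and the following pushq, which is the region that the unwind
information does not describe.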
There is a different issue with the thunk itself.
__x86_indirect_thunk_rax:
.LFB2:
	.cfi_startproc
	call	.LIND5
.LIND4:
	pause
	lfence
	jmp	.LIND4
.LIND5:
	mov	%rax, (%rsp)
	ret
	.cfi_endproc
If a signal is delivered after the mov has executed, the unwinder will
eventually unwind through the signal frame and hit
__x86_indirect_thunk_rax. The unwinder does not treat that frame as a
signal frame, so the return address on the stack is decremented by one,
in an attempt to obtain a program counter value which is within the call
instruction.
However, in this scenario, the return address is actually the start of
the function, and subtracting one moves the program counter out of the
unwind region for that function.
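The effect of the decrement can be illustrated with a small C++ model of
the lookup. This is a sketch, not the libgcc code: the UnwindRegion type,
the function names and all addresses are invented for the example, and the
table is a plain list of [start, end) ranges.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical unwind-table entry covering the addresses [start, end).
struct UnwindRegion {
    std::string name;
    std::uint64_t start;
    std::uint64_t end;
};

// Return the name of the region containing pc, or "" if there is none.
std::string find_region(const std::vector<UnwindRegion>& table,
                        std::uint64_t pc) {
    for (const auto& r : table)
        if (pc >= r.start && pc < r.end)
            return r.name;
    return "";
}

// Model of the caller lookup: for an ordinary frame the return address is
// decremented by one so that it lands inside the call instruction; for a
// signal frame it is used unchanged.
std::string lookup_caller(const std::vector<UnwindRegion>& table,
                          std::uint64_t return_address,
                          bool is_signal_frame) {
    std::uint64_t pc = is_signal_frame ? return_address : return_address - 1;
    return find_region(table, pc);
}

// Made-up layout: the thunk, and the function the thunk is about to enter.
std::vector<UnwindRegion> example_table() {
    return {
        {"thunk", 0x1000, 0x1010},
        {"target", 0x2000, 0x2040},
    };
}

// After "mov %rax, (%rsp)" the return address slot holds the start of the
// target function (0x2000 here), not an address following a call.
void check_thunk_scenario() {
    std::vector<UnwindRegion> table = example_table();
    assert(lookup_caller(table, 0x2000, false) == "");        // 0x1fff: no region
    assert(lookup_caller(table, 0x2010, false) == "target");  // ordinary return
}
```

With these made-up addresses, the ordinary-frame lookup for 0x2000 probes
0x1fff, which lies in neither region, while a real return address in the
middle of the function still resolves as expected.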
Both issues are visible in GDB if you set breakpoints in the proper
places because the frame information used for debugging is incorrect as
well.
This probably does not concern the kernel that much, but it is
definitely a problem for userspace.
Thanks,
Florian