> On Jan 10, 2019, at 4:56 PM, Andy Lutomirski <l...@kernel.org> wrote:
>
> On Thu, Jan 10, 2019 at 3:02 PM Linus Torvalds
> <torva...@linux-foundation.org> wrote:
>> On Thu, Jan 10, 2019 at 12:52 PM Josh Poimboeuf <jpoim...@redhat.com> wrote:
>>> Right, emulating a call instruction from the #BP handler is ugly,
>>> because you have to somehow grow the stack to make room for the return
>>> address. Personally I liked the idea of shifting the iret frame by 16
>>> bytes in the #DB entry code, but others hated it.
>>
>> Yeah, I hated it.
>>
>> But I'm starting to think it's the simplest solution.
>>
>> So still not loving it, but all the other models have had huge issues too.
>
> Putting my maintainer hat on:
>
> I'm okay-ish with shifting the stack by 16 bytes. If this is done, I
> want an assertion in do_int3() or wherever the fixup happens that the
> write isn't overlapping pt_regs (which is easy to implement because
> that code has the relevant pt_regs pointer). And I want some code
> that explicitly triggers the fixup when a CONFIG_DEBUG_ENTRY=y or
> similar kernel is built so that this whole mess actually gets
> exercised. Because the fixup only happens when a
> really-quite-improbable race gets hit, and the issues depend on stack
> alignment, which is presumably why Josh was able to submit a buggy
> series without noticing.
>
> BUT: this is going to be utterly gross whenever anyone tries to
> implement shadow stacks for the kernel, and we might need to switch to
> a longjmp-like approach if that happens.

Here is an alternative idea (although it is similar to Steven’s and my code).

Assume that static calls always clobber R10 and R11 explicitly, as the
calling convention already requires (and as a gcc plugin should allow us
to enforce). Also assume that we keep a table of every source RIP and its
matching target.

Now, in the int3 handler, you can take the faulting RIP and look it up in
the “static-calls” table, writing RIP+5 (the return address) into R10 and
the target into R11. The int3 handler then diverts execution by changing
pt_regs->rip to point to a new function that does:

	push R10
	jmp __x86_indirect_thunk_r11

And then you are done. No?
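
To make the lookup step above concrete, here is a rough user-space sketch of what the handler would do. Everything in it is made up for illustration: struct fake_regs stands in for the three pt_regs fields involved, and static_call_site, find_site() and static_call_int3_fixup() are hypothetical names, not existing kernel interfaces. It also assumes the handler has already rewound the reported RIP to the start of the patched call (real hardware reports the address after the int3 byte), and it uses a linear search where real code would presumably want a sorted table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_INSN_SIZE 5	/* length of the patched call instruction */

/* Hypothetical stand-in for the pt_regs fields this idea touches. */
struct fake_regs {
	uint64_t ip;	/* assumed: start address of the patched call */
	uint64_t r10;
	uint64_t r11;
};

/* Hypothetical table entry: one per static call site. */
struct static_call_site {
	uint64_t src_ip;	/* address of the call being patched */
	uint64_t target;	/* function the call should reach */
};

/* Linear search for brevity; returns NULL if ip is not a known site. */
static const struct static_call_site *
find_site(const struct static_call_site *tbl, size_t n, uint64_t ip)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].src_ip == ip)
			return &tbl[i];
	return NULL;
}

/*
 * Returns 1 if the trap was a static call site and regs were redirected,
 * 0 if the trap belongs to some other int3 user.
 */
static int static_call_int3_fixup(struct fake_regs *regs,
				  const struct static_call_site *tbl,
				  size_t n, uint64_t trampoline)
{
	const struct static_call_site *site;

	assert(regs != NULL);

	site = find_site(tbl, n, regs->ip);
	if (!site)
		return 0;

	regs->r10 = site->src_ip + CALL_INSN_SIZE;	/* return address */
	regs->r11 = site->target;			/* call target */
	regs->ip = trampoline;	/* stub doing: push R10; jmp thunk_r11 */
	return 1;
}
```

The point of the sketch is that the handler never writes to the stack; the push of the return address happens in the stub, after the handler has returned.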