On Thu, Nov 29, 2018 at 11:08 AM Linus Torvalds <torva...@linux-foundation.org> wrote: > > On Thu, Nov 29, 2018 at 10:58 AM Linus Torvalds > <torva...@linux-foundation.org> wrote: > > > > In contrast, if the call was wrapped in an inline asm, we'd *know* the > > compiler couldn't turn a "call wrapper(%rip)" into anything else. > > Actually, I think I have a better model - if the caller is done with inline > asm. > > What you can do then is basically add a single-byte prefix to the > "call" instruction that does nothing (say, cs override), and then > replace *that* with a 'int3' instruction. > > Boom. Done. > > Now, the "int3" handler can just update the instruction in-place, but > leave the "int3" in place, and then return to the next instruction > byte (which is just the normal branch instruction without the prefix > byte). > > The cross-CPU case continues to work, because the 'int3' remains in > place until after the IPI.
Hmm, cute. But then the calls are in inline asm, which results in giant turds like we have for the pvop vcalls. And, if they start being used more generally, we potentially have ABI issues where the calling convention isn't quite what the asm expects, and we explode. I propose a different solution: As in this patch set, we have a direct and an indirect version. The indirect version remains exactly the same as in this patch set. The direct version just only does the patching when all seems well: the call instruction needs to be 0xe8, and we only do it when the thing doesn't cross a cache line. Does that work? In the rare case where the compiler generates something other than 0xe8 or crosses a cache line, then the thing just remains as a call to the out of line jmp trampoline. Does that seem reasonable? It's a very minor change to the patch set. Alternatively, we could actually emulate call instructions like this: void __noreturn jump_to_kernel_pt_regs(struct pt_regs *regs, ...) { struct pt_regs ptregs_copy = *regs; barrier(); *(unsigned long *)(regs->sp - 8) = whatever; /* may clobber old regs, but so what? */ asm volatile ("jmp return_to_alternate_ptregs"); } where return_to_alternate_ptregs points rsp to the ptregs and goes through the normal return path. It's ugly, but we could have a test case for it, and it should work fine.