>> Assuming this is an issue you all feel is worth addressing, I will
>> continue working on providing a patch. I'm concerned though that the
>> overhead from adding a wrmsr on both syscall entry and exit to
>> overwrite and restore the KERNEL_GS_BASE MSR may be quite high, so
>> any feedback in regards to the approach or suggestions of alternate
>> approaches to patching are welcome :)
>
> Since the kernel, as far as I understand, uses FineIBT without
> backwards control flow protection (in other words, I think we assume
> that the kernel stack is trusted?),

This is fun indeed.  Linux cannot use supervisor shadow stacks because
the mess around NMI re-entrancy (and IST more generally) requires ROP
gadgets in order to function safely.  Implementing this with shadow
stacks active, while not impossible, is deemed to be prohibitively
complicated.

Linux's supervisor shadow stack support is waiting for FRED support,
which fixes both the NMI re-entrancy problem and other exceptions
nesting within NMIs, as well as prohibiting the use of the SWAPGS
instruction, as FRED tries to make sure that the correct GS is always in
context.

But FRED support is slated for PantherLake/DiamondRapids, which haven't
shipped yet, so they are of no use for the problem right now.

> could we build a cheaper
> check on that basis somehow? For example, maybe we could do something like:
>
> ```
> endbr64
> test rsp, rsp
> js slowpath
> swapgs
> ```

I presume it's been pointed out already, but there are 3 related
entrypoints here: SYSCALL64, which is the one being discussed, plus
SYSCALL32 and SYSENTER.

But any other IDT entry is in a similar bucket.  If we're corrupting a
function pointer or return address to redirect here, then the check of
CS(%rsp) to control the conditional SWAPGS is an OoB read in the
caller's stack frame.

For IDT entries, checking %rsp is reasonable, because userspace can't
forge a kernel-like %rsp.  However, SYSCALL64 specifically leaves %rsp
entirely attacker controlled (and even potentially non-canonical), so
I'm wondering what you had in mind for the slowpath to truly distinguish
kernel context from user context?

~Andrew
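
For illustration only, here is a minimal sketch in C of what the
`test rsp, rsp; js slowpath` check actually decides.  The helper names
are hypothetical, and the canonical check assumes 4-level paging (48-bit
virtual addresses); neither comes from the kernel source.  The point it
shows is that the sign test classifies on a value that userspace chooses
freely before executing SYSCALL, so a forged kernel-looking (or even
non-canonical) %rsp lands on the slowpath just like a genuine kernel
stack pointer would.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring `test rsp, rsp; js slowpath`:
 * returns true when bit 63 of the stack pointer is set.
 */
static inline bool sign_test_takes_slowpath(uint64_t rsp)
{
    return (rsp >> 63) != 0;
}

/*
 * Hypothetical helper, assuming 4-level paging: an address is
 * canonical when bits 63:47 are all zeros or all ones.
 */
static inline bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the upper 17 bits */

    return top == 0 || top == 0x1ffff;
}
```

A value such as 0xffffc90000004000 takes the slowpath and is canonical,
while 0x8000000000000000 takes the slowpath and is not canonical.  Both
can be loaded into %rsp from userspace before SYSCALL, so the slowpath
still needs some other signal to tell the two contexts apart.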
