On Fri, Feb 14, 2025 at 11:06:50PM +0000, Andrew Cooper wrote:
> On 13/02/2025 11:24 pm, Jennifer Miller wrote:
> > On Thu, Feb 13, 2025 at 09:24:18PM +0000, Andrew Cooper wrote:
> >>>> ; swap stacks as normal
> >>>> mov QWORD PTR gs:[rip+0x7f005f85],rsp # 0x6014 <cpu_tss_rw+20>
> >>>> mov rsp,QWORD PTR gs:[rip+0x7f02c56d] # 0x2c618 <pcpu_hot+24>
> >> ... these are memory accesses using the user %gs. As you note a few
> >> lines lower, %gs isn't safe at this point.
> >>
> >> A cunning attacker can make gs:[rip+0x7f02c56d] be a read-only mapping,
> >> at point we'll have loaded an attacker controlled %rsp, then take #PF
> >> trying to spill %rsp into pcpu_hot, and now we're running the pagefault
> >> handler on an attacker controlled stack and gsbase.
> >>
> > I don't follow, the spill of %rsp into pcpu_hot occurs first, before we
> > would move to the attacker controlled stack. This is Intel asm syntax,
> > sorry if that was unclear.
>
> No, sorry. It's clearly written; I simply wasn't paying enough attention.
>
> > Still, I hadn't considered misusing readonly/unmapped pages on the GPR
> > register spill that follows. Could we enforce that the stack pointer we get
> > be page aligned to prevent this vector? So that if one were to attempt to
> > point the stack to readonly or unmapped memory they should be guaranteed to
> > double fault?
>
> Hmm.
>
> Espfix64 does involve #DF recovering from a write to a read-only stack.
> (This broken corner of x86 is also fixed in FRED. We fixed a *lot* of
> thing.)

Interesting. I haven't gotten around to reading up on how FRED works yet, but it sounds neat.

>
> As long the #DF handler can be updated to safely distinguish espfix64
> from this entrypoint attack, this seems like it might mitigate the
> read-only case.
>
> > I think we can do the overwrite at any point before actually calling into
> > the individual syscall handlers, really anywhere before potentially
> > hijacked indirect control flow can occur and then restore it just after
> > those return e.g., for the 64-bit path I am currently overwriting it at the
> > start of do_syscall_64 and then restoring it just before
> > syscall_exit_to_user_mode. I'm not sure if there is any reason to do it
> > sooner while we'd still be register constrained.
>
> I don't follow. If any "bad" execution is found in an entrypoint, Linux
> needs to panic(). Detecting the malice involves clobbering an in-use
> stack, and there's no ability to safely recover.

Sorry, that part was in response to Jann's question about the mitigation strategy proposed in my initial email.

>
> ~Andrew

~Jennifer
