On Fri, Apr 24, 2015 at 10:21 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Thu, Apr 23, 2015 at 7:15 PM, Andy Lutomirski <l...@kernel.org> wrote:
>> AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET
>> with SS == 0 results in an invalid usermode state in which SS is
>> apparently equal to __USER_DS but causes #SS if used.
>>
>> Work around the issue by replacing NULL SS values with __KERNEL_DS
>> in __switch_to, thus ensuring that SYSRET never happens with SS set
>> to NULL.
>>
>> This was exposed by a recent vDSO cleanup.
>>
>> Fixes: e7d6eefaaa44 x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
>> Signed-off-by: Andy Lutomirski <l...@kernel.org>
>> ---
>>
>> Tested only on Intel, which isn't very interesting.  I'll tidy up
>> and send a test case, too, once Borislav confirms that it works.
>>
>> Please don't actually apply this until we're sure we understand the
>> scope of the issue.  If this doesn't affect SYSRETQ, then we might
>> want to fix it up before SYSRETL to avoid impacting 64-bit processes
>> at all.
>>
>
> After sleeping on it, I think I want to offer a different, more
> complicated approach.  AFAIK there are really only two ways that this
> issue can be visible:
>
> 1. SYSRETL.  We can fix that up in the AMD SYSRETL path.  I think
> there's a decent argument that that path is less performance-critical
> than context switches.
>
> 2. SYSRETQ.  The only way that I know of to see the problem is SYSRETQ
> followed by a far jump or return.  This is presumably *extremely*
> rare.
>
> What if we fixed #2 up in do_stack_segment?  We should double-check
> the docs, but I think that this will only ever manifest as #SS(0) with
> regs->ss == __USER_DS and !user_mode_64bit(regs).  We need to avoid
> infinite retry loops, but this might be okay.  I think that #SS(0)
> from userspace under those conditions can *only* happen as a result of
> this issue.  Even if not, we could come up with a way to only retry
> once per syscall (e.g. set some ti->status flag in the 64-bit syscall
> path on AMD and clear it in do_stack_segment).
>
> This might be way more trouble than it's worth.

Exactly my feeling. What are you trying to save? About four CPU cycles
of checking %ss != __KERNEL_DS on each switch_to? That's not worth
bothering about. Your last patch seems to be perfect.
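
For anyone following along, the check in question can be modeled in
userspace C roughly like this. It is only a sketch under assumptions: the
struct, the model_* function names and the MODEL_* constants are made up
for illustration, and this is not the patch itself. The real code would
read and reload %ss in __switch_to; here the "descriptor usable" flag
simply stands in for the cached SS descriptor state described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative selector values (assumed to match x86_64 Linux). */
#define MODEL_KERNEL_DS 0x18u
#define MODEL_USER_DS   0x2bu

/* Hypothetical model of SS: visible selector plus cached descriptor state. */
struct model_cpu {
    uint16_t ss_sel;        /* what reading %ss would return */
    bool ss_desc_usable;    /* whether the cached descriptor is valid */
};

/* The per-switch check: if SS is not __KERNEL_DS, reload it. */
void model_switch_to_fixup(struct model_cpu *cpu)
{
    assert(cpu != NULL);
    if (cpu->ss_sel != MODEL_KERNEL_DS) {
        cpu->ss_sel = MODEL_KERNEL_DS;
        cpu->ss_desc_usable = true;   /* a real reload refreshes the cache */
    }
}

/* SYSRET as described for AMD: selector is set, cached descriptor is not. */
void model_sysret_amd(struct model_cpu *cpu)
{
    assert(cpu != NULL);
    cpu->ss_sel = MODEL_USER_DS;
    /* ss_desc_usable is deliberately left unchanged. */
}

/* In this model, using SS faults (#SS) when the cached descriptor is stale. */
bool model_ss_would_fault(const struct model_cpu *cpu)
{
    assert(cpu != NULL);
    return !cpu->ss_desc_usable;
}
```

Starting from a NULL SS, calling model_sysret_amd() directly leaves a
state that looks like __USER_DS but would fault; calling
model_switch_to_fixup() first does not. The cost on the common path is a
single compare of the selector against __KERNEL_DS.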