AMD docs say that SYSRET32 loads the %ss selector with a value from an MSR, but the *cached descriptor* of %ss is not modified. (Intel CPUs reset the descriptor to a fixed, valid state.)

It was observed to cause Wine crashes. The conjectured sequence of events
is as follows:

1. Wine process enters kernel via syscall insn.
2. Context switch to any other task.
3. Interrupt or exception happens, CPU loads %ss with 0.
   (This happens according to both Intel and AMD docs.)
   %ss cached descriptor is set to "invalid" state.
4. Context switch back to Wine.
5. sysret to 32-bit userspace. %ss selector has correct value
   but its cached descriptor is still invalid.
6. The very first userspace POP insn after this causes exception 12.

Fix this by checking the %ss selector value. If it is not __KERNEL_DS
(and it really can only be __KERNEL_DS or zero), then load it
with __KERNEL_DS.

We also use SYSRET32 for SYSENTER-based syscalls, but that codepath
is only used by Intel CPUs, which don't have this quirk.

Signed-off-by: Denys Vlasenko <[email protected]>
Reported-by: Brian Gerst <[email protected]>
CC: Brian Gerst <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---
 arch/x86/ia32/ia32entry.S | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 0c302d0..9537dcb 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -408,6 +408,18 @@ cstar_dispatch:
 sysretl_from_sys_call:
 	andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
 	RESTORE_RSI_RDI_RDX
+	/*
+	 * On AMD, SYSRET32 loads %ss selector, but does not modify its
+	 * cached descriptor; and in kernel, %ss can be loaded with 0,
+	 * setting cached descriptor to "invalid". This has no effect on
+	 * 64-bit mode, but on return to 32-bit mode, it makes stack ops fail.
+	 * Fix %ss only if it's wrong: read from %ss takes ~2 cycles,
+	 * write to %ss is ~40 cycles.
+	 */
+	movl	%ss, %ecx
+	cmpl	$__KERNEL_DS, %ecx
+	jne	reload_ss
+ss_is_good:
 	movl RIP(%rsp),%ecx
 	CFI_REGISTER rip,rcx
 	movl EFLAGS(%rsp),%r11d
@@ -426,6 +438,10 @@ sysretl_from_sys_call:
 	 * does not exist, it merely sets eflags.IF=1).
 	 */
 	USERGS_SYSRET32
+reload_ss:
+	movl	$__KERNEL_DS, %ecx
+	movl	%ecx, %ss
+	jmp	ss_is_good
 
 #ifdef CONFIG_AUDITSYSCALL
 cstar_auditsys:
-- 
1.8.1.4
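
For readers who want to trace the conjectured sequence outside the kernel, here is a minimal toy model in C. It is only a sketch of the behavior described in the commit message, not kernel code: struct ss_reg, every model_* function, and the MODEL_* selector constants are hypothetical names and placeholder values introduced for illustration. The model tracks just two things, the visible %ss selector and whether its cached descriptor is valid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder selector values for illustration only; the real
 * __KERNEL_DS and user %ss values come from the kernel's GDT layout. */
#define MODEL_KERNEL_DS 0x18u
#define MODEL_USER_DS   0x2bu

/* Toy view of %ss: visible selector plus hidden cached descriptor state. */
struct ss_reg {
    uint16_t selector;
    bool     desc_valid;
};

enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD };

/* Explicit write to %ss: reloads both selector and cached descriptor. */
void model_load_ss(struct ss_reg *ss, uint16_t sel)
{
    ss->selector = sel;
    ss->desc_valid = true;
}

/* Interrupt or exception in the kernel: %ss becomes 0 and its
 * cached descriptor is marked invalid (step 3 of the sequence). */
void model_interrupt(struct ss_reg *ss)
{
    ss->selector = 0;
    ss->desc_valid = false;
}

/* SYSRET32: the selector is always loaded from the MSR value, but only
 * the Intel variant of the model resets the cached descriptor. */
void model_sysret32(struct ss_reg *ss, enum cpu_vendor vendor, uint16_t user_ss)
{
    ss->selector = user_ss;
    if (vendor == VENDOR_INTEL)
        ss->desc_valid = true;
}

/* The check the patch adds: cheap read, expensive write only if needed. */
void model_fixup_ss(struct ss_reg *ss)
{
    if (ss->selector != MODEL_KERNEL_DS)
        model_load_ss(ss, MODEL_KERNEL_DS);
}

/* Runs the whole sequence and reports whether the first userspace
 * stack operation would find a valid %ss descriptor. */
bool model_return_to_user(enum cpu_vendor vendor, bool apply_fix)
{
    struct ss_reg ss;

    model_load_ss(&ss, MODEL_KERNEL_DS);  /* step 1: in kernel after syscall */
    model_interrupt(&ss);                 /* steps 2-4: %ss ends up as 0 */
    if (apply_fix)
        model_fixup_ss(&ss);              /* the proposed check before sysret */
    model_sysret32(&ss, vendor, MODEL_USER_DS);  /* step 5 */
    return ss.desc_valid;                 /* step 6 fails when this is false */
}
```

In this model, model_return_to_user(VENDOR_AMD, false) returns false, which corresponds to the stack fault in step 6, while the same call with apply_fix set to true returns true. The Intel variant returns true either way, matching the statement that Intel CPUs reset the descriptor on SYSRET32.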

