On Mon, Aug 4, 2014 at 11:28 PM, Denys Vlasenko <vda.li...@googlemail.com> wrote:
> On Fri, Aug 1, 2014 at 7:04 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>> On Fri, Aug 1, 2014 at 7:48 AM, Denys Vlasenko <dvlas...@redhat.com> wrote:
>>> 64-bit code was using six stack slots fewer by not saving/restoring
>>> registers which are callee-preserved according to C ABI,
>>> and not allocating space for them
>>
>> This is great.
>>
>> Next up: remove FIXUP/RESTORE_TOP_OF_STACK? :) Maybe I'll give that a shot.
>
> I'm yet at the stage "what that stuff does anyway?" and at
> "why do we need percpu old_rsp thingy?" in particular.

On x86_64, the syscall instruction has no effect on rsp.  That means
that the entry point starts out with no stack.  There are no free
registers whatsoever at the entry point.  That means that the entry
code needs to do swapgs, stash rsp somewhere relative to gs, and then
load the kernel's rsp.  old_rsp is the spot used for this.

Now the kernel does an optimization that is, I think, very much not
worth it.  The kernel doesn't bother sticking the old rsp value into
pt_regs (saving two instructions on fast path entries) and doesn't
initialize the SS, CS, RCX, and EFLAGS fields in pt_regs, saving four
more instructions.

To make this optimization work, the whole FIXUP/RESTORE_TOP_OF_STACK
dance is needed, and there's the usersp crap in the context switch
code, and current_user_stack_pointer(), and probably even more crap
that I haven't noticed.  And I sure hope that nothing in the *compat*
syscall path touches current_user_stack_pointer(), because the compat
code doesn't seem to use old_rsp.

I think this should all be ripped out.  The only real difficulty will
be that the sysret code needs to restore rsp itself, so the sysret
path will end up needing two more instructions.  Removing all of the
TOP_OF_STACK stuff will add ten instructions to fast path syscalls,
and I wouldn't be surprised if this adds considerably fewer than ten
cycles on any modern chip.

(It's too bad that there's no unlocked xchg; this could be faster if
we had one.  It's also too bad that the syscall ABI didn't choose some
register to unconditionally set to zero, which would have given us the
single scratch register we'd need to avoid this whole mess in the
first place.)

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC
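
P.S. For illustration only, here is a rough user-space C model of the
difference described above between leaving the rsp slot in the saved
frame unfilled (and fixing it up later) and filling it at entry.  This
is a sketch under stated assumptions, not the kernel's entry code: every
struct and function name below (percpu_model, regs_model, cpu_model,
entry_lazy, fixup_top_of_stack, entry_eager, exit_from_frame) is
hypothetical, and only the rsp handling is modeled, not SS, CS, RCX or
EFLAGS.

```c
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the kernel's real definitions. */
struct percpu_model {
    uint64_t old_rsp;       /* per-CPU scratch slot for the user rsp */
    uint64_t kernel_stack;  /* top of this CPU's kernel stack */
};

struct regs_model {
    uint64_t rsp;           /* user rsp slot in the saved register frame */
    int rsp_valid;          /* model-only flag: has the slot been filled? */
};

struct cpu_model {
    uint64_t rsp;             /* current stack pointer */
    struct percpu_model *gs;  /* what gs refers to after swapgs */
};

/* Lazy entry: stash rsp relative to gs, load the kernel rsp, and leave
 * the frame's rsp slot unfilled. */
static void entry_lazy(struct cpu_model *cpu, struct regs_model *regs)
{
    cpu->gs->old_rsp = cpu->rsp;
    cpu->rsp = cpu->gs->kernel_stack;
    regs->rsp_valid = 0;
}

/* What a FIXUP_TOP_OF_STACK-style step must do before anything is
 * allowed to read the frame's rsp slot. */
static void fixup_top_of_stack(struct cpu_model *cpu, struct regs_model *regs)
{
    regs->rsp = cpu->gs->old_rsp;
    regs->rsp_valid = 1;
}

/* Eager entry: same stash-and-switch, but the frame is filled right
 * away, so no later fixup is needed. */
static void entry_eager(struct cpu_model *cpu, struct regs_model *regs)
{
    cpu->gs->old_rsp = cpu->rsp;
    cpu->rsp = cpu->gs->kernel_stack;
    regs->rsp = cpu->gs->old_rsp;
    regs->rsp_valid = 1;
}

/* Exit that restores rsp from the frame itself rather than from the
 * per-CPU slot. */
static void exit_from_frame(struct cpu_model *cpu, const struct regs_model *regs)
{
    cpu->rsp = regs->rsp;
}
```

In this model, the eager variant costs a couple of extra stores at
entry and an extra load at exit, but the frame is always
self-describing, so nothing outside the entry code has to know about
old_rsp.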