* Denys Vlasenko <dvlas...@redhat.com> wrote:

> On 06/15/2015 10:20 PM, Ingo Molnar wrote:
> >> Actually, ecx and r11 need to be loaded first. They are not so much "restored"
> >> as "prepared for SYSRET insn". Every cycle lost in loading these delays SYSRET.
> >> [...]
> >
> > So in the typical case they will still be cached, and so their max latency should
> > be around 3 cycles.
>
> If syscall flushes caches (say, a large read), or sleeps
> and CPU schedules away, then pt_regs->ip,flags are evicted
> and need to be reloaded.
>
> > In fact because they are memory loads, they don't really have dependencies,
> > they should be available to SYSRET almost immediately,
>
> They depend on the memory data.
>
> > i.e. within a cycle - and
> > there's no reason to believe why these loads wouldn't pipeline properly and
> > parallelize with the many other things SYSRET has to do to organize a return to
> > user-space, before it can actually use the target RIP and RFLAGS.
>
> This does not sound right.
>
> If it takes, say, 20 cycles to pull data from e.g. L3 cache to ECX,
> then SYSRET can't possibly complete sooner than in 20 cycles.

Yeah, that's true, but my point is: SYSRET has to do a lot of other things
(permission checks, loading the user mode state - most of which are unrelated
to R11/RCX), which take dozens of cycles, and which are probably overlapped
with any cache misses on arguments such as R11/RCX.

It's not impossible that reordering helps, for example if SYSRET has some
internal dependencies that make its parallelism worse than ideal - but I'd
complicate this code only if it gives a measurable improvement for cache-cold
syscall performance.

Thanks,

	Ingo
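
As a rough illustration of the two views in this exchange, here is a toy
latency model in C. It is only a sketch: the function names are made up for
this example, and it assumes the load latency and SYSRET's unrelated work
either fully serialize or fully overlap, which real hardware need not do. The
3-cycle and 20-cycle figures come from the thread; any value used for "dozens
of cycles" of other SYSRET work is a placeholder, not a measurement.

```c
#include <stdio.h>

/*
 * Toy model, not a description of any real CPU.
 * All cycle counts passed in are illustrative assumptions.
 */

/* Worst case: the RCX/R11 loads must complete before SYSRET does anything. */
unsigned serialized_cycles(unsigned load_latency, unsigned other_work)
{
    return load_latency + other_work;
}

/* Best case: the loads fully overlap SYSRET's work that does not need them. */
unsigned overlapped_cycles(unsigned load_latency, unsigned other_work)
{
    return load_latency > other_work ? load_latency : other_work;
}

/* Print both bounds for one hypothetical scenario. */
void print_case(const char *label, unsigned load_latency, unsigned other_work)
{
    printf("%s: serialized=%u overlapped=%u\n", label,
           serialized_cycles(load_latency, other_work),
           overlapped_cycles(load_latency, other_work));
}
```

In this model, a 20-cycle L3 load only extends the overlapped bound once it
exceeds the other work SYSRET has to do, while the lower bound of 20 cycles
noted in the quoted text holds in both cases. Which bound is closer to reality
is exactly what a cache-cold measurement would have to show.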