* Denys Vlasenko <dvlas...@redhat.com> wrote:

> On 06/15/2015 10:20 PM, Ingo Molnar wrote:
> >> Actually, ecx and r11 need to be loaded first. They are not so much
> >> "restored" as "prepared for SYSRET insn". Every cycle lost in loading
> >> these delays SYSRET.
> >> [...]
> > 
> > So in the typical case they will still be cached, and so their max
> > latency should be around 3 cycles.
> 
> If the syscall flushes the caches (say, a large read), or sleeps
> and the CPU schedules away, then pt_regs->ip,flags are evicted
> and need to be reloaded.
> 
> > In fact because they are memory loads, they don't really have dependencies,
> > they should be available to SYSRET almost immediately,
> 
> They depend on the memory data.
> 
> > i.e. within a cycle - and there's no reason to believe these loads
> > wouldn't pipeline properly and parallelize with the many other things
> > SYSRET has to do to organize a return to user-space, before it can
> > actually use the target RIP and RFLAGS.
> 
> This does not sound right.
> 
> If it takes, say, 20 cycles to pull data from e.g. L3 cache to ECX,
> then SYSRET can't possibly complete sooner than in 20 cycles.

Yeah, that's true, but my point is: SYSRET has to do a lot of other things
(permission checks, loading the user mode state - most of which are
unrelated to R11/RCX), which take dozens of cycles, and which are probably
overlapped with any cache misses on arguments such as R11/RCX.
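The two positions can be phrased as a toy cost model: the "can't complete sooner than the load" argument is a lower bound, while the "dozens of cycles of other work" argument says that independent work can hide the load latency. The sketch below is a hypothetical model, not a description of any real CPU; the function names and the split of SYSRET into "unrelated work" plus a "dependent tail" are assumptions made for illustration.

```c
#include <assert.h>

/*
 * Toy latency model, for illustration only. It ASSUMES SYSRET's work splits
 * cleanly into (a) internal work unrelated to RCX/R11 and (b) a short tail
 * that actually consumes RCX/R11. Real pipelines are far messier.
 * All quantities are in cycles.
 */

/* No overlap: the loads, the unrelated work and the tail run back to back. */
static unsigned serialized_cycles(unsigned load_latency, unsigned other_work,
                                  unsigned dependent_tail)
{
    return load_latency + other_work + dependent_tail;
}

/* Full overlap: the RCX/R11 loads proceed in parallel with the unrelated
 * work, and only the dependent tail waits for whichever finishes last. */
static unsigned overlapped_cycles(unsigned load_latency, unsigned other_work,
                                  unsigned dependent_tail)
{
    unsigned ready = load_latency > other_work ? load_latency : other_work;
    unsigned total = ready + dependent_tail;

    /* Lower bound: completion can never beat the load latency itself. */
    assert(total >= load_latency);
    return total;
}
```

With a 20-cycle L3 hit and an assumed 40 cycles of unrelated work plus a 5-cycle tail, overlapped_cycles(20, 40, 5) gives 45 cycles, the same result as a 3-cycle L1 hit, so in this model reordering the loads buys nothing. Only when the load outlasts the other work, e.g. overlapped_cycles(60, 40, 5) == 65, does its latency become visible.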

It's not impossible that reordering helps, for example if SYSRET has some
internal dependencies that make its parallelism worse than ideal - but I'd
complicate this code only if it gives a measurable improvement for
cache-cold syscall performance.
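One crude way to look for a measurable cache-cold improvement is a user-space microbenchmark that times a trivial syscall with and without thrashing the caches first. This is a sketch under several assumptions: x86-64 Linux, getppid() as a stand-in for "a cheap syscall", and a 32 MiB eviction buffer being larger than the last-level cache of the test machine. It times the whole round trip (entry, handler, exit, plus clock_gettime() overhead), not SYSRET in isolation, so it can only show whether a change to the exit path moves the end-to-end number. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* ASSUMPTION: larger than the last-level cache; raise it on big machines. */
#define EVICT_BYTES (32u * 1024u * 1024u)

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Write one byte per 64-byte line so that most previously cached data,
 * hopefully including the kernel stack holding pt_regs, gets evicted. */
static void evict_caches(volatile unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i += 64)
        buf[i]++;
}

/* Average nanoseconds per raw getppid() syscall. If cold is non-zero, the
 * caches are thrashed before every call; only the syscall itself is timed.
 * Returns -1.0 if the eviction buffer cannot be allocated. */
static double syscall_ns(int iterations, int cold)
{
    unsigned char *buf = cold ? calloc(EVICT_BYTES, 1) : NULL;
    double total = 0.0;

    if (cold && buf == NULL)
        return -1.0;
    for (int i = 0; i < iterations; i++) {
        if (cold)
            evict_caches(buf, EVICT_BYTES);
        double t0 = now_ns();
        syscall(SYS_getppid);
        total += now_ns() - t0;
    }
    free(buf);
    return total / iterations;
}
```

Comparing syscall_ns(100000, 0) with syscall_ns(1000, 1) before and after a patch gives warm and cold averages; the cold figure is noisy, so it needs repeated runs before any difference is worth trusting.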

Thanks,

        Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/