On 10/14/2016 05:15 AM, r...@redhat.com wrote:
> From: Rik van Riel <r...@redhat.com>
> 
> By moving all of the new fpu state handling into switch_fpu_finish,
> the code can be simplified some more. This does get rid of the
> prefetch, but given the size of the fpu register state on modern
> CPUs, and the amount of work done by __switch_to in-between both
> functions, the value of a single cache line prefetch seems somewhat
> dubious anyway.
...
> -
> -     if (fpu.preload) {
> -             if (fpregs_state_valid(new_fpu, cpu))
> -                     fpu.preload = 0;
> -             else
> -                     prefetch(&new_fpu->state);
> -             fpregs_activate(new_fpu);
> -     }
> -
> -     return fpu;
>  }

Yeah, that prefetch is highly dubious.  XRSTOR might not even be
_reading_ that cacheline if the state isn't present (xstate->xfeatures
bit is 0).  If we had to pick *a* cacheline to prefetch for XRSTOR, it
would be the XSAVE header, *not* the FPU state.
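
For reference, a minimal sketch of the layout being discussed, assuming the
standard (non-compacted) XSAVE format: a 512-byte legacy region followed by a
64-byte header whose first quadword is the feature bitmap. The struct and
function names here are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; not the kernel's structures. */
struct sketch_xsave_header {
	uint64_t xfeatures;	/* XSTATE_BV: one bit per component present */
	uint64_t xcomp_bv;	/* compaction bitmap */
	uint64_t reserved[6];
};

struct sketch_xsave_area {
	_Alignas(64) uint8_t legacy[512];	/* FXSAVE-format x87/SSE region */
	struct sketch_xsave_header header;	/* starts at byte offset 512 */
	/* extended components (AVX, PKRU, ...) would follow at offset 576 */
};

/* The header is exactly one 64-byte cacheline, right after the legacy area. */
static_assert(offsetof(struct sketch_xsave_area, header) == 512,
	      "XSAVE header must start at byte 512");
static_assert(sizeof(struct sketch_xsave_header) == 64,
	      "XSAVE header must be 64 bytes");

/*
 * Nonzero if the given component bit is set in the header. When the bit
 * is clear, XRSTOR puts that component in its init state and has no need
 * to read the component's data from memory.
 */
static inline int sketch_component_present(const struct sketch_xsave_area *xs,
					    unsigned int bit)
{
	return (int)((xs->header.xfeatures >> bit) & 1);
}

/* The one cacheline XRSTOR always has to read: the header, not the data. */
static inline const void *sketch_prefetch_target(const struct sketch_xsave_area *xs)
{
	return &xs->header;
}
```

If a prefetch were kept at all, its argument would be the address returned by
sketch_prefetch_target() rather than the start of the state buffer.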

I tried to optimize the PKRU handling by touching and prefetching the
state before calling XRSTOR.  Touching it beforehand actually made
things _worse_ overall.

It would be ideal to have some data on whether this actually _does_
anything, but I can't imagine it being a real delta in either direction.

Acked-by: Dave Hansen <dave.han...@intel.com>
