* Dave Hansen <dave.han...@linux.intel.com> wrote:

> On 04/30/2016 12:53 AM, Ingo Molnar wrote:
> > We can still use the compacted area handling instructions, because presumably
> > those are the fastest and are also the most optimized ones? But I wouldn't use
> > them to do dynamic allocation: just allocate the maximum possible FPU save area
> > at task creation time and never again worry about that detail.
> > 
> > Ok?
> 
> Sounds sane to me.
> 
> BTW, I hacked up your "fpu performance" to compare XSAVE vs. XSAVES:
> 
> > [    0.048347] x86/fpu: Cost of: XSAVE                       insn          :   127 cycles
> > [    0.049134] x86/fpu: Cost of: XSAVES                      insn          :   113 cycles
> > [    0.048492] x86/fpu: Cost of: XRSTOR                      insn          :   120 cycles
> > [    0.049267] x86/fpu: Cost of: XRSTORS                     insn          :   102 cycles
> 
> So I guess we can add that to the list of things that XSAVES is good for.

Absolutely!

> [...]  Granted, the real-world benefit is probably hard to measure because the
> cache residency of the XSAVE buffer isn't as good when _actually_ context
> switching, but this at least shows a small theoretical advantage for XSAVES.

Yeah, and anything that was measured for real is far from being theoretical. It's
simply a best-case microbenchmark figure, but it's still a nice 10+ cycles
improvement overall - which might become bigger in future CPU generations.

Thanks,

        Ingo
