* Dave Hansen <dave.han...@linux.intel.com> wrote:

> On 04/30/2016 12:53 AM, Ingo Molnar wrote:
> > We can still use the compacted area handling instructions, because presumably
> > those are the fastest and are also the most optimized ones? But I wouldn't use
> > them to do dynamic allocation: just allocate the maximum possible FPU save
> > area at task creation time and never again worry about that detail.
> >
> > Ok?
>
> Sounds sane to me.
>
> BTW, I hacked up your "fpu performance" to compare XSAVE vs. XSAVES:
>
> > [    0.048347] x86/fpu: Cost of: XSAVE   insn   :   127 cycles
> > [    0.049134] x86/fpu: Cost of: XSAVES  insn   :   113 cycles
> > [    0.048492] x86/fpu: Cost of: XRSTOR  insn   :   120 cycles
> > [    0.049267] x86/fpu: Cost of: XRSTORS insn   :   102 cycles
>
> So I guess we can add that to the list of things that XSAVES is good for.
Absolutely!

> [...] Granted, the real-world benefit is probably hard to measure because the
> cache residency of the XSAVE buffer isn't as good when _actually_ context
> switching, but this at least shows a small theoretical advantage for XSAVES.

Yeah, although anything that was actually measured is hardly theoretical. It's
a best-case micro-benchmark figure, but it's still a nice 10+ cycles
improvement overall - which might become even bigger in future CPU generations.

Thanks,

	Ingo
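[Editor's note: for readers who want to reproduce the comparison, below is a
minimal user-space sketch of this kind of RDTSC micro-benchmark. It is not the
kernel's actual "fpu performance" measurement code; the buffer sizing via
CPUID leaf 0xD and the minimum-of-many-runs approach are assumptions about how
such a best-case figure could be obtained. XSAVES/XRSTORS are supervisor-only
instructions, so from user space only the XSAVE half of the table above can be
timed this way.]

/*
 * Hypothetical user-space sketch of an XSAVE cycle-cost micro-benchmark.
 * Not the kernel's measurement code; XSAVES/XRSTORS need ring 0.
 *
 * Build with something like: gcc -O2 xsave-cost.c -o xsave-cost
 */
#include <cpuid.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <x86intrin.h>

static uint64_t time_one_xsave(void *buf)
{
	unsigned int aux;
	uint64_t t0, t1;

	t0 = __rdtsc();
	/* Save all user states enabled in XCR0: EDX:EAX mask = all ones. */
	asm volatile("xsave (%0)"
		     : /* no outputs */
		     : "r" (buf), "a" (-1), "d" (-1)
		     : "memory");
	t1 = __rdtscp(&aux);	/* RDTSCP won't execute before the XSAVE */

	return t1 - t0;
}

int main(void)
{
	unsigned int eax, ebx, ecx, edx;
	uint64_t best = ~0ULL;
	void *buf;
	int i;

	/* CPUID(0xD, 0).EBX: XSAVE area size for the currently enabled XCR0. */
	if (!__get_cpuid_count(0x0d, 0, &eax, &ebx, &ecx, &edx))
		return 1;

	/* XSAVE needs a 64-byte aligned buffer; zero it, header included. */
	if (posix_memalign(&buf, 64, ebx))
		return 1;
	memset(buf, 0, ebx);

	/*
	 * Take the minimum of many runs: caches and the save buffer are hot,
	 * so this approximates the best-case figure quoted above.
	 */
	for (i = 0; i < 100000; i++) {
		uint64_t cycles = time_one_xsave(buf);

		if (cycles < best)
			best = cycles;
	}

	printf("XSAVE best case: ~%llu cycles\n", (unsigned long long)best);
	free(buf);

	return 0;
}

[Numbers from such a user-space loop will not match the boot-time figures
exactly - serialization and measurement overhead differ - but the relative
trend is what matters here.]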