On Sat, Feb 21, 2015 at 07:39:52PM +0100, Ingo Molnar wrote:
> So the workload improved by ~600,000 usecs, and there's
> 68,000 less calls, so it saved 8.8 usecs per call. Isn't

I think you mean more calls. The eager measurement has more calls. Let
me do some primitive math:

def  =(234.681331200 / 712000)*10^6 = 329.60861123595505000000 microsecs/call
eager=(234.066525648 / 780000)*10^6 = 300.08528929230769000000 microsecs/call

diff is 29.52332194364736000000 microsecs speedup per call which could
explain the cost of CR0.TS serialization semantics in the lazy mode.

> that a bit too high?

Now, is 29 microseconds too high? I'm not sure this is even correct and
not some noise interfering.

> I'd sleep a lot better if we had some runtime debug flag to
> be able to do run-to-run comparisons on the same booted up
> kernel, or so.

Let me take a look whether we could do some knob... The nice thing is,
code uses use_eager_fpu() to check stuff so we should be able to clear
the cpufeature flag.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
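
For anyone who wants to re-run the per-call arithmetic above, here is a small sketch. It assumes the two totals are elapsed seconds and that "def" is the default (lazy) run. The function and variable names are made up for illustration and do not come from any kernel tooling.

```python
# Per-call cost from total runtime (seconds) and number of calls.
# Assumption: "def" in the mail is the default (lazy FPU) run.

def usecs_per_call(total_seconds: float, calls: int) -> float:
    """Average time per call, in microseconds."""
    return total_seconds / calls * 1e6

lazy = usecs_per_call(234.681331200, 712_000)
eager = usecs_per_call(234.066525648, 780_000)
diff = lazy - eager  # positive means eager is cheaper per call

print(f"lazy : {lazy:.3f} usecs/call")
print(f"eager: {eager:.3f} usecs/call")
print(f"diff : {diff:.3f} usecs/call")
```

This only reproduces the division. It says nothing about whether the difference is noise, which is what the run-to-run comparison on the same booted kernel would have to show.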