Chuck Ebbert <[EMAIL PROTECTED]> wrote:
>
> This patch makes restore_fpu() an inline.  When L1/L2 cache are saturated
> it makes a measurable difference.
>
> Results from profiling Volanomark follow.  Sample rate was 2000 samples/sec
> (HZ = 250, profile multiplier = 8) on a dual-processor Pentium II Xeon.
>
>
> Before:
>
>  10680 restore_fpu                              333.7500
>   8351 device_not_available                     203.6829
>   3823 math_state_restore                        59.7344
>  -----
>  22854
>
>
> After:
>
>  12534 math_state_restore                       130.5625
>   8354 device_not_available                     203.7561
>  -----
>  20888
>
>
> Patch is "obviously correct" and cuts 9% of the overhead.  Please apply.

hm.  What context switch rate is that thing doing?  Is the benchmark
actually doing floating point stuff?

We do have the `used_math' optimisation in there which attempts to avoid
doing the FP save/restore if the app isn't actually using math.  But
<ancient recollections> there's code in glibc startup which always does a
bit of float, so that optimisation is always defeated.  There was some
discussion about periodically setting tasks back into !used_math state to
try to restore the optimisation for tasks which only do a little bit of FP,
but nothing actually got done.

> Next step should be to physically place math_state_restore() after
> device_not_available().  Would such a patch be accepted?  (Yes it
> would be ugly and require linker script changes.)

Depends on the benefit/ugly ratio ;)
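
For what it's worth, the `used_math' point above can be illustrated with a toy user-space model.  This is not kernel code: every toy_* name is made up for the sketch, and it encodes only the behaviour as recalled in this message (a sticky per-task flag that decides whether a context switch pays for FP state), plus the periodic reset that was discussed but never implemented.  The task here touches FP only in its first time slice, standing in for the glibc startup case.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; names are hypothetical, not the kernel's. */
struct toy_task {
    bool used_math;    /* sticky flag: set on first FP use */
    long fp_switches;  /* context switches that paid for FP save/restore */
};

/* The task executes a floating point instruction. */
static void toy_use_fp(struct toy_task *t)
{
    t->used_math = true;
}

/* Switch the task in: FP state is handled only if the flag is set. */
static void toy_switch_in(struct toy_task *t)
{
    if (t->used_math)
        t->fp_switches++;
}

/* Hypothetical periodic reset back to the !used_math state. */
static void toy_clear_used_math(struct toy_task *t)
{
    t->used_math = false;
}

/*
 * Run `slices` time slices for a task that uses FP only in slice 0.
 * reset_every == 0 means the flag is never cleared; otherwise it is
 * cleared at the end of every reset_every-th slice.
 * Returns the number of switches that paid the FP cost.
 */
long toy_run(int slices, int reset_every)
{
    struct toy_task t = { false, 0 };
    int i;

    assert(slices >= 0 && reset_every >= 0);

    for (i = 0; i < slices; i++) {
        toy_switch_in(&t);
        if (i == 0)
            toy_use_fp(&t);
        if (reset_every > 0 && (i + 1) % reset_every == 0)
            toy_clear_used_math(&t);
    }
    return t.fp_switches;
}
```

In this model toy_run(100, 0) charges 99 of the 100 switches for a task that did a single bit of float at startup, while toy_run(100, 10) charges only 9, because the flag is cleared after the tenth slice and never set again.  Whether a real workload would see anything like that depends on how often it actually touches the FPU, which is the question asked above.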