Hi Adrian,

* Adrian Bunk ([EMAIL PROTECTED]) wrote:
> I have two questions for getting the bigger picture:
>
> 1. How much code will be changed?
>
> Looking at the F00F bug fixup example, it seems we'll have to make
> several functions in every single driver conditional in the kernel for
> getting the best performance.
> How many functions do you plan to make conditional this way?

I just changed the infrastructure to match Andi's advice: the cond_calls
are now "fancy" variables. They refer to a static variable address, and
every update (which must be done through the cond call API) patches every
load immediate referring to that variable. Therefore, they can simply be
embedded in an if (cond_call(var)) statement, so there are no big code
changes to make; a rough sketch of such a guarded site follows below.
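To make that concrete, here is a minimal sketch of how a driver could guard
a rarely-enabled path with a cond_call. Only the if (cond_call(var)) form is
from the patch description above; the variable name, the function, and the
cond_call_update() helper named in the comment are hypothetical placeholders:

    /*
     * Sketch only -- the declaration and the update helper mentioned below
     * are placeholders, not the actual API from the patch.  The guarded
     * site itself stays a plain if () statement.
     */
    static int mydriver_debug_on;        /* cond_call condition variable */

    static void mydriver_interrupt_path(void)
    {
            /*
             * With cond call optimizations enabled, this test becomes a
             * load immediate patched when the variable is updated, so the
             * disabled case never reads the variable from kernel data.
             */
            if (cond_call(mydriver_debug_on))
                    printk(KERN_DEBUG "mydriver: debug path taken\n");

            /* ... normal fast path continues here ... */
    }

    /*
     * Enabling/disabling must go through the cond call update API so that
     * every load immediate referring to the variable gets patched,
     * something along the lines of (name hypothetical):
     *
     *      cond_call_update(&mydriver_debug_on, 1);
     */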
> 2. What is the real-life performance improvement?
>
> That micro benchmarks comparing cache hits with cache misses give great
> looking numbers is obvious.
> But what will be the performance improvement in real workloads after the
> functions you plan to make conditional according to question 1 have been
> made conditional?

Hrm, I am trying to get interesting numbers out of lmbench: I just ran a
test on a kernel sprinkled with about 50 markers at important sites (LTTng
markers: system call entry/exit, traps, interrupt handlers, ...). The
markers are compiled in, but in "disabled state". Since the markers re-use
the cond_call infrastructure, each marker has its own cond_call.

I ran the test in two situations on my Pentium 4 box:

1 - Cond call optimizations disabled. This is the equivalent of using a
    global variable (in the kernel data) as the branch condition.

2 - Cond call optimizations enabled. This uses the load immediate (which
    now loads an integer on x86 instead of a char, to make sure there is
    no pipeline stall due to a false register dependency).

The result is that we really cannot tell that one is faster or slower than
the other; the standard deviation is much higher than the difference
between the two situations.

Note that lmbench is a workload that will not trigger much L1 cache stress,
since it repeats the same tests many times. Do you have any suggestion for
a test that would be more representative of a real, diversified (in terms
of in-kernel locality of reference) workload?

Thanks,

Mathieu

> TIA
> Adrian
>
> --
>
> "Is there not promise of rain?" Ling Tan asked suddenly out
> of the darkness. There had been need of rain for many days.
> "Only a promise," Lao Er said.
> Pearl S. Buck - Dragon Seed

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/