On Tue, 2013-06-11 at 10:53 -0700, Davidlohr Bueso wrote:
> I hate to be the bearer of bad news but I got some pretty bad aim7
> performance numbers with this patch on an 8-socket (80 core) 256 Gb
> memory DL980 box against a vanilla 3.10-rc4 kernel:

This doesn't surprise me, as the spin lock now contains a function call
on any contention. Not to mention the added i$ pressure on the embedded
spinlock code having to set up a function call.

Even if the queues are not used, it adds a slight overhead to all
spinlocks, due to the code size increase as well as a function call on
all contention, which will also have an impact on i$ and branch
prediction.

>
> * shared workload:
> 10-100 users is in the noise area.
> 100-2000 users: -13% throughput.
>
> * high_systime workload:
> 10-700 users is in the noise area.
> 700-2000 users: -55% throughput.
>
> * disk:
> 10-100 users -57% throughput.
> 100-1000 users: -25% throughput
> 1000-2000 users: +8% throughput (this patch only benefits when we have a

Perhaps this actually started using the queues?

> lot of concurrency).
>
> * custom:
> 10-100 users: -33% throughput.
> 100-2000 users: -46% throughput.
>
> * alltests:
> 10-1000 users is in the noise area.
> 1000-2000 users: -10% throughput.
>
> One notable exception is the short workload where we actually see
> positive numbers:
> 10-100 users: +40% throughput.
> 100-2000 users: +69% throughput.

Perhaps short workloads have a cold cache, and the impact on cache is
not as drastic?

It would be interesting to see what perf reports on these runs.

-- Steve