If I optimize now for the case that we do not share the cpu cache between different cpus then performance may drop for the case in which we do share the cache (hyperthreading).
If we do not share the cache then the processors essentially need to have their own lists of partial slabs in which they keep cache hot objects (something mini NUMA like). Any write to a shared object will cause a cacheline eviction on the other processor, which is not good. If they do share the cpu cache then they need to have a shared list of partial slabs.

Not sure where to go here. Increasing the per cpu slab size may hold off the issue up to a certain cpu cache size. For that we would need to identify which slabs create the performance issue.

One easy way to check that this is indeed the case: Enable fake NUMA. You will then have separate queues for each processor since they are on different "nodes". Create two fake nodes. Run one thread in each node and see if this fixes it.
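To make the two cases concrete, here is a rough sketch of how the choice of partial list could follow the cache topology. This is not SLUB code. `struct allocator`, `cache_id`, `put_partial_sketch()` and `get_partial_sketch()` are hypothetical names made up for illustration, locking is left out entirely, and the topology table is assumed to be filled in by hand. Cpus with the same `cache_id` (hyperthread siblings) use one shared list, while cpus with different ids each get their own.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4	/* hypothetical cpu count for this sketch */

/* Hypothetical stand-in for a partially filled slab. */
struct slab_sketch {
	struct slab_sketch *next;
	int last_cpu;		/* cpu that last put the slab on a list */
};

struct partial_list {
	struct slab_sketch *head;
	size_t nr;
};

/*
 * cache_id[cpu] says which cpu cache a processor uses. Processors with
 * the same id share a cache and therefore share one partial list.
 * Assumption: ids are in the range 0..NR_CPUS_SKETCH-1.
 */
struct allocator {
	int cache_id[NR_CPUS_SKETCH];
	struct partial_list lists[NR_CPUS_SKETCH];	/* indexed by cache id */
};

/* Pick the partial list that belongs to the cache of the given cpu. */
static struct partial_list *partial_list_for(struct allocator *a, int cpu)
{
	int id;

	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	id = a->cache_id[cpu];
	assert(id >= 0 && id < NR_CPUS_SKETCH);
	return &a->lists[id];
}

/* Put a partial slab on the list of the cpu's cache (no locking here). */
static void put_partial_sketch(struct allocator *a, int cpu,
			       struct slab_sketch *s)
{
	struct partial_list *l = partial_list_for(a, cpu);

	s->last_cpu = cpu;
	s->next = l->head;
	l->head = s;
	l->nr++;
}

/* Take a partial slab from the cpu's own cache domain, or NULL if none. */
static struct slab_sketch *get_partial_sketch(struct allocator *a, int cpu)
{
	struct partial_list *l = partial_list_for(a, cpu);
	struct slab_sketch *s = l->head;

	if (s) {
		l->head = s->next;
		s->next = NULL;
		l->nr--;
	}
	return s;
}
```

With `cache_id = {0, 0, 1, 2}`, cpus 0 and 1 behave like hyperthread siblings and see each other's partial slabs, while cpus 2 and 3 never touch a slab that another cache domain put on its list. The fake NUMA test gets roughly the second behavior by putting each processor on its own "node".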