When this_cpu changes in the free path node needs to change too.
Otherwise the slab can end up in the wrong node's list and this
eventually leads to WARN_ONs and of course worse NUMA performace.
This patch is likely not complete (the NUMA slab code is *very* hairy),
but seems to make the make -j128 test survive for at least two hours.
But at least it fixes one case that regularly triggered during
testing, resulting in slabs in the wrong node lists and triggering
WARN_ONs in slab_put/get_obj
I tried a complete audit of keeping this_cpu/node/slabp in sync when needed,
but
it is very hairy code and I likely missed some cases. This so far
fixes only the simple free path; but it seems to be good enough
to not trigger easily anymore on a NUMA system with memory pressure.
Longer term the only good fix is probably to migrate to slub.
Or disable NUMA slab for PREEMPT_RT (its value has been disputed
in some benchmarks anyways)
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Index: linux-2.6.23-rt1/mm/slab.c
===================================================================
--- linux-2.6.23-rt1.orig/mm/slab.c
+++ linux-2.6.23-rt1/mm/slab.c
@@ -1193,7 +1193,7 @@ cache_free_alien(struct kmem_cache *cach
struct array_cache *alien = NULL;
int node;
- node = numa_node_id();
+ node = cpu_to_node(*this_cpu);
/*
* Make sure we are not freeing a object from another node to the array
@@ -4194,6 +4194,8 @@ static void cache_reap(struct work_struc
work_done += reap_alien(searchp, l3, &this_cpu);
+ node = cpu_to_node(this_cpu);
+
work_done += drain_array(searchp, l3,
cpu_cache_get(searchp, this_cpu), 0, node);
-
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html