I noticed that the "poll(NULL, 0, 10);" delay is executed for both the RT and non-RT code paths. Since my goal is to get the call_rcu thread to GC memory as quickly as possible, to diminish the overhead of cache misses, I tried removing this delay for !RT: the call_rcu thread then wakes up ASAP when the thread invoking call_rcu wakes it. My updates jump to 76349/s (getting there!) ;).
This improvement can be explained by a lower delay between call_rcu and execution of its callback, which decreases the amount of cache used and therefore provides better cache locality.

Signed-off-by: Mathieu Desnoyers <[email protected]>
---
 urcu-call-rcu-impl.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: userspace-rcu/urcu-call-rcu-impl.h
===================================================================
--- userspace-rcu.orig/urcu-call-rcu-impl.h
+++ userspace-rcu/urcu-call-rcu-impl.h
@@ -242,7 +242,8 @@ static void *call_rcu_thread(void *arg)
 		else {
 			if (&crdp->cbs.head == _CMM_LOAD_SHARED(crdp->cbs.tail))
 				call_rcu_wait(crdp);
-			poll(NULL, 0, 10);
+			else
+				poll(NULL, 0, 10);
 		}
 	}
 	call_rcu_lock(&crdp->mtx);
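
To make the effect of the hunk easier to see, here is a rough model of one pass through the non-RT branch, before and after the change. It is only a sketch, not the liburcu code: struct pass_result, pass_before(), pass_after(), wakeup_latency_ms() and saved_ms_per_wakeup() are hypothetical names made up for illustration, and the model assumes the only added latency is the 10 ms poll that follows a wake-up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one pass through the non-RT branch of the
 * call_rcu thread loop. These names do not exist in liburcu. */
struct pass_result {
	bool blocked;	/* the pass would have called call_rcu_wait() */
	int delay_ms;	/* the pass would have called poll(NULL, 0, delay_ms) */
};

/* Before the patch: the 10 ms poll runs whether or not we blocked. */
static struct pass_result pass_before(bool queue_empty)
{
	struct pass_result r = { .blocked = queue_empty, .delay_ms = 10 };
	return r;
}

/* After the patch: poll only when callbacks were already queued. */
static struct pass_result pass_after(bool queue_empty)
{
	struct pass_result r = {
		.blocked = queue_empty,
		.delay_ms = queue_empty ? 0 : 10,
	};
	return r;
}

/* Delay (ms) inserted between being woken up and running callbacks. */
static int wakeup_latency_ms(struct pass_result r)
{
	return r.blocked ? r.delay_ms : 0;
}

/* Latency removed from each wake-up by the patch, in this model. */
static int saved_ms_per_wakeup(void)
{
	int before = wakeup_latency_ms(pass_before(true));
	int after = wakeup_latency_ms(pass_after(true));

	assert(before >= after);
	return before - after;
}
```

In this model the batching delay is kept when the queue is already non-empty, so only the path that has just been woken by a call_rcu invocation loses the 10 ms sleep.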
