I noticed that the "poll(NULL, 0, 10);" delay is executed in both the RT
and non-RT code paths. Since my goal is to get the call_rcu thread to
GC memory as quickly as possible to reduce the overhead of cache
misses, I tried removing this delay for !RT in the case where the
call_rcu thread had to wait for a wakeup: it then resumes ASAP when the
thread invoking call_rcu wakes it. My updates jump to 76349/s (getting
there!) ;).

This improvement can be explained by a shorter delay between call_rcu
and the execution of its callback, which decreases the amount of cache
used and therefore provides better cache locality.
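
To make the control-flow change easier to follow, here is a minimal
sketch that models only the non-RT branch of the call_rcu thread loop.
It is not liburcu code: the ACT_* flags and both function names are
hypothetical, and "queue_empty" stands in for the
"&crdp->cbs.head == _CMM_LOAD_SHARED(crdp->cbs.tail)" test.

```c
#include <stdbool.h>

/* Hypothetical action flags, for illustration only (not part of liburcu). */
enum {
	ACT_WAIT_WAKEUP = 1 << 0,	/* models call_rcu_wait(crdp) */
	ACT_SLEEP_10MS  = 1 << 1,	/* models poll(NULL, 0, 10) */
};

/*
 * Non-RT branch before the change: wait for a wakeup if the callback
 * queue is empty, then sleep 10 ms unconditionally.
 */
static int non_rt_actions_before(bool queue_empty)
{
	int actions = 0;

	if (queue_empty)
		actions |= ACT_WAIT_WAKEUP;
	actions |= ACT_SLEEP_10MS;
	return actions;
}

/*
 * Non-RT branch after the change: either wait for a wakeup (empty
 * queue) or sleep 10 ms (callbacks already queued), never both.
 */
static int non_rt_actions_after(bool queue_empty)
{
	if (queue_empty)
		return ACT_WAIT_WAKEUP;
	return ACT_SLEEP_10MS;
}
```

With an empty queue, the "before" model yields both the wait and the
10 ms sleep, while the "after" model yields only the wait, so callbacks
run right after the wakeup. With a non-empty queue both models sleep
10 ms, which should preserve batching of callbacks.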

Signed-off-by: Mathieu Desnoyers <[email protected]>
---
 urcu-call-rcu-impl.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: userspace-rcu/urcu-call-rcu-impl.h
===================================================================
--- userspace-rcu.orig/urcu-call-rcu-impl.h
+++ userspace-rcu/urcu-call-rcu-impl.h
@@ -242,7 +242,8 @@ static void *call_rcu_thread(void *arg)
                else {
                        if (&crdp->cbs.head == _CMM_LOAD_SHARED(crdp->cbs.tail))
                                call_rcu_wait(crdp);
-                       poll(NULL, 0, 10);
+                       else
+                               poll(NULL, 0, 10);
                }
        }
        call_rcu_lock(&crdp->mtx);


_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
