* Phil Howard ([email protected]) wrote:
> On Mon, Jun 6, 2011 at 12:41 PM, Paul E. McKenney
> <[email protected]> wrote:
> > On Mon, Jun 06, 2011 at 03:21:07PM -0400, Mathieu Desnoyers wrote:
> >> * Mathieu Desnoyers ([email protected]) wrote:
> >> > I notice that the "poll(NULL, 0, 10);" delay is executed both for the RT
> >> > and non-RT code. So given that my goal is to get the call_rcu thread to
> >> > GC memory as quickly as possible to diminish the overhead of cache
> >> > misses, I decided to try removing this delay for !RT: the call_rcu
> >> > thread then wakes up ASAP when the thread invoking call_rcu wakes it.
> >> > My updates jump to 76349/s (getting there!) ;).
> >> >
> >> > This improvement can be explained by a lower delay between call_rcu and
> >> > the execution of its callback, which decreases the amount of cache used
> >> > and therefore provides better cache locality.
> >>
> >> I just wonder if it's worth it: removing this delay from the !RT
> >> call_rcu thread can cause a high rate of synchronize_rcu() calls. So
> >> although there might be an advantage in terms of update rate, it will
> >> likely cause extra cache-line bounces between the call_rcu threads and
> >> the reader threads.
> >>
> >> test_urcu_rbtree 7 1 20 -g 1000000
> >>
> >> With the delay in the call_rcu thread:
> >> search: 1842857 items/reader thread/s (7 reader threads)
> >> updates: 21066 items/s (1 update thread)
> >> ratio: 87 search/update
> >>
> >> Without the delay in the call_rcu thread:
> >> search: 3064285 items/reader thread/s (7 reader threads)
> >> updates: 45096 items/s (1 update thread)
> >> ratio: 68 search/update
> >>
> >> So basically, removing the delay roughly doubles the update rate
> >> (21066 -> 45096 updates/s), while the search/update ratio drops from
> >> 87 to 68, so reads lose ground relative to updates. My first thought
> >> is that if an application has very frequent updates, then maybe it
> >> wants to have fast updates, because the update throughput is then
> >> important.
> >> If the application has infrequent updates, then the reads will be
> >> fast anyway, because infrequent call_rcu invocations will trigger
> >> fewer cache-line bounces between readers and writers. Any other
> >> thoughts on this trade-off and how to deal with it?
> >
> > One approach would be to let the user handle it using real-time
> > priority adjustment. Another approach would be to let the user
> > specify the wait time in milliseconds, and skip the poll() system
> > call if the specified wait time is zero.
> >
> > The latter seems more sane to me. It also allows the user to
> > specify (say) 10000 milliseconds for cases where there is a lot of
> > memory and where amortizing synchronize_rcu() overhead across a
> > large number of updates is important.
> >
> > Other thoughts?
> >
> > 							Thanx, Paul
>
> If synchronize_rcu is used to time memory reclamation, then trading
> memory for overhead is a valid way to think of this timing. But if
> synchronize_rcu is required inside an update for other purposes (e.g.
> my RBTree algorithm or Josh's hash table resize), then the trade-off
> needs to include synchronize_rcu overhead vs. update throughput.
I've got some thoughts about the use of synchronize_rcu() within the
algorithms, which can be summed up by "let's try not to do that". ;)

We'll see how far I can get in terms of update performance without
relying on synchronize_rcu() within my rbtree updates. As far as hash
tables are concerned, I might find time to tackle this problem later on
with a similar mindset.

Thanks!

Mathieu

> -phil
>
> >> Thanks,
> >>
> >> Mathieu
> >>
> >> > Signed-off-by: Mathieu Desnoyers <[email protected]>
> >> > ---
> >> >  urcu-call-rcu-impl.h |    3 ++-
> >> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >> >
> >> > Index: userspace-rcu/urcu-call-rcu-impl.h
> >> > ===================================================================
> >> > --- userspace-rcu.orig/urcu-call-rcu-impl.h
> >> > +++ userspace-rcu/urcu-call-rcu-impl.h
> >> > @@ -242,7 +242,8 @@ static void *call_rcu_thread(void *arg)
> >> >  		else {
> >> >  			if (&crdp->cbs.head ==
> >> >  			    _CMM_LOAD_SHARED(crdp->cbs.tail))
> >> >  				call_rcu_wait(crdp);
> >> > -			poll(NULL, 0, 10);
> >> > +			else
> >> > +				poll(NULL, 0, 10);
> >> >  		}
> >> >  	}
> >> >  	call_rcu_lock(&crdp->mtx);
> >> >
> >>
> >> --
> >> Mathieu Desnoyers
> >> Operating System Efficiency R&D Consultant
> >> EfficiOS Inc.
> >> http://www.efficios.com
> >
> > _______________________________________________
> > rp mailing list
> > [email protected]
> > http://svcs.cs.pdx.edu/mailman/listinfo/rp

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
