On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> Currently, the futex global hash table suffers from its fixed, smallish
> (by today's standards) size of 256 entries, as well as its lack of NUMA
> awareness. Large systems using many futexes can be prone to a high number
> of collisions, where these futexes hash to the same bucket and lead to
> extra contention on the same hb->lock. Furthermore, cacheline bouncing is a
> reality when we have multiple hb->locks residing on the same cacheline and
> different futexes hash to adjacent buckets.
>
> This patch keeps the current static size of 16 entries for small systems,
> or otherwise, 256 * ncpus (or larger as we need to round the number to a
> power of 2). Note that this number of CPUs accounts for all CPUs that can
> ever be available in the system, taking into consideration things like
> hotplugging. While we do impose extra overhead at bootup by making the hash
> table larger, this is a one-time cost, and it does not overshadow the
> benefits of this patch.
>
> Also, similar to other core kernel components (pid, dcache, tcp), by using
> alloc_large_system_hash() we benefit from its NUMA awareness and thus the
> table is distributed among the nodes instead of in a single one. We impose
> this function's minimum limit of 256 entries, so that in worst-case
> scenarios we still end up using the current amount anyway.
>
> For a custom microbenchmark that pounds on the uaddr hashing, making the
> wait path fail at futex_wait_setup() with -EWOULDBLOCK for large numbers
> of futexes, we see the following benefits on an 80-core, 8-socket 1TB
> server:
>
> +---------+----------------------------------+------------------------------------------+----------+
> | threads | baseline (ops/sec) [insns/cycle] | large hash table (ops/sec) [insns/cycle] | increase |
> +---------+----------------------------------+------------------------------------------+----------+
> |     512 | 34429 [0.07]                     | 255274 [0.48]                            | +641.45% |
> |     256 | 65452 [0.07]                     | 443563 [0.41]                            | +577.69% |
> |     128 | 125111 [0.07]                    | 742613 [0.33]                            | +493.56% |
> |      80 | 203642 [0.09]                    | 1028147 [0.29]                           | +404.87% |
> |      64 | 262944 [0.09]                    | 997300 [0.28]                            | +279.28% |
> |      32 | 642390 [0.24]                    | 965996 [0.27]                            |  +50.37% |
> +---------+----------------------------------+------------------------------------------+----------+
>
> Cc: Ingo Molnar <mi...@kernel.org>
> Cc: Darren Hart <dvh...@linux.intel.com>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Mike Galbraith <efa...@gmx.de>
> Cc: Jeff Mahoney <je...@suse.com>
> Cc: Linus Torvalds <torva...@linux-foundation.org>
> Cc: Scott Norton <scott.nor...@hp.com>
> Cc: Tom Vaden <tom.va...@hp.com>
> Cc: Aswin Chandramouleeswaran <as...@hp.com>
> Signed-off-by: Waiman Long <waiman.l...@hp.com>
> Signed-off-by: Jason Low <jason.l...@hp.com>
> Signed-off-by: Davidlohr Bueso <davidl...@hp.com>
> ---
>  kernel/futex.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 0768c68..5fa9eb0 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -63,6 +63,7 @@
>  #include <linux/sched/rt.h>
>  #include <linux/hugetlb.h>
>  #include <linux/freezer.h>
> +#include <linux/bootmem.h>
>
>  #include <asm/futex.h>
>
> @@ -70,7 +71,11 @@
>
>  int __read_mostly futex_cmpxchg_enabled;
>
> -#define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
> +#if CONFIG_BASE_SMALL
> +static unsigned long futex_hashsize = 16;
> +#else
> +static unsigned long futex_hashsize;
> +#endif
>
>  /*
>   * Futex flags used to encode options to functions and preserve them across
> @@ -151,7 +156,11 @@ struct futex_hash_bucket {
>  	struct plist_head chain;
>  };
>
> -static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];
> +#if CONFIG_BASE_SMALL
> +static struct futex_hash_bucket futex_queues[futex_hashsize];
> +#else
> +static struct futex_hash_bucket *futex_queues;
> +#endif
Something in me squirms at the #if/#else here, but I'll leave that to the
tip maintainers to call out if they are so inclined. Same below.

Otherwise, looks reasonable to me.

Reviewed-by: Darren Hart <dvh...@linux.intel.com>

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel