[PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
From: Davidlohr Bueso

In futex_wake() there is clearly no point in taking the hb->lock if we know beforehand that there are no tasks to be woken. While the hash bucket's plist head is a cheap way of knowing this, we cannot rely 100% on it as there is a racy window between the futex_wait call and when the task is actually added to the plist. To this end, we couple it with the spinlock check as tasks trying to enter the critical region are most likely potential waiters that will be added to the plist, thus preventing tasks sleeping forever if wakers don't acknowledge all possible waiters.

Furthermore, the futex ordering guarantees are preserved, ensuring that waiters either observe the changed user space value before blocking or are woken by a concurrent waker. For wakers, this is done by relying on the barriers in get_futex_key_refs() -- for archs that do not have implicit mb in atomic_inc(), we explicitly add them through a new futex_get_mm function. For waiters we rely on the fact that spin_lock calls already update the head counter, so spinners are visible even if the lock hasn't been acquired yet.

For more details please refer to the updated comments in the code and related discussion: https://lkml.org/lkml/2013/11/26/556

Special thanks to tglx for careful review and feedback.

Cc: Ingo Molnar
Reviewed-by: Darren Hart
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Paul E. McKenney
Cc: Mike Galbraith
Cc: Jeff Mahoney
Suggested-by: Linus Torvalds
Cc: Scott Norton
Cc: Tom Vaden
Cc: Aswin Chandramouleeswaran
Cc: Waiman Long
Cc: Jason Low
Signed-off-by: Davidlohr Bueso
---
 kernel/futex.c | 115 +
 1 file changed, 91 insertions(+), 24 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index fcc6850..be6399a 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -75,17 +75,20 @@
  * The waiter reads the futex value in user space and calls
  * futex_wait(). This function computes the hash bucket and acquires
  * the hash bucket lock. After that it reads the futex user space value
- * again and verifies that the data has not changed. If it has not
- * changed it enqueues itself into the hash bucket, releases the hash
- * bucket lock and schedules.
+ * again and verifies that the data has not changed. If it has not changed
+ * it enqueues itself into the hash bucket, releases the hash bucket lock
+ * and schedules.
  *
  * The waker side modifies the user space value of the futex and calls
- * futex_wake(). This functions computes the hash bucket and acquires
- * the hash bucket lock. Then it looks for waiters on that futex in the
- * hash bucket and wakes them.
+ * futex_wake(). This function computes the hash bucket and acquires the
+ * hash bucket lock. Then it looks for waiters on that futex in the hash
+ * bucket and wakes them.
  *
- * Note that the spin_lock serializes waiters and wakers, so that the
- * following scenario is avoided:
+ * In futex wake up scenarios where no tasks are blocked on a futex, taking
+ * the hb spinlock can be avoided and simply return. In order for this
+ * optimization to work, ordering guarantees must exist so that the waiter
+ * being added to the list is acknowledged when the list is concurrently being
+ * checked by the waker, avoiding scenarios like the following:
  *
  * CPU 0                               CPU 1
  * val = *futex;
@@ -106,24 +109,52 @@
  * This would cause the waiter on CPU 0 to wait forever because it
  * missed the transition of the user space value from val to newval
  * and the waker did not find the waiter in the hash bucket queue.
- * The spinlock serializes that:
+ *
+ * The correct serialization ensures that a waiter either observes
+ * the changed user space value before blocking or is woken by a
+ * concurrent waker:
  *
  * CPU 0                               CPU 1
  * val = *futex;
  * sys_futex(WAIT, futex, val);
  *   futex_wait(futex, val);
- *   lock(hash_bucket(futex));
- *   uval = *futex;
- *                                    *futex = newval;
- *                                    sys_futex(WAKE, futex);
- *                                      futex_wake(futex);
- *                                      lock(hash_bucket(futex));
+ *
+ *   waiters++;
+ *   mb(); (A) <-- paired with -.
+ *                              |
+ *   lock(hash_bucket(futex));  |
+ *                              |
+ *   uval = *futex;             |
+ *                              |        *futex = newval;
+ *                              |        sys_futex(WAKE, futex);
+ *                              |          futex_wake(futex);
+ *                              |
+ *                              `---> mb(); (B)
  *   if (uval == val)
- *     queue();
+ *     queue();
  *   unlock(hash_bucket(futex));
- *   schedule();                       if (!queue_empty())
- *                                       wake_waiters(futex);
- *                                     unlock(
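[Editorial note: the (A)/(B) barrier pairing in the patch's comment can be modeled in user space with C11 atomics. This is a sketch, not the kernel code: the names (futex_val, waiters, ...) are illustrative, seq_cst operations stand in for mb(), and the "would sleep"/"would skip" flags replace the real schedule()/wakeup paths. The invariant checked is exactly the one the patch argues for: the waiter deciding to sleep and the waker deciding to skip the hb lock can never both happen.]

```c
/*
 * Userspace model of the hb->waiters scheme from the patch above.
 * All names are illustrative; seq_cst atomics model mb() (A) and (B).
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

static atomic_int futex_val;
static atomic_int waiters;
static atomic_bool waiter_blocked; /* waiter saw the old value: it would sleep */
static atomic_bool waker_skipped;  /* waker saw no waiters: it would skip hb lock */

static void *waiter(void *arg)
{
	(void)arg;
	atomic_fetch_add(&waiters, 1);       /* waiters++; mb() (A) */
	if (atomic_load(&futex_val) == 0)    /* uval = *futex; uval == val? */
		atomic_store(&waiter_blocked, true); /* would queue and schedule() */
	return NULL;
}

static void *waker(void *arg)
{
	(void)arg;
	atomic_store(&futex_val, 1);         /* *futex = newval */
	if (atomic_load(&waiters) == 0)      /* mb() (B); read hb->waiters */
		atomic_store(&waker_skipped, true);  /* nothing to wake: return early */
	return NULL;
}

/*
 * Returns how many trials ended with a lost wakeup (waiter sleeping AND
 * waker skipping). With full ordering on both sides this is the forbidden
 * store-buffering outcome, so the result must be 0.
 */
int lost_wakeups(int trials)
{
	int lost = 0;
	for (int i = 0; i < trials; i++) {
		atomic_store(&futex_val, 0);
		atomic_store(&waiters, 0);
		atomic_store(&waiter_blocked, false);
		atomic_store(&waker_skipped, false);

		pthread_t a, b;
		pthread_create(&a, NULL, waiter, NULL);
		pthread_create(&b, NULL, waker, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		if (atomic_load(&waiter_blocked) && atomic_load(&waker_skipped))
			lost++;
	}
	return lost;
}
```

Under the C11 memory model the seq_cst total order forbids both threads reading the other's old value, which mirrors why the paired mb() (A)/(B) make the lockless hb->waiters check safe.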
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Mon, 25 Nov 2013, Darren Hart wrote:
> On Mon, 2013-11-25 at 20:47 +0100, Thomas Gleixner wrote:
> > So now your code melts down to:
> >
> > write(hb->waiters)  | write(uaddr)
> > mb                  | read(hb->waiters)
> > read(uaddr)
> >
> > I fear you simply managed to make the window small enough that your
> > testing was no longer able to expose it.
>
> Does seem to be the case.

Actually not. It's protected by an atomic_inc() between the write(uaddr) and read(hb->waiters) on the waker side. It took me a while to realize that get_futex_key_refs() is providing the protection inadvertently. But as I explained we cannot rely on that for all architectures.

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Mon, 25 Nov 2013, Darren Hart wrote:
> On Mon, 2013-11-25 at 20:47 +0100, Thomas Gleixner wrote:
> > > which restores the ordering guarantee, which the hash bucket lock
> > > provided so far.
> >
> > Actually that's not true by design, it just happens to work.
> >
> > atomic_inc() on x86 is a "lock incl".
> >
> > The LOCK prefix just guarantees that the cache line which is affected
> > by the INCL is locked. And it guarantees that locked operations
> > serialize all outstanding load and store operations.
> >
> > But Documentation/atomic_ops.txt says about atomic_inc():
> >
> > "One very important aspect of these two routines is that they DO NOT
> > require any explicit memory barriers. They need only perform the
> > atomic_t counter update in an SMP safe manner."
> >
> > So while this has a barrier on x86, it's not guaranteed.
>
> But it is guaranteed to be "in an SMP safe manner"... which I guess just
> means that two writes will not intermix bytes, but no guarantee that the
> value will be visible to other CPUs unless some kind of barrier is
> explicitly imposed.
>
> Correct?

Yep, that's what it says.

> > So now your code melts down to:
> >
> > write(hb->waiters)  | write(uaddr)
> > mb                  | read(hb->waiters)
> > read(uaddr)
> >
> > I fear you simply managed to make the window small enough that your
> > testing was no longer able to expose it.
>
> Does seem to be the case.

You must be aware that between the write(uaddr) and the read(hb->waiters) is the syscall, i.e. a user/kernel space transition. sysenter/syscall have no documented barriers inside, but we don't know whether the actual hw implementation provides one or if the memory ops between modifying uaddr and reaching the read(hb->waiters) point including the real atomic op on the waiter side are good enough to paper over the issue.
Thanks,

	tglx
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Mon, 2013-11-25 at 20:47 +0100, Thomas Gleixner wrote:
> On Sat, 23 Nov 2013, Thomas Gleixner wrote:
> > On Fri, 22 Nov 2013, Davidlohr Bueso wrote:
> > So with the atomic ops you are changing that to:
> >
> > CPU 0                               CPU 1
> >
> > val = *futex;
> > futex_wait(futex, val);
> >
> > spin_lock(&hb->lock);
> >
> > atomic_inc(&hb->waiters);
> > uval = *futex;
> >                                     *futex = newval;
> >
> > if (uval != val) {                  futex_wake();
> >    spin_unlock(&hb->lock);          if (!atomic_read(&hb->waiters))
> >    return;                             return;
> > }
> > spin_lock(&hb->lock);
> > plist_add(hb, self);
> > spin_unlock(&hb->lock);
> > schedule();                         wakeup_waiters(hb);
> > ...
> >
> > which restores the ordering guarantee, which the hash bucket lock
> > provided so far.
>
> Actually that's not true by design, it just happens to work.
>
> atomic_inc() on x86 is a "lock incl".
>
> The LOCK prefix just guarantees that the cache line which is affected
> by the INCL is locked. And it guarantees that locked operations
> serialize all outstanding load and store operations.
>
> But Documentation/atomic_ops.txt says about atomic_inc():
>
> "One very important aspect of these two routines is that they DO NOT
> require any explicit memory barriers. They need only perform the
> atomic_t counter update in an SMP safe manner."
>
> So while this has a barrier on x86, it's not guaranteed.

But it is guaranteed to be "in an SMP safe manner"... which I guess just means that two writes will not intermix bytes, but no guarantee that the value will be visible to other CPUs unless some kind of barrier is explicitly imposed.

Correct?

> atomic_read() is a simple read.
>
> This does not guarantee anything. And if you read
> Documentation/atomic_ops.txt it's well documented:
>
> "*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***"
>
> So now your code melts down to:
>
> write(hb->waiters)  | write(uaddr)
> mb                  | read(hb->waiters)
> read(uaddr)
>
> I fear you simply managed to make the window small enough that your
> testing was no longer able to expose it.

Does seem to be the case.

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Mon, 25 Nov 2013, Davidlohr Bueso wrote:
> On Mon, 2013-11-25 at 18:32 +0100, Thomas Gleixner wrote:
> > If the smp_mb() is heavy weight, then it will hurt massively in the
> > case where the hash bucket is not empty, because we add the price for
> > the smp_mb() just for no gain.
> >
> > In that context it would also be helpful to measure the overhead on
> > x86 for the !empty case.
>
> Absolutely, I will add these comparisons. If we do notice that we end up
> hurting the !empty case, would the current patch using atomic ops still
> be considered? We have made sure that none of the changes in this set
> affects performance on other workloads/smaller systems.

Please read my last reply to the atomic ops approach. Aside of that we need numbers for a significant range of !x86.

Thanks,

	tglx
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Sat, 23 Nov 2013, Thomas Gleixner wrote:
> On Fri, 22 Nov 2013, Davidlohr Bueso wrote:
> So with the atomic ops you are changing that to:
>
> CPU 0                               CPU 1
>
> val = *futex;
> futex_wait(futex, val);
>
> spin_lock(&hb->lock);
>
> atomic_inc(&hb->waiters);
> uval = *futex;
>                                     *futex = newval;
>
> if (uval != val) {                  futex_wake();
>    spin_unlock(&hb->lock);          if (!atomic_read(&hb->waiters))
>    return;                             return;
> }
> spin_lock(&hb->lock);
> plist_add(hb, self);
> spin_unlock(&hb->lock);
> schedule();                         wakeup_waiters(hb);
> ...
>
> which restores the ordering guarantee, which the hash bucket lock
> provided so far.

Actually that's not true by design, it just happens to work.

atomic_inc() on x86 is a "lock incl".

The LOCK prefix just guarantees that the cache line which is affected by the INCL is locked. And it guarantees that locked operations serialize all outstanding load and store operations.

But Documentation/atomic_ops.txt says about atomic_inc():

"One very important aspect of these two routines is that they DO NOT require any explicit memory barriers. They need only perform the atomic_t counter update in an SMP safe manner."

So while this has a barrier on x86, it's not guaranteed.

atomic_read() is a simple read.

This does not guarantee anything. And if you read Documentation/atomic_ops.txt it's well documented:

"*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***"

So now your code melts down to:

write(hb->waiters)  | write(uaddr)
mb                  | read(hb->waiters)
read(uaddr)

I fear you simply managed to make the window small enough that your testing was no longer able to expose it.

Thanks,

	tglx
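[Editorial note: tglx's "melts down to" table is the classic store-buffering pattern, and his point is that the barriers must be explicit because atomic_inc()/atomic_read() guarantee no ordering across architectures. The sketch below models the table with C11 relaxed accesses plus explicit full fences; all names are illustrative, not the kernel's. With the fence present on both sides, the lost-wakeup outcome (each side reading the other's old value) is forbidden; drop either fence and it becomes legal.]

```c
/*
 * Store-buffering litmus test for the write|mb|read table, using C11
 * relaxed accesses with explicit seq_cst fences standing in for mb().
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

static atomic_int hb_waiters, uaddr_val;
static atomic_bool saw_old_uaddr, saw_no_waiters;

static void *waiter_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&hb_waiters, 1, memory_order_relaxed); /* write(hb->waiters) */
	atomic_thread_fence(memory_order_seq_cst);                   /* mb                 */
	if (atomic_load_explicit(&uaddr_val, memory_order_relaxed) == 0) /* read(uaddr)    */
		atomic_store(&saw_old_uaddr, true);
	return NULL;
}

static void *waker_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&uaddr_val, 1, memory_order_relaxed);  /* write(uaddr)       */
	atomic_thread_fence(memory_order_seq_cst);                   /* mb                 */
	if (atomic_load_explicit(&hb_waiters, memory_order_relaxed) == 0) /* read(hb->waiters) */
		atomic_store(&saw_no_waiters, true);
	return NULL;
}

/* Counts trials where BOTH sides read the other's old value -- the
 * forbidden outcome when the fences are in place on both sides. */
int run_litmus(int trials)
{
	int bad = 0;
	for (int i = 0; i < trials; i++) {
		atomic_store(&hb_waiters, 0);
		atomic_store(&uaddr_val, 0);
		atomic_store(&saw_old_uaddr, false);
		atomic_store(&saw_no_waiters, false);

		pthread_t a, b;
		pthread_create(&a, NULL, waiter_side, NULL);
		pthread_create(&b, NULL, waker_side, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		if (atomic_load(&saw_old_uaddr) && atomic_load(&saw_no_waiters))
			bad++;
	}
	return bad;
}
```

This mirrors the argument in the thread: a relaxed read (atomic_read()) alone on the waker side gives no guarantee; only an explicit full barrier paired with one on the waiter side rules out the lost wakeup.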
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Mon, 2013-11-25 at 18:32 +0100, Thomas Gleixner wrote:
> On Mon, 25 Nov 2013, Peter Zijlstra wrote:
> > On Mon, Nov 25, 2013 at 05:23:51PM +0100, Thomas Gleixner wrote:
> > > On Sat, 23 Nov 2013, Linus Torvalds wrote:
> > > >
> > > > On Sat, Nov 23, 2013 at 5:16 AM, Thomas Gleixner
> > > > wrote:
> > > > >
> > > > > Now the question is why we queue the waiter _AFTER_ reading the user
> > > > > space value. The comment in the code is pretty nonsensical:
> > > > >
> > > > >    * On the other hand, we insert q and release the hash-bucket only
> > > > >    * after testing *uaddr. This guarantees that futex_wait() will NOT
> > > > >    * absorb a wakeup if *uaddr does not match the desired values
> > > > >    * while the syscall executes.
> > > > >
> > > > > There is no reason why we cannot queue _BEFORE_ reading the user space
> > > > > value. We just have to dequeue in all the error handling cases, but
> > > > > for the fast path it does not matter at all.
> > > > >
> > > > > CPU 0                               CPU 1
> > > > >
> > > > > val = *futex;
> > > > > futex_wait(futex, val);
> > > > >
> > > > > spin_lock(&hb->lock);
> > > > >
> > > > > plist_add(hb, self);
> > > > > smp_wmb();
> > > > >
> > > > > uval = *futex;
> > > > >                                     *futex = newval;
> > > > >                                     futex_wake();
> > > > >
> > > > >                                     smp_rmb();
> > > > >                                     if (plist_empty(hb))
> > > > >                                        return;
> > > > > ...
> > > >
> > > > This would seem to be a nicer approach indeed, without needing the
> > > > extra atomics.
> > >
> > > I went through the issue with Peter and he noticed that we need
> > > smp_mb() in both places. That's what we have right now with the
> > > spin_lock() and it is required as we need to guarantee that
> > >
> > >  The waiter observes the change to the uaddr value after it added
> > >  itself to the plist
> > >
> > >  The waker observes plist not empty if the change to uaddr was made
> > >  after the waiter checked the value.
> > >
> > > write(plist)      | write(futex_uaddr)
> > > mb()              | mb()
> > > read(futex_uaddr) | read(plist)
> > >
> > > The spin_lock mb() on the waiter side does not help here because it
> > > happens before the write(plist) and not after it.
> >
> > Ah, note that spin_lock() is only a smp_mb() on x86, in general it's an
> > ACQUIRE barrier which is weaker than a full mb and will not suffice in
> > this case even if it were in the right place.
>
> So now the question is whether this lockless empty check optimization
> which seems to be quite nice on x86 for a particular workload will
> have any negative side effects on other architectures.
>
> If the smp_mb() is heavy weight, then it will hurt massively in the
> case where the hash bucket is not empty, because we add the price for
> the smp_mb() just for no gain.
>
> In that context it would also be helpful to measure the overhead on
> x86 for the !empty case.

Absolutely, I will add these comparisons. If we do notice that we end up hurting the !empty case, would the current patch using atomic ops still be considered? We have made sure that none of the changes in this set affects performance on other workloads/smaller systems.

Thanks,
Davidlohr
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Mon, Nov 25, 2013 at 06:32:55PM +0100, Thomas Gleixner wrote:
> In that context it would also be helpful to measure the overhead on
> x86 for the !empty case.

Yes, because mfence is by no means a cheap instruction on x86.
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Mon, 25 Nov 2013, Peter Zijlstra wrote:
> On Mon, Nov 25, 2013 at 05:23:51PM +0100, Thomas Gleixner wrote:
> > On Sat, 23 Nov 2013, Linus Torvalds wrote:
> > >
> > > On Sat, Nov 23, 2013 at 5:16 AM, Thomas Gleixner
> > > wrote:
> > > >
> > > > Now the question is why we queue the waiter _AFTER_ reading the user
> > > > space value. The comment in the code is pretty nonsensical:
> > > >
> > > >    * On the other hand, we insert q and release the hash-bucket only
> > > >    * after testing *uaddr. This guarantees that futex_wait() will NOT
> > > >    * absorb a wakeup if *uaddr does not match the desired values
> > > >    * while the syscall executes.
> > > >
> > > > There is no reason why we cannot queue _BEFORE_ reading the user space
> > > > value. We just have to dequeue in all the error handling cases, but
> > > > for the fast path it does not matter at all.
> > > >
> > > > CPU 0                               CPU 1
> > > >
> > > > val = *futex;
> > > > futex_wait(futex, val);
> > > >
> > > > spin_lock(&hb->lock);
> > > >
> > > > plist_add(hb, self);
> > > > smp_wmb();
> > > >
> > > > uval = *futex;
> > > >                                     *futex = newval;
> > > >                                     futex_wake();
> > > >
> > > >                                     smp_rmb();
> > > >                                     if (plist_empty(hb))
> > > >                                        return;
> > > > ...
> > >
> > > This would seem to be a nicer approach indeed, without needing the
> > > extra atomics.
> >
> > I went through the issue with Peter and he noticed that we need
> > smp_mb() in both places. That's what we have right now with the
> > spin_lock() and it is required as we need to guarantee that
> >
> >  The waiter observes the change to the uaddr value after it added
> >  itself to the plist
> >
> >  The waker observes plist not empty if the change to uaddr was made
> >  after the waiter checked the value.
> >
> > write(plist)      | write(futex_uaddr)
> > mb()              | mb()
> > read(futex_uaddr) | read(plist)
> >
> > The spin_lock mb() on the waiter side does not help here because it
> > happens before the write(plist) and not after it.
>
> Ah, note that spin_lock() is only a smp_mb() on x86, in general it's an
> ACQUIRE barrier which is weaker than a full mb and will not suffice in
> this case even if it were in the right place.

So now the question is whether this lockless empty check optimization which seems to be quite nice on x86 for a particular workload will have any negative side effects on other architectures.

If the smp_mb() is heavy weight, then it will hurt massively in the case where the hash bucket is not empty, because we add the price for the smp_mb() just for no gain.

In that context it would also be helpful to measure the overhead on x86 for the !empty case.

Thanks,

	tglx
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Mon, Nov 25, 2013 at 05:23:51PM +0100, Thomas Gleixner wrote:
> On Sat, 23 Nov 2013, Linus Torvalds wrote:
> >
> > On Sat, Nov 23, 2013 at 5:16 AM, Thomas Gleixner wrote:
> > >
> > > Now the question is why we queue the waiter _AFTER_ reading the user
> > > space value. The comment in the code is pretty nonsensical:
> > >
> > >    * On the other hand, we insert q and release the hash-bucket only
> > >    * after testing *uaddr. This guarantees that futex_wait() will NOT
> > >    * absorb a wakeup if *uaddr does not match the desired values
> > >    * while the syscall executes.
> > >
> > > There is no reason why we cannot queue _BEFORE_ reading the user space
> > > value. We just have to dequeue in all the error handling cases, but
> > > for the fast path it does not matter at all.
> > >
> > > CPU 0                               CPU 1
> > >
> > > val = *futex;
> > > futex_wait(futex, val);
> > >
> > > spin_lock(&hb->lock);
> > >
> > > plist_add(hb, self);
> > > smp_wmb();
> > >
> > > uval = *futex;
> > >                                     *futex = newval;
> > >                                     futex_wake();
> > >
> > >                                     smp_rmb();
> > >                                     if (plist_empty(hb))
> > >                                        return;
> > > ...
> >
> > This would seem to be a nicer approach indeed, without needing the
> > extra atomics.
>
> I went through the issue with Peter and he noticed that we need
> smp_mb() in both places. That's what we have right now with the
> spin_lock() and it is required as we need to guarantee that
>
>  The waiter observes the change to the uaddr value after it added
>  itself to the plist
>
>  The waker observes plist not empty if the change to uaddr was made
>  after the waiter checked the value.
>
> write(plist)      | write(futex_uaddr)
> mb()              | mb()
> read(futex_uaddr) | read(plist)
>
> The spin_lock mb() on the waiter side does not help here because it
> happens before the write(plist) and not after it.

Ah, note that spin_lock() is only a smp_mb() on x86, in general it's an ACQUIRE barrier which is weaker than a full mb and will not suffice in this case even if it were in the right place.
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Sat, 23 Nov 2013, Linus Torvalds wrote:
> On Sat, Nov 23, 2013 at 5:16 AM, Thomas Gleixner wrote:
> >
> > Now the question is why we queue the waiter _AFTER_ reading the user
> > space value. The comment in the code is pretty nonsensical:
> >
> >    * On the other hand, we insert q and release the hash-bucket only
> >    * after testing *uaddr. This guarantees that futex_wait() will NOT
> >    * absorb a wakeup if *uaddr does not match the desired values
> >    * while the syscall executes.
> >
> > There is no reason why we cannot queue _BEFORE_ reading the user space
> > value. We just have to dequeue in all the error handling cases, but
> > for the fast path it does not matter at all.
> >
> > CPU 0                               CPU 1
> >
> > val = *futex;
> > futex_wait(futex, val);
> >
> > spin_lock(&hb->lock);
> >
> > plist_add(hb, self);
> > smp_wmb();
> >
> > uval = *futex;
> >                                     *futex = newval;
> >                                     futex_wake();
> >
> >                                     smp_rmb();
> >                                     if (plist_empty(hb))
> >                                        return;
> > ...
>
> This would seem to be a nicer approach indeed, without needing the
> extra atomics.

I went through the issue with Peter and he noticed that we need smp_mb() in both places. That's what we have right now with the spin_lock() and it is required as we need to guarantee that

 The waiter observes the change to the uaddr value after it added itself to the plist

 The waker observes plist not empty if the change to uaddr was made after the waiter checked the value.

write(plist)      | write(futex_uaddr)
mb()              | mb()
read(futex_uaddr) | read(plist)

The spin_lock mb() on the waiter side does not help here because it happens before the write(plist) and not after it.

Thanks,

	tglx
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Sat, 23 Nov 2013, Davidlohr Bueso wrote:
> On Sat, 2013-11-23 at 19:46 -0800, Linus Torvalds wrote:
> > On Sat, Nov 23, 2013 at 5:16 AM, Thomas Gleixner wrote:
> > >
> > > Now the question is why we queue the waiter _AFTER_ reading the user
> > > space value. The comment in the code is pretty nonsensical:
> > >
> > >    * On the other hand, we insert q and release the hash-bucket only
> > >    * after testing *uaddr. This guarantees that futex_wait() will NOT
> > >    * absorb a wakeup if *uaddr does not match the desired values
> > >    * while the syscall executes.
> > >
> > > There is no reason why we cannot queue _BEFORE_ reading the user space
> > > value. We just have to dequeue in all the error handling cases, but
> > > for the fast path it does not matter at all.
> > >
> > > CPU 0                               CPU 1
> > >
> > > val = *futex;
> > > futex_wait(futex, val);
> > >
> > > spin_lock(&hb->lock);
> > >
> > > plist_add(hb, self);
> > > smp_wmb();
> > >
> > > uval = *futex;
> > >                                     *futex = newval;
> > >                                     futex_wake();
> > >
> > >                                     smp_rmb();
> > >                                     if (plist_empty(hb))
> > >                                        return;
> > > ...
> >
> > This would seem to be a nicer approach indeed, without needing the
> > extra atomics.
>
> Yep, I think we can all agree that doing this optimization without atomic
> ops is a big plus.
>
> > Davidlohr, mind trying Thomas' approach?
>
> I just took a quick look and it seems pretty straightforward, but not
> without some details to consider. We basically have to redo/reorder
> futex_wait_setup(), which checks that uval == val, and
> futex_wait_queue_me(), which adds the task to the list and blocks. Now,
> both futex_wait() and futex_wait_requeue_pi() have this logic, but since
> we don't use futex_wake() to wakeup tasks on pi futex_qs, I believe it's
> ok to only change futex_wait(), while the order of the uval checking
> doesn't matter for futex_wait_requeue_pi() so it can stay as is.

There is no mechanism which prevents a futex_wake() call on the inner futex of the wait_requeue_pi mechanism.

So no, we have to change both. futexes are no place for belief. Either you understand them completely or you just leave them alone.

Thanks,

	tglx
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Sat, 2013-11-23 at 19:46 -0800, Linus Torvalds wrote:
> On Sat, Nov 23, 2013 at 5:16 AM, Thomas Gleixner wrote:
> >
> > Now the question is why we queue the waiter _AFTER_ reading the user
> > space value. The comment in the code is pretty nonsensical:
> >
> >    * On the other hand, we insert q and release the hash-bucket only
> >    * after testing *uaddr. This guarantees that futex_wait() will NOT
> >    * absorb a wakeup if *uaddr does not match the desired values
> >    * while the syscall executes.
> >
> > There is no reason why we cannot queue _BEFORE_ reading the user space
> > value. We just have to dequeue in all the error handling cases, but
> > for the fast path it does not matter at all.
> >
> > CPU 0                               CPU 1
> >
> > val = *futex;
> > futex_wait(futex, val);
> >
> > spin_lock(&hb->lock);
> >
> > plist_add(hb, self);
> > smp_wmb();
> >
> > uval = *futex;
> >                                     *futex = newval;
> >                                     futex_wake();
> >
> >                                     smp_rmb();
> >                                     if (plist_empty(hb))
> >                                        return;
> > ...
>
> This would seem to be a nicer approach indeed, without needing the
> extra atomics.

Yep, I think we can all agree that doing this optimization without atomic ops is a big plus.

> Davidlohr, mind trying Thomas' approach?

I just took a quick look and it seems pretty straightforward, but not without some details to consider. We basically have to redo/reorder futex_wait_setup(), which checks that uval == val, and futex_wait_queue_me(), which adds the task to the list and blocks. Now, both futex_wait() and futex_wait_requeue_pi() have this logic, but since we don't use futex_wake() to wakeup tasks on pi futex_qs, I believe it's ok to only change futex_wait(), while the order of the uval checking doesn't matter for futex_wait_requeue_pi() so it can stay as is.

The following is the general skeleton of what Thomas is probably referring to (yes, it needs comments, cleanups, refactoring, etc, etc). Futexes are easy to screw up but at least this passes a kernel build and all futextests on a large box.
static int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
		      ktime_t *abs_time, u32 bitset)
{
	struct hrtimer_sleeper timeout, *to = NULL;
	struct restart_block *restart;
	struct futex_hash_bucket *hb;
	struct futex_q q = futex_q_init;
	int prio, ret = 0;
	u32 uval;

	if (!bitset)
		return -EINVAL;
	q.bitset = bitset;

	if (abs_time) {
		to = &timeout;
		hrtimer_init_on_stack(&to->timer, (flags & FLAGS_CLOCKRT) ?
				      CLOCK_REALTIME : CLOCK_MONOTONIC,
				      HRTIMER_MODE_ABS);
		hrtimer_init_sleeper(to, current);
		hrtimer_set_expires_range_ns(&to->timer, *abs_time,
					     current->timer_slack_ns);
	}

retry:
	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key, VERIFY_READ);
	if (unlikely(ret != 0))
		goto out;

retry_private:
	hb = queue_lock(&q);

	prio = min(current->normal_prio, MAX_RT_PRIO);
	plist_node_init(&q.list, prio);
	plist_add(&q.list, &hb->chain);
	q.task = current;

	/* do NOT drop the lock here */
	smp_wmb();

	ret = get_futex_value_locked(&uval, uaddr);
	if (ret) {
		plist_del(&q.list, &hb->chain);
		spin_unlock(&hb->lock);

		ret = get_user(uval, uaddr);
		if (ret)
			goto out_put_key;

		if (!(flags & FLAGS_SHARED))
			goto retry_private;

		put_futex_key(&q.key);
		goto retry;
	}

	if (uval != val) {
		plist_del(&q.list, &hb->chain);
		spin_unlock(&hb->lock);
		ret = -EWOULDBLOCK;
		goto out_put_key;
	}

	set_current_state(TASK_INTERRUPTIBLE);
	spin_unlock(&hb->lock);

	/* Arm the timer */
	if (to) {
		hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS);
		if (!hrtimer_active(&to->timer))
			to->task = NULL;
	}

	/*
	 * If we have been removed from the hash list, then another task
	 * has tried to wake us, and we can skip the call to schedule().
	 */
	if (likely(!plist_node_empty(&q.list))) {
		/*
		 * If the timer has already expired, current will already be
		 * flagged for rescheduling. Only call schedule if there
		 * is no timeout, or if it has yet to expire.
		 */
		if
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Sat, Nov 23, 2013 at 5:16 AM, Thomas Gleixner wrote:
>
> Now the question is why we queue the waiter _AFTER_ reading the user
> space value. The comment in the code is pretty nonsensical:
>
>    * On the other hand, we insert q and release the hash-bucket only
>    * after testing *uaddr. This guarantees that futex_wait() will NOT
>    * absorb a wakeup if *uaddr does not match the desired values
>    * while the syscall executes.
>
> There is no reason why we cannot queue _BEFORE_ reading the user space
> value. We just have to dequeue in all the error handling cases, but
> for the fast path it does not matter at all.
>
> CPU 0                               CPU 1
>
> val = *futex;
> futex_wait(futex, val);
>
> spin_lock(&hb->lock);
>
> plist_add(hb, self);
> smp_wmb();
>
> uval = *futex;
>                                     *futex = newval;
>                                     futex_wake();
>
>                                     smp_rmb();
>                                     if (plist_empty(hb))
>                                        return;
> ...

This would seem to be a nicer approach indeed, without needing the extra atomics.

Davidlohr, mind trying Thomas' approach?

Linus
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, 22 Nov 2013, Davidlohr Bueso wrote:
> On Fri, 2013-11-22 at 17:25 -0800, Linus Torvalds wrote:
> > On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso wrote:
> > > In futex_wake() there is clearly no point in taking the hb->lock if
> > > we know beforehand that there are no tasks to be woken. This comes
> > > at the smaller cost of doing some atomic operations to keep track of
> > > the list's size.
> >
> > Hmm. Why? Afaik, you only care about "empty or not". And if you don't
> > need the serialization from locking, then afaik you can just do a
> > "plist_head_empty()" without holding the lock.
>
> I remember this being the original approach, but after noticing some
> strange behavior we quickly decided it wasn't the path. And sure enough,
> I just double checked and tried the patch without atomic ops and can see
> things being off: one of the futextest performance cases is stuck
> blocked on a futex and I couldn't reboot the machine either -- nothing
> apparent in dmesg, just not 100% functional. The thing is, we can only
> avoid taking the lock if nobody else is trying to add itself to the
> list.

Right. The reason is:

CPU 0                               CPU 1

val = *futex;
futex_wait(futex, val);

spin_lock(&hb->lock);

uval = *futex;
                                    *futex = newval;

if (uval != val) {                  futex_wake();
   spin_unlock(&hb->lock);          if (plist_empty(hb))
   return;                             return;
}

plist_add(hb, self);
spin_unlock(&hb->lock);
schedule();
...

So the waiter on CPU0 gets queued after the user space value changed and the wakee on CPU1 returns because plist is empty. Waiter therefore waits forever.

So with the atomic ops you are changing that to:

CPU 0                               CPU 1

val = *futex;
futex_wait(futex, val);

spin_lock(&hb->lock);

atomic_inc(&hb->waiters);
uval = *futex;
                                    *futex = newval;

if (uval != val) {                  futex_wake();
   spin_unlock(&hb->lock);          if (!atomic_read(&hb->waiters))
   return;                             return;
}

spin_lock(&hb->lock);
plist_add(hb, self);
spin_unlock(&hb->lock);
schedule();                         wakeup_waiters(hb);
...

which restores the ordering guarantee, which the hash bucket lock provided so far.

Now the question is why we queue the waiter _AFTER_ reading the user space value. The comment in the code is pretty nonsensical:

   * On the other hand, we insert q and release the hash-bucket only
   * after testing *uaddr. This guarantees that futex_wait() will NOT
   * absorb a wakeup if *uaddr does not match the desired values
   * while the syscall executes.

There is no reason why we cannot queue _BEFORE_ reading the user space value. We just have to dequeue in all the error handling cases, but for the fast path it does not matter at all.

CPU 0                               CPU 1

val = *futex;
futex_wait(futex, val);

spin_lock(&hb->lock);

plist_add(hb, self);
smp_wmb();

uval = *futex;
                                    *futex = newval;
                                    futex_wake();

                                    smp_rmb();
                                    if (plist_empty(hb))
                                       return;

spin_lock(&hb->lock);
if (uval != val) {
   plist_del(self);
   spin_unlock(&hb->lock);
   return;
}

spin_unlock(&hb->lock);
schedule();                         wakeup_waiters(hb);
...

Thanks,

	tglx
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, 2013-11-22 at 19:19 -0800, Davidlohr Bueso wrote:
> On Fri, 2013-11-22 at 17:25 -0800, Linus Torvalds wrote:
> > On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso wrote:
> > > In futex_wake() there is clearly no point in taking the hb->lock if
> > > we know beforehand that there are no tasks to be woken. This comes
> > > at the smaller cost of doing some atomic operations to keep track of
> > > the list's size.
> >
> > Hmm. Why? Afaik, you only care about "empty or not". And if you don't
> > need the serialization from locking, then afaik you can just do a
> > "plist_head_empty()" without holding the lock.
>
> I remember this being the original approach, but after noticing some
> strange behavior we quickly decided it wasn't the path. And sure enough,
> I just double checked and tried the patch without atomic ops and can see
> things being off: one of the futextest performance cases is stuck
> blocked on a futex and I couldn't reboot the machine either -- nothing
> apparent in dmesg, just not 100% functional. The thing is, we can only
> avoid taking the lock if nobody else is trying to add itself to the
> list.

In your usage, the worst case scenario is that you detect 0 when locking
may have blocked and found a waiter. Correct? In this case, you return 0,
instead of 1 (or more).

This suggests to me a bug in the futextest testcase. Which test
specifically hung up waiting? Futex hangs are almost always bad userspace
code (my bad userspace code in this case ;-)

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> In futex_wake() there is clearly no point in taking the hb->lock if
> we know beforehand that there are no tasks to be woken. This comes
> at the smaller cost of doing some atomic operations to keep track of
> the list's size. Specifically, increment the counter when an element is
> added to the list, and decrement when it is removed. Of course, if the
> counter is 0, then there are no tasks blocked on a futex. Some special
> considerations:
>
> - increment the counter at queue_lock() as we always end up calling
>   queue_me() which adds the element to the list. Upon any error,
>   queue_unlock() is called for housekeeping, for which we decrement
>   to mach the increment done in queue_lock().
         ^match
>
> - decrement the counter at __unqueue_me() to reflect when an element is
>   removed from the queue for wakeup related purposes.

__unqueue_futex (not __unqueue_me)

> @@ -999,6 +1001,10 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
> 		goto out;
>
> 	hb = hash_futex(&key);
> +	/* make sure we really have tasks to wakeup */

Nit, but please use proper sentence formatting for consistency with the
rest of the comments in futex.c (most of them anyway).

	/* Make sure we really have tasks to wake up. */

Now... I'm not thrilled with adding atomics if we don't need to,
especially for an optimization since the atomics themselves cause enough
problems, especially across a large number of CPUs... I'll respond to
Linus's thread though.

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, 2013-11-22 at 21:40 -0800, Darren Hart wrote:
> On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> > In futex_wake() there is clearly no point in taking the hb->lock if
> > we know beforehand that there are no tasks to be woken. This comes
> > at the smaller cost of doing some atomic operations to keep track of
> > the list's size. Specifically, increment the counter when an element is
> > added to the list, and decrement when it is removed. Of course, if the
> > counter is 0, then there are no tasks blocked on a futex. Some special
> > considerations:
> >
> > - increment the counter at queue_lock() as we always end up calling
> >   queue_me() which adds the element to the list. Upon any error,
> >   queue_unlock() is called for housekeeping, for which we decrement
> >   to mach the increment done in queue_lock().
> >
> > - decrement the counter at __unqueue_me() to reflect when an element is
> >   removed from the queue for wakeup related purposes.
>
> What is the problem you are trying to solve here?

Apologies, too quick on the trigger. I see plenty of detail in 0/5. Will
spend some time reviewing that.

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> In futex_wake() there is clearly no point in taking the hb->lock if
> we know beforehand that there are no tasks to be woken. This comes
> at the smaller cost of doing some atomic operations to keep track of
> the list's size. Specifically, increment the counter when an element is
> added to the list, and decrement when it is removed. Of course, if the
> counter is 0, then there are no tasks blocked on a futex. Some special
> considerations:
>
> - increment the counter at queue_lock() as we always end up calling
>   queue_me() which adds the element to the list. Upon any error,
>   queue_unlock() is called for housekeeping, for which we decrement
>   to mach the increment done in queue_lock().
>
> - decrement the counter at __unqueue_me() to reflect when an element is
>   removed from the queue for wakeup related purposes.

What is the problem you are trying to solve here? The fast-path in
futexes is not calling into the kernel at all; if you've made the
syscall, you've already lost. What kind of improvement are you seeing by
adding this optimization? Is it worth the introduction of atomic
operations into a file known to the state of California to increase the
risk of liver failure?

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On 11/22/2013 08:25 PM, Linus Torvalds wrote:
> On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso wrote:
> > In futex_wake() there is clearly no point in taking the hb->lock if
> > we know beforehand that there are no tasks to be woken. This comes
> > at the smaller cost of doing some atomic operations to keep track of
> > the list's size.
>
> Hmm. Why? Afaik, you only care about "empty or not". And if you don't
> need the serialization from locking, then afaik you can just do a
> "plist_head_empty()" without holding the lock.
>
> NOTE!
>
> The "list_empty()" function is very much designed to work even without
> holding a lock (as long as the head itself exists reliably, of course)
> BUT you have to then guarantee yourself that your algorithm doesn't
> have any races wrt other CPU's adding an entry to the list at the same
> time. Not holding a lock obviously means that you are not serialized
> against that.. We've had problems with people doing
>
>     if (!list_empty(waiters))
>         wake_up_list(..)
>
> because they wouldn't wake people up who just got added.
>
> But considering that your atomic counter checking has the same lack of
> serialization, at least the plist_head_empty() check shouldn't be any
> worse than that counter thing.. And doesn't need any steenking atomic
> ops or a new counter field.

Before the patch, a waker will wake up one or more waiters who called
spin_lock() before the waker. If we just use plist_head_empty(), we will
miss waiters who have called spin_lock(), but have not yet got the lock
or inserted themselves into the wait list.

By adding an atomic counter for checking purposes, we again have a single
point where any waiter who passes that point (incrementing the atomic
counter) can be woken up by a waker who checks the atomic counter after
the increment. This acts sort of like serialization without using a lock.

-Longman
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, 2013-11-22 at 17:25 -0800, Linus Torvalds wrote:
> On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso wrote:
> > In futex_wake() there is clearly no point in taking the hb->lock if
> > we know beforehand that there are no tasks to be woken. This comes
> > at the smaller cost of doing some atomic operations to keep track of
> > the list's size.
>
> Hmm. Why? Afaik, you only care about "empty or not". And if you don't
> need the serialization from locking, then afaik you can just do a
> "plist_head_empty()" without holding the lock.

I remember this being the original approach, but after noticing some
strange behavior we quickly decided it wasn't the path. And sure enough,
I just double checked and tried the patch without atomic ops and can see
things being off: one of the futextest performance cases is stuck
blocked on a futex and I couldn't reboot the machine either -- nothing
apparent in dmesg, just not 100% functional. The thing is, we can only
avoid taking the lock if nobody else is trying to add itself to the
list.

Thanks,
Davidlohr
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, Nov 22, 2013 at 5:25 PM, Linus Torvalds wrote:
> On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso wrote:
>> In futex_wake() there is clearly no point in taking the hb->lock if
>> we know beforehand that there are no tasks to be woken. This comes
>> at the smaller cost of doing some atomic operations to keep track of
>> the list's size.
>
> Hmm. Why? Afaik, you only care about "empty or not". And if you don't
> need the serialization from locking, then afaik you can just do a
> "plist_head_empty()" without holding the lock.
>
> NOTE!
>
> The "list_empty()" function is very much designed to work even without
> holding a lock (as long as the head itself exists reliably, of course)
> BUT you have to then guarantee yourself that your algorithm doesn't
> have any races wrt other CPU's adding an entry to the list at the same
> time. Not holding a lock obviously means that you are not serialized
> against that.. We've had problems with people doing
>
>     if (!list_empty(waiters))
>         wake_up_list(..)
>
> because they wouldn't wake people up who just got added.
>
> But considering that your atomic counter checking has the same lack of
> serialization, at least the plist_head_empty() check shouldn't be any
> worse than that counter thing.. And doesn't need any steenking atomic
> ops or a new counter field.

Hi Linus,

In this patch, since we do the atomic increments before holding the
hb->lock during situations where we may potentially add a task to the
list, then we would only skip attempting the wake up if no other thread
has already held the hb->lock and is in the process of adding a task to
the list, in addition to the list being empty. Would this be enough to
protect it from any race condition?

Thanks,
Jason
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso wrote:
> In futex_wake() there is clearly no point in taking the hb->lock if
> we know beforehand that there are no tasks to be woken. This comes
> at the smaller cost of doing some atomic operations to keep track of
> the list's size.

Hmm. Why? Afaik, you only care about "empty or not". And if you don't
need the serialization from locking, then afaik you can just do a
"plist_head_empty()" without holding the lock.

NOTE!

The "list_empty()" function is very much designed to work even without
holding a lock (as long as the head itself exists reliably, of course)
BUT you have to then guarantee yourself that your algorithm doesn't
have any races wrt other CPU's adding an entry to the list at the same
time. Not holding a lock obviously means that you are not serialized
against that.. We've had problems with people doing

    if (!list_empty(waiters))
        wake_up_list(..)

because they wouldn't wake people up who just got added.

But considering that your atomic counter checking has the same lack of
serialization, at least the plist_head_empty() check shouldn't be any
worse than that counter thing.. And doesn't need any steenking atomic
ops or a new counter field.

             Linus
[PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
In futex_wake() there is clearly no point in taking the hb->lock if
we know beforehand that there are no tasks to be woken. This comes
at the smaller cost of doing some atomic operations to keep track of
the list's size. Specifically, increment the counter when an element is
added to the list, and decrement when it is removed. Of course, if the
counter is 0, then there are no tasks blocked on a futex. Some special
considerations:

- increment the counter at queue_lock() as we always end up calling
  queue_me() which adds the element to the list. Upon any error,
  queue_unlock() is called for housekeeping, for which we decrement
  to mach the increment done in queue_lock().

- decrement the counter at __unqueue_me() to reflect when an element is
  removed from the queue for wakeup related purposes.

Cc: Ingo Molnar
Cc: Darren Hart
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Mike Galbraith
Cc: Jeff Mahoney
Cc: Linus Torvalds
Cc: Scott Norton
Cc: Tom Vaden
Cc: Aswin Chandramouleeswaran
Signed-off-by: Waiman Long
Signed-off-by: Jason Low
Signed-off-by: Davidlohr Bueso
---
 kernel/futex.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/futex.c b/kernel/futex.c
index 5fa9eb0..2f9dd5d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -152,6 +152,7 @@ static const struct futex_q futex_q_init = {
  * waiting on a futex.
  */
 struct futex_hash_bucket {
+	atomic_t len;
 	spinlock_t lock;
 	struct plist_head chain;
 };
@@ -843,6 +844,7 @@ static void __unqueue_futex(struct futex_q *q)
 	hb = container_of(q->lock_ptr, struct futex_hash_bucket, lock);
 
 	plist_del(&q->list, &hb->chain);
+	atomic_dec(&hb->len);
 }
 
 /*
@@ -999,6 +1001,10 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 		goto out;
 
 	hb = hash_futex(&key);
+	/* make sure we really have tasks to wakeup */
+	if (atomic_read(&hb->len) == 0)
+		goto out_put_key;
+
 	spin_lock(&hb->lock);
 
 	plist_for_each_entry_safe(this, next, &hb->chain, list) {
@@ -1019,6 +1025,7 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 	}
 
 	spin_unlock(&hb->lock);
+out_put_key:
 	put_futex_key(&key);
 out:
 	return ret;
@@ -1137,7 +1144,9 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
 	 */
 	if (likely(&hb1->chain != &hb2->chain)) {
 		plist_del(&q->list, &hb1->chain);
+		atomic_dec(&hb1->len);
 		plist_add(&q->list, &hb2->chain);
+		atomic_inc(&hb2->len);
 		q->lock_ptr = &hb2->lock;
 	}
 	get_futex_key_refs(key2);
@@ -1480,6 +1489,8 @@ static inline struct futex_hash_bucket *queue_lock(struct futex_q *q)
 	struct futex_hash_bucket *hb;
 
 	hb = hash_futex(&q->key);
+	atomic_inc(&hb->len);
+
 	q->lock_ptr = &hb->lock;
 
 	spin_lock(&hb->lock);
@@ -1491,6 +1502,7 @@ queue_unlock(struct futex_hash_bucket *hb)
 	__releases(&hb->lock)
 {
 	spin_unlock(&hb->lock);
+	atomic_dec(&hb->len);
 }
 
 /**
@@ -,6 +2234,7 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
 	 * Unqueue the futex_q and determine which it was.
 	 */
 	plist_del(&q->list, &hb->chain);
+	atomic_dec(&hb->len);
 
 	/* Handle spurious wakeups gracefully */
 	ret = -EWOULDBLOCK;
@@ -2749,6 +2762,7 @@ static int __init futex_init(void)
 	for (i = 0; i < futex_hashsize; i++) {
 		plist_head_init(&futex_queues[i].chain);
 		spin_lock_init(&futex_queues[i].lock);
+		atomic_set(&futex_queues[i].len, 0);
 	}
 
 	return 0;
-- 
1.8.1.4