Re: Futex queue_me/get_user ordering

2005-03-20 Thread Jamie Lokier
Ingo Molnar wrote:
> 
> * Jakub Jelinek <[EMAIL PROTECTED]> wrote:
> 
> > The futex man pages that have been around for years (certainly since
> > mid 2002) certainly don't document FUTEX_WAIT as token passing
> > operation, but as atomic operation:
> > 
> > Say http://www.icewalkers.com/Linux/ManPages/futex-2.html
> 
> besides this documented-behavior argument, i dont think futexes should
> be degraded into waitqueues

I give in...

Depending on atomicity makes it impossible for an application linked
with NPTL and Glibc to write an NPTL-compatible "wait on two locks"
function.

I'm not saying that's a very clean thing to want, but it's a
conceptual loss and I'm disappointed I seem to be the only one
noticing it.

On the other hand, I was mistaken to think it makes it impossible to
write an emulation of synchronous futex() in terms of asynchronous
futex().* In fact it makes it impossible to do so using the existing
FUTEX_FD, but it would be possible if there were a FUTEX_FD2 added
somewhere down the line.

* - The reason you would do this is if you were writing userspace
threading for any reason, and had to include an emulation of synchronous
futex() in terms of async futex, because some libraries that might run
on top of the userspace threading use futex in an application-dependent
way.
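
For concreteness, here is roughly what such a "wait on two locks"
helper looks like under the token-passing model -- this is only a
sketch of mine, not NPTL code, and it assumes the futex_fd poll flaw
mentioned elsewhere in this thread has been fixed:

/* Sketch only: wait until either of two lock words leaves its "locked"
 * value, using FUTEX_FD + poll().  Not NPTL code. */
#include <linux/futex.h>
#include <sys/syscall.h>
#include <poll.h>
#include <unistd.h>

static int wait_on_two_locks(int *addr1, int val1, int *addr2, int val2)
{
        int ret = -1;
        int fd1 = syscall(SYS_futex, addr1, FUTEX_FD, 0, NULL, NULL, 0);
        int fd2 = syscall(SYS_futex, addr2, FUTEX_FD, 0, NULL, NULL, 0);

        if (fd1 >= 0 && fd2 >= 0) {
                /* Check the words AFTER getting the fds: a wake that races
                 * with us is delivered on the fd, so it is not lost. */
                if (*(volatile int *)addr1 != val1 ||
                    *(volatile int *)addr2 != val2) {
                        ret = 0;
                } else {
                        struct pollfd pfd[2] = {
                                { .fd = fd1, .events = POLLIN },
                                { .fd = fd2, .events = POLLIN },
                        };
                        ret = poll(pfd, 2, -1) > 0 ? 0 : -1;
                }
        }
        if (fd1 >= 0)
                close(fd1);
        if (fd2 >= 0)
                close(fd2);
        return ret;
}

This only works as long as a wake is a token to be re-checked; once the
lock protocol depends on FUTEX_WAIT's atomic compare-and-sleep, there is
no equivalent construction.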

> - in fact, to solve some of the known
> performance problems the opposite will have to happen: e.g. i believe
> that in the future we'll need to enable the kernel-side futex code to
> actually modify the futex variable. I.e. atomicity of the read in
> FUTEX_WAIT is an absolute must, and is only the first step.

Some of those performance problems can be solved already by better use
of FUTEX_REQUEUE instead of FUTEX_WAKE.

> [ the double-context-switch problem in cond_signal() that Jamie
>   mentioned is precisely one such case: pthread semantics force us that
>   the wakeup of the wakee _must_ happen while still holding the internal
>   lock. So we cannot just delay the wakeup to outside the glibc critical
>   section. This double context-switch could be avoided if the 'release
>   internal lock and wake up wakee' operation could be done atomically
>   within the kernel. (A sane default 'userspace unlock' operation on a
>   machine word could be defined .. e.g. decrement-to-zero.) ]

Did you not see the solution I gave last November, using FUTEX_REQUEUE?

See:

http://lkml.org/lkml/2004/11/29/201

I spent a _lot_ of time figuring it out but everyone was too busy to
confirm that it worked.  It would improve performance in a number of cases.

I hope that it does not get ignored yet again.
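
The core of the idea, very roughly (this is only a sketch of the general
mechanism, not the exact algorithm from the November post; the
0 = unlocked / 1 = locked / 2 = locked-with-waiters mutex encoding, the
GCC __sync builtin and the assumption that the caller holds the mutex
are mine):

/* Sketch only -- not the November algorithm itself.  Assumes the caller
 * holds the mutex word (as in cond_signal), encoded 0/1/2 as above. */
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static void cond_signal_via_requeue(int *cond_futex, int *mutex_futex)
{
        /* Mark the mutex contended so the eventual unlock does a
         * FUTEX_WAKE. */
        __sync_lock_test_and_set(mutex_futex, 2);

        /* Instead of FUTEX_WAKE (which would wake a waiter only to have
         * it block on the still-held mutex), requeue one waiter from the
         * condvar futex onto the mutex futex: wake 0, requeue 1.  The
         * waiter is woken later, by the mutex unlock, when it can really
         * run -- no double context switch. */
        syscall(SYS_futex, cond_futex, FUTEX_REQUEUE, 0,
                (unsigned long)1, mutex_futex, 0);
}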

There _may_ be cases where more complex futex operations are needed,
but we should try the better algorithms that use the existing futex
operations before adding new ones.

-- Jamie

Re: Futex queue_me/get_user ordering

2005-03-18 Thread Ingo Molnar

* Jakub Jelinek <[EMAIL PROTECTED]> wrote:

> The futex man pages that have been around for years (certainly since
> mid 2002) certainly don't document FUTEX_WAIT as token passing
> operation, but as atomic operation:
> 
> Say http://www.icewalkers.com/Linux/ManPages/futex-2.html

besides this documented-behavior argument, i dont think futexes should
be degraded into waitqueues - in fact, to solve some of the known
performance problems the opposite will have to happen: e.g. i believe
that in the future we'll need to enable the kernel-side futex code to
actually modify the futex variable. I.e. atomicity of the read in
FUTEX_WAIT is an absolute must, and is only the first step.

[ the double-context-switch problem in cond_signal() that Jamie
  mentioned is precisely one such case: pthread semantics force us that
  the wakeup of the wakee _must_ happen while still holding the internal
  lock. So we cannot just delay the wakeup to outside the glibc critical
  section. This double context-switch could be avoided if the 'release
  internal lock and wake up wakee' operation could be done atomically
  within the kernel. (A sane default 'userspace unlock' operation on a
  machine word could be defined .. e.g. decrement-to-zero.) ]

so i'm very much in favor of your patch - it fixes a real bug and is
also the right step forward. We'll need more locking code in the kernel
to remove fundamental limitations of userspace (such as no ability to
control preemption), not less.

i've tested your latest patch (from today) on x86 and it boots/works
fine with Fedora userspace, where futexes do get utilized, and ran a few
tests as well.

(Andrew - might make sense to include in the next -mm so that we get
some feel of stability, while the conceptual discussion continues?)

Ingo

--

this patch makes FUTEX_WAIT atomic again.

Signed-off-by: Jakub Jelinek <[EMAIL PROTECTED]>
Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

--- linux/kernel/futex.c.orig
+++ linux/kernel/futex.c
@@ -97,7 +97,6 @@ struct futex_q {
  */
 struct futex_hash_bucket {
        spinlock_t              lock;
-       unsigned int            nqueued;
        struct list_head        chain;
 };
 
@@ -265,7 +264,6 @@ static inline int get_futex_value_locked
        inc_preempt_count();
        ret = __copy_from_user_inatomic(dest, from, sizeof(int));
        dec_preempt_count();
-       preempt_check_resched();
 
        return ret ? -EFAULT : 0;
 }
@@ -339,7 +337,6 @@ static int futex_requeue(unsigned long u
        struct list_head *head1;
        struct futex_q *this, *next;
        int ret, drop_count = 0;
-       unsigned int nqueued;
 
  retry:
        down_read(&current->mm->mmap_sem);
@@ -354,23 +351,22 @@ static int futex_requeue(unsigned long u
        bh1 = hash_futex(&key1);
        bh2 = hash_futex(&key2);
 
-       nqueued = bh1->nqueued;
+       if (bh1 < bh2)
+               spin_lock(&bh1->lock);
+       spin_lock(&bh2->lock);
+       if (bh1 > bh2)
+               spin_lock(&bh1->lock);
+
        if (likely(valp != NULL)) {
                int curval;
 
-               /* In order to avoid doing get_user while
-                  holding bh1->lock and bh2->lock, nqueued
-                  (monotonically increasing field) must be first
-                  read, then *uaddr1 fetched from userland and
-                  after acquiring lock nqueued field compared with
-                  the stored value.  The smp_mb () below
-                  makes sure that bh1->nqueued is read from memory
-                  before *uaddr1.  */
-               smp_mb();
-
                ret = get_futex_value_locked(&curval, (int __user *)uaddr1);
 
                if (unlikely(ret)) {
+                       spin_unlock(&bh1->lock);
+                       if (bh1 != bh2)
+                               spin_unlock(&bh2->lock);
+
                        /* If we would have faulted, release mmap_sem, fault
                         * it in and start all over again.
                         */
@@ -385,21 +381,10 @@ static int futex_requeue(unsigned long u
                }
                if (curval != *valp) {
                        ret = -EAGAIN;
-                       goto out;
+                       goto out_unlock;
                }
        }
 
-       if (bh1 < bh2)
-               spin_lock(&bh1->lock);
-       spin_lock(&bh2->lock);
-       if (bh1 > bh2)
-               spin_lock(&bh1->lock);
-
-       if (unlikely(nqueued != bh1->nqueued && valp != NULL)) {
-               ret = -EAGAIN;
-               goto out_unlock;
-       }
-
        head1 = &bh1->chain;
        list_for_each_entry_safe(this, next, head1, list) {
                if (!match_futex (&this->key, &key1))
@@ -435,13 +420,9 @@ out:
        return ret;
 }
 
-/*
- * queue_me and unqueue_me must be called as a pair, each
- * exactly once.  They are called with the hashed spinlock held.
- */
-
 /* The key must be already stored in q->key. */
-static void queue_me(struct futex_q *q, int fd, struct file *filp)
+static inline struct futex_hash_bucket *
+queue_lock(struct 

Re: Futex queue_me/get_user ordering

2005-03-18 Thread Jakub Jelinek
On Thu, Mar 17, 2005 at 03:20:31PM +0000, Jamie Lokier wrote:
> > [numerous instances of...]
> > +   preempt_check_resched();
> 
> Not required.  The spin unlocks will do this.

Here is an updated patch with those removed (all of them are preceded
by spin_unlock), and with the out_unqueue label and the following unused
code removed too.

--- linux-2.6.11/kernel/futex.c.jj  2005-03-17 04:42:29.0 -0500
+++ linux-2.6.11/kernel/futex.c 2005-03-18 05:45:29.0 -0500
@@ -97,7 +97,6 @@ struct futex_q {
  */
 struct futex_hash_bucket {
        spinlock_t              lock;
-       unsigned int            nqueued;
        struct list_head        chain;
 };
 
@@ -265,7 +264,6 @@ static inline int get_futex_value_locked
        inc_preempt_count();
        ret = __copy_from_user_inatomic(dest, from, sizeof(int));
        dec_preempt_count();
-       preempt_check_resched();
 
        return ret ? -EFAULT : 0;
 }
@@ -339,7 +337,6 @@ static int futex_requeue(unsigned long u
        struct list_head *head1;
        struct futex_q *this, *next;
        int ret, drop_count = 0;
-       unsigned int nqueued;
 
  retry:
        down_read(&current->mm->mmap_sem);
@@ -354,23 +351,22 @@ static int futex_requeue(unsigned long u
        bh1 = hash_futex(&key1);
        bh2 = hash_futex(&key2);
 
-       nqueued = bh1->nqueued;
+       if (bh1 < bh2)
+               spin_lock(&bh1->lock);
+       spin_lock(&bh2->lock);
+       if (bh1 > bh2)
+               spin_lock(&bh1->lock);
+
        if (likely(valp != NULL)) {
                int curval;
 
-               /* In order to avoid doing get_user while
-                  holding bh1->lock and bh2->lock, nqueued
-                  (monotonically increasing field) must be first
-                  read, then *uaddr1 fetched from userland and
-                  after acquiring lock nqueued field compared with
-                  the stored value.  The smp_mb () below
-                  makes sure that bh1->nqueued is read from memory
-                  before *uaddr1.  */
-               smp_mb();
-
                ret = get_futex_value_locked(&curval, (int __user *)uaddr1);
 
                if (unlikely(ret)) {
+                       spin_unlock(&bh1->lock);
+                       if (bh1 != bh2)
+                               spin_unlock(&bh2->lock);
+
                        /* If we would have faulted, release mmap_sem, fault
                         * it in and start all over again.
                         */
@@ -385,21 +381,10 @@ static int futex_requeue(unsigned long u
                }
                if (curval != *valp) {
                        ret = -EAGAIN;
-                       goto out;
+                       goto out_unlock;
                }
        }
 
-       if (bh1 < bh2)
-               spin_lock(&bh1->lock);
-       spin_lock(&bh2->lock);
-       if (bh1 > bh2)
-               spin_lock(&bh1->lock);
-
-       if (unlikely(nqueued != bh1->nqueued && valp != NULL)) {
-               ret = -EAGAIN;
-               goto out_unlock;
-       }
-
        head1 = &bh1->chain;
        list_for_each_entry_safe(this, next, head1, list) {
                if (!match_futex (&this->key, &key1))
@@ -435,13 +420,9 @@ out:
        return ret;
 }
 
-/*
- * queue_me and unqueue_me must be called as a pair, each
- * exactly once.  They are called with the hashed spinlock held.
- */
-
 /* The key must be already stored in q->key. */
-static void queue_me(struct futex_q *q, int fd, struct file *filp)
+static inline struct futex_hash_bucket *
+queue_lock(struct futex_q *q, int fd, struct file *filp)
 {
        struct futex_hash_bucket *bh;
 
@@ -455,11 +436,35 @@ static void queue_me(struct futex_q *q, 
        q->lock_ptr = &bh->lock;
 
        spin_lock(&bh->lock);
-       bh->nqueued++;
+       return bh;
+}
+
+static inline void __queue_me(struct futex_q *q, struct futex_hash_bucket *bh)
+{
        list_add_tail(&q->list, &bh->chain);
        spin_unlock(&bh->lock);
 }
 
+static inline void
+queue_unlock(struct futex_q *q, struct futex_hash_bucket *bh)
+{
+       spin_unlock(&bh->lock);
+       drop_key_refs(&q->key);
+}
+
+/*
+ * queue_me and unqueue_me must be called as a pair, each
+ * exactly once.  They are called with the hashed spinlock held.
+ */
+
+/* The key must be already stored in q->key. */
+static void queue_me(struct futex_q *q, int fd, struct file *filp)
+{
+       struct futex_hash_bucket *bh;
+       bh = queue_lock(q, fd, filp);
+       __queue_me(q, bh);
+}
+
 /* Return 1 if we were still queued (ie. 0 means we were woken) */
 static int unqueue_me(struct futex_q *q)
 {
@@ -503,6 +508,7 @@ static int futex_wait(unsigned long uadd
        DECLARE_WAITQUEUE(wait, current);
        int ret, curval;
        struct futex_q q;
+       struct futex_hash_bucket *bh;
 
  retry:
        down_read(&current->mm->mmap_sem);
@@ -511,7 +517,7 @@ static int futex_wait(unsigned long uadd
        if (unlikely(ret != 0))
                goto out_release_sem;
 
-       queue_me(&q, -1, NULL);
+       bh = queue_lock(&q, -1, NULL);
 
        /*
        * Access the page AFTER the futex is 

Re: Futex queue_me/get_user ordering

2005-03-17 Thread Jakub Jelinek
On Thu, Mar 17, 2005 at 03:20:31PM +0000, Jamie Lokier wrote:
> If you change futex_wait to be "atomic", and then have userspace locks
> which _depend_ on that atomicity, it becomes impossible to wait on
> multiple of those locks, or make poll-driven state machines which can
> wait on those locks.

The futex man pages that have been around for years (certainly since mid 2002)
certainly don't document FUTEX_WAIT as token passing operation, but as atomic
operation:

Say http://www.icewalkers.com/Linux/ManPages/futex-2.html

FUTEX_WAIT
This operation atomically verifies that  the  futex
address  still contains the value given, and sleeps
awaiting FUTEX_WAKE on this futex address.  If  the
timeout argument is non-NULL, its contents describe
the maximum duration of the wait, which is infinite
otherwise.   For futex(4), this call is executed if
decrementing the count gave a negative value (indi-
cating  contention),  and  will sleep until another
process   releases  the  futex  and  executes   the
FUTEX_WAKE operation.

RETURN VALUE
FUTEX_WAIT
Returns 0 if the process was woken by a  FUTEX_WAKE
call. In case of timeout, ETIMEDOUT is returned. If
the futex was not equal to the expected value,  the
operation  returns  EWOULDBLOCK.  Signals (or other
spurious wakeups) cause FUTEX_WAIT to return EINTR.

so there very well might be programs other than glibc that
depend on this behaviour.  Given that in most cases the race
is not hit every day (after all, we have been living with it for
several years), they probably wouldn't know there is a problem
like that.
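
For concreteness, the kind of user code that relies on that documented
atomic check-and-sleep is the usual futex lock pattern -- a minimal
sketch, assuming GCC __sync builtins and the common 0 = unlocked,
1 = locked, 2 = locked-with-waiters encoding, not taken from any
particular program:

/* Minimal sketch of a futex-based lock that depends on FUTEX_WAIT
 * sleeping only if *f still equals the expected value. */
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static void lock(int *f)
{
        int c = __sync_val_compare_and_swap(f, 0, 1);
        if (c != 0) {
                if (c != 2)
                        c = __sync_lock_test_and_set(f, 2);     /* xchg */
                while (c != 0) {
                        /* Sleeps only if *f is still 2 when the kernel
                         * re-checks it -- the atomicity described above. */
                        syscall(SYS_futex, f, FUTEX_WAIT, 2, NULL, NULL, 0);
                        c = __sync_lock_test_and_set(f, 2);
                }
        }
}

static void unlock(int *f)
{
        if (__sync_fetch_and_sub(f, 1) != 1) {  /* there were waiters */
                *f = 0;
                syscall(SYS_futex, f, FUTEX_WAKE, 1, NULL, NULL, 0);
        }
}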

> You can do userspace threading and simulate most blocking system calls
> by making them non-blocking and using poll().

Sure, but then you need to write your own locking as well and
can just use the token passing property of futexes there.

> It's not a _huge_ loss, but considering it's only Glibc which is
> demanding this and futexes have another property, token-passing, which
> Glibc could be using instead - why not use it?

Because that requires requeue being done with the cv lock held, which
means an extra context switch.

> > @@ -265,7 +264,6 @@ static inline int get_futex_value_locked
> > inc_preempt_count();
> > ret = __copy_from_user_inatomic(dest, from, sizeof(int));
> > dec_preempt_count();
> > -   preempt_check_resched();
> >  
> > return ret ? -EFAULT : 0;
> >  }
> 
> inc_preempt_count() and dec_preempt_count() aren't needed, as
> preemption is disabled by the queue spinlocks.  So
> get_futex_value_locked isn't needed any more: with the spinlocks held,
> __get_user will do.

They aren't needed if CONFIG_PREEMPT.  But with !CONFIG_PREEMPT, they
are IMHO still needed, as spin_lock/spin_unlock call preempt_{disable,enable},
which is a nop if !CONFIG_PREEMPT.
__get_user can't be used though, it should be __get_user_inatomic
(or __copy_from_user_inatomic if the former doesn't exist).

> > [numerous instances of...]
> > +   preempt_check_resched();
> 
> Not required.  The spin unlocks will do this.

True, preempt_check_resched() is a nop if !CONFIG_PREEMPT and for
CONFIG_PREEMPT spin_unlock will handle it.  Will remove them from the
patch.

> > But with the recent changes to futex.c I think kernel can ensure
> > atomicity for free.
> 
> I agree it would probably not slow the kernel, but I would _strongly_
> prefer that Glibc were fixed to use the token-passing property, if
> Glibc is the driving intention behind this patch - instead of this
> becoming a semantic that application-level users of futex (like
> database and IPC libraries) come to depend on and which can't be
> decomposed into a multiple-waiting form.
> 
> (I admit that the kernel code does look nicer with
> get_futex_value_locked gone, though).
> 
> By the way, do you know of Scott Snyder's recent work on fixing Glibc
> in this way?  He bumped into one of Glibc's currently broken corner
> cases, fixed it (according to the algorithm I gave in November), and
> reported that it works fine with the fix.

I certainly haven't seen his patch.

Jakub


Re: Futex queue_me/get_user ordering

2005-03-17 Thread Jamie Lokier
Jakub Jelinek wrote:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0411.2/0953.html
> 
> Your argument in November was that you don't want to slow down the
> kernel and that userland must be able to cope with the
> non-atomicity of futex syscall.

Those were two of them.

But my other main concern is conceptual.

Right now, a futex_wait call is roughly equivalent to
add_wait_queue, which is quite versatile.

It means anything you can do with one futex, you can extend to
multiple futexes (e.g. waiting on more than one lock), and you can do
asynchronously (e.g. futex_wait can be implemented in userspace as
futex_fd[1] + poll[2], and therefore things like poll-driven state machines
where one of the state machines wants to wait on a lock are possible).

[1] Ulrich was mistaken in his paper to say futex_fd needs to check a word
to be useful; userspace is supposed to check the word after futex_fd
and before polling or waiting on it.  This is more useful because it
extends to multiple futexes.
[2] actually it can't right now because of a flaw in futex_fd's poll
function, but that could be fixed.  The _principle_ is sound.
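
Concretely, the emulation in [1] and [2] is along these lines (a sketch
of mine, and it assumes the poll flaw in [2] has been fixed):

/* Sketch: emulate a blocking FUTEX_WAIT with FUTEX_FD + poll(). */
#include <linux/futex.h>
#include <sys/syscall.h>
#include <poll.h>
#include <unistd.h>

static int futex_wait_via_fd(int *uaddr, int val)
{
        int ret = 0;
        int fd = syscall(SYS_futex, uaddr, FUTEX_FD, 0, NULL, NULL, 0);
        if (fd < 0)
                return -1;

        /* Check the word after FUTEX_FD and before polling: if it already
         * changed, return at once; otherwise any later wake shows up as
         * POLLIN on the fd. */
        if (*(volatile int *)uaddr == val) {
                struct pollfd pfd = { .fd = fd, .events = POLLIN };
                ret = poll(&pfd, 1, -1) > 0 ? 0 : -1;
        }
        close(fd);
        return ret;
}

The same pattern with one fd per word is what makes waiting on several
futexes at once possible.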

If you change futex_wait to be "atomic", and then have userspace locks
which _depend_ on that atomicity, it becomes impossible to wait on
multiple of those locks, or make poll-driven state machines which can
wait on those locks.

There are applications and libraries which use futex, not just for
threading but things like database locks in files.

You can do userspace threading and simulate most blocking system calls
by making them non-blocking and using poll().

(I'm not saying anything against NPTL by this, by the way - NPTL is a
very good general purpose library - but there are occasions when an
application wants to do its own equivalent of simulated blocking
system calls for one reason or another.  My favourite being research
into inter-thread JIT-optimisation in an environment like valgrind).

Right now, in principle, futex_wait is among the system calls which
can be simulated by making it non-blocking (= futex_fd) and using poll()[2].
Which means programs using futex themselves can be subject to interesting
thread optimisations by code which knows nothing about the program
(similar to valgrind..)

If you change futex_wait to be "atomic", then it would be _impossible_
to take some random 3rd party library which is using that
futex_wait, and convert its blocking system calls to use poll-driven
state machines instead.

I think taking that away would be a great conceptual loss.

It's not a _huge_ loss, but considering it's only Glibc which is
demanding this and futexes have another property, token-passing, which
Glibc could be using instead - why not use it?

That said, let's look at your patch.

> It would simplify requeue implementation (getting rid of the nqueued
> field),

The change to FUTEX_REQUEUE2 is an improvement :)
nqueued is an abomination, like the rest of FUTEX_REQUEUE2 :)

> @@ -265,7 +264,6 @@ static inline int get_futex_value_locked
>   inc_preempt_count();
>   ret = __copy_from_user_inatomic(dest, from, sizeof(int));
>   dec_preempt_count();
> - preempt_check_resched();
>  
>   return ret ? -EFAULT : 0;
>  }

inc_preempt_count() and dec_preempt_count() aren't needed, as
preemption is disabled by the queue spinlocks.  So
get_futex_value_locked isn't needed any more: with the spinlocks held,
__get_user will do.

> [numerous instances of...]
> + preempt_check_resched();

Not required.  The spin unlocks will do this.

> But with the recent changes to futex.c I think kernel can ensure
> atomicity for free.

I agree it would probably not slow the kernel, but I would _strongly_
prefer that Glibc were fixed to use the token-passing property, if
Glibc is the driving intention behind this patch - instead of this
becoming a semantic that application-level users of futex (like
database and IPC libraries) come to depend on and which can't be
decomposed into a multiple-waiting form.

(I admit that the kernel code does look nicer with
get_futex_value_locked gone, though).

By the way, do you know of Scott Snyder's recent work on fixing Glibc
in this way?  He bumped into one of Glibc's currently broken corner
cases, fixed it (according to the algorithm I gave in November), and
reported that it works fine with the fix.

-- Jamie


Re: Futex queue_me/get_user ordering

2005-03-17 Thread Jakub Jelinek
On Thu, Nov 18, 2004 at 02:47:26PM -0500, Jakub Jelinek wrote:
> The scenario described in futex_wait-fix.patch IMHO can happen even
> if all calls to pthread_cond_signal are done with mutex held around it, i.e.
> A B   X   Y
> pthread_mutex_lock (&mtx);
> pthread_cond_wait (&cv, &mtx);
>   - mtx release *)
>   total++ [1/0/0] (0) {}
>   pthread_mutex_lock (&mtx);
>   pthread_cond_signal (&cv);
> - wake++ [1/1/0] (1) {}
> FUTEX_WAKE, 1 (returns, nothing is queued)
>   pthread_mutex_unlock (&mtx);
>   pthread_mutex_lock (&mtx);
>   pthread_cond_wait (&cv, &mtx);
> - mtx release *)
> total++ [2/1/0] (1) {}
>   FUTEX_WAIT, 0
>   queue_me [2/1/0] (1) {A}
>   0 != 1
>   FUTEX_WAIT, 1
>   queue_me [2/1/0] (1) {A,B}
>   1 == 1
>   pthread_mutex_lock (&mtx);
>   pthread_cond_signal (&cv);
> - wake++ [2/2/0] (2) {A,B}
> FUTEX_WAKE, 1 (unqueues 
> incorrectly A)
> [2/2/0] (2) {B}
>   pthread_mutex_unlock (&mtx);
>   try to dequeue but already dequeued
>   would normally return EWOULDBLOCK here
>   but as unqueue_me failed, returns 0
>   woken++ [2/2/1] (2) {B}
>   schedule_timeout (forever)
>   - mtx reacquire
>   pthread_cond_wait returns
> pthread_mutex_unlock (&mtx);
> 
>   ---
>   the code would like to say pthread_mutex_unlock (&mtx);
>   and pthread_exit here, but never reaches there.
...

http://www.ussg.iu.edu/hypermail/linux/kernel/0411.2/0953.html

Your argument in November was that you don't want to slow down the
kernel and that userland must be able to cope with the
non-atomicity of futex syscall.

But with the recent changes to futex.c I think kernel can ensure
atomicity for free.

With get_futex_value_locked doing the user access in_atomic () and
repeating if that failed, I think it would just be a matter of
something like the patch below (totally untested though).
It would simplify requeue implementation (getting rid of the nqueued
field), as well as never enqueuing a futex in futex_wait until
the *uaddr == val uaccess check has shown it should be enqueued.
And I don't think the kernel will be any slower because of that:
in the common case where get_futex_value_locked does not cause
an mm fault (userland typically accessed that memory a few cycles before
the syscall), the futex_wait change just amounts to doing the first half
of queue_me before the user access and the second half after it.

--- linux-2.6.11/kernel/futex.c.jj  2005-03-17 04:42:29.0 -0500
+++ linux-2.6.11/kernel/futex.c 2005-03-17 05:13:45.0 -0500
@@ -97,7 +97,6 @@ struct futex_q {
  */
 struct futex_hash_bucket {
        spinlock_t              lock;
-       unsigned int            nqueued;
        struct list_head        chain;
 };
 
@@ -265,7 +264,6 @@ static inline int get_futex_value_locked
        inc_preempt_count();
        ret = __copy_from_user_inatomic(dest, from, sizeof(int));
        dec_preempt_count();
-       preempt_check_resched();
 
        return ret ? -EFAULT : 0;
 }
@@ -339,7 +337,6 @@ static int futex_requeue(unsigned long u
        struct list_head *head1;
        struct futex_q *this, *next;
        int ret, drop_count = 0;
-       unsigned int nqueued;
 
  retry:
        down_read(&current->mm->mmap_sem);
@@ -354,23 +351,24 @@ static int futex_requeue(unsigned long u
        bh1 = hash_futex(&key1);
        bh2 = hash_futex(&key2);
 
-       nqueued = bh1->nqueued;
+       if (bh1 < bh2)
+               spin_lock(&bh1->lock);
+       spin_lock(&bh2->lock);
+       if (bh1 > bh2)
+               spin_lock(&bh1->lock);
+
        if (likely(valp != NULL)) {
                int curval;
 
-               /* In order to avoid doing get_user while
-                  holding bh1->lock and bh2->lock, nqueued
-                  (monotonically increasing field) must be first
-                  read, then *uaddr1 fetched from userland and
-                  after acquiring lock nqueued field compared with
-                  the stored value.  The smp_mb () below
-                  makes sure that bh1->nqueued is read from memory
-                  before *uaddr1.  */
-               smp_mb();
-
                ret = get_futex_value_locked(&curval, (int __user *)uaddr1);
 
                if (unlikely(ret)) {
+                       spin_unlock(&bh1->lock);
+                       if (bh1 != bh2)
+                               spin_unlock(&bh2->lock);
+
+                       preempt_check_resched();
+
                        /* If we would have faulted, release mmap_sem, fault
                        * it in and 
