Re: [PATCH -v2] fix for futex_wait signal stack corruption

2007-12-05 Thread Ingo Molnar

* Thomas Gleixner <[EMAIL PROTECTED]> wrote:

> Just a minor nit. Can we please use "futex" instead of "fu" ? I'm just 
> envisioning the next union member named "ba".
> 
> Acked-by: Thomas Gleixner <[EMAIL PROTECTED]>
> 
> Please apply with the s/fu/futex/ change. This needs to go into stable 
> .22/.23 as well.

ok, i've tied it all up, collected the Acks and will push Steve's fix to 
Linus via the scheduler git tree - is that fine with everyone? I think 
there's enough time to get this tested - Linus's latest tree shows up in 
Fedora rawhide in 1-2 days and tons of apps use futexes.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -v2] fix for futex_wait signal stack corruption

2007-12-04 Thread David Holmes - Sun Microsystems

Thanks for clarifying that Linus.

Regards,
David Holmes

Linus Torvalds said the following on  5/12/07 04:06 PM:


On Wed, 5 Dec 2007, David Holmes - Sun Microsystems wrote:

While this was observed with process control signals, my concern was that
other signals might cause pthread_cond_timedwait to return immediately in the
same way. The test program allows for SIGUSR1 and SIGRTMIN testing as well,
but these other signals did not cause the immediate return. But it would seem
from Steven's analysis that this is just a fortuitous result. If I understand
things correctly, any interruption of pthread_cond_timedwait by a signal,
could result in waiting until an arbitrary time - depending on how the stack
value was corrupted. Is that correct?


No, very few things can actually cause the restart_block path to be taken. 
An actual signal execution would turn that into an EINTR, the only case 
that should ever trigger this is a signal that causes some kernel action 
(ie the system call *is* interrupted), but does not actually result in any 
user-visible state changes.


The classic case is ^Z + bg, but iirc you can trigger it with ptrace too. 
And I think two threads racing to pick up the same signal can cause it 
too, for that matter (ie one thread takes the signal, the other one got 
interrupted but there's nothing there, so it just causes a system call 
restart).


There's basically two different system call restart mechanisms in the 
kernel:


 - returning -ERESTARTNOHAND will cause the system call to be restarted 
   with the *original* arguments if no signal handler was actually 
   invoked. This has been around for a long time, and is used by a lot of 
   system calls. It's fine for things that are idempotent, ie the argument 
   meaning doesn't change over time (things like a "read()" system call, 
   for example)


 - the "restart_block" model that returns -ERESTARTBLOCK, which will cause 
   the system call to be restarted with the arguments specified in the 
   system call restart block. This is for system calls that are *not* 
   idempotent, ie the argument might be a relative timeout or something 
   like that, where we need to actually behave *differently* when 
   restarting.


The latter case is "new" (it's been around for a while, but relative to 
the ERESTARTNOHAND one), and it relies on the system call itself setting 
up its restart point and the argument save area. And each such system call 
can obviously screw it up by saving/restoring the arguments with the 
incorrect semantics.


So this bug was really (a) specific to that particular futex restart 
mechanism, and (b) only triggers for the (rather unusual) case where the 
system call gets interrupted by a signal, but no signal handler actually 
happens. In practice, ^Z is the most common case by far (other signals are 
either ignored and don't even cause an interrupt event in the first place, 
or they are "real" signals, and cause a signal handler to be invoked).


Linus

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -v2] fix for futex_wait signal stack corruption

2007-12-04 Thread Linus Torvalds


On Wed, 5 Dec 2007, David Holmes - Sun Microsystems wrote:
> 
> While this was observed with process control signals, my concern was that
> other signals might cause pthread_cond_timedwait to return immediately in the
> same way. The test program allows for SIGUSR1 and SIGRTMIN testing as well,
> but these other signals did not cause the immediate return. But it would seem
> from Steven's analysis that this is just a fortuitous result. If I understand
> things correctly, any interruption of pthread_cond_timedwait by a signal,
> could result in waiting until an arbitrary time - depending on how the stack
> value was corrupted. Is that correct?

No, very few things can actually cause the restart_block path to be taken. 
An actual signal execution would turn that into an EINTR, the only case 
that should ever trigger this is a signal that causes some kernel action 
(ie the system call *is* interrupted), but does not actually result in any 
user-visible state changes.

The classic case is ^Z + bg, but iirc you can trigger it with ptrace too. 
And I think two threads racing to pick up the same signal can cause it 
too, for that matter (ie one thread takes the signal, the other one got 
interrupted but there's nothing there, so it just causes a system call 
restart).

There's basically two different system call restart mechanisms in the 
kernel:

 - returning -ERESTARTNOHAND will cause the system call to be restarted 
   with the *original* arguments if no signal handler was actually 
   invoked. This has been around for a long time, and is used by a lot of 
   system calls. It's fine for things that are idempotent, ie the argument 
   meaning doesn't change over time (things like a "read()" system call, 
   for example)

 - the "restart_block" model that returns -ERESTARTBLOCK, which will cause 
   the system call to be restarted with the arguments specified in the 
   system call restart block. This is for system calls that are *not* 
   idempotent, ie the argument might be a relative timeout or something 
   like that, where we need to actually behave *differently* when 
   restarting.

The latter case is "new" (it's been around for a while, but relative to 
the ERESTARTNOHAND one), and it relies on the system call itself setting 
up its restart point and the argument save area. And each such system call 
can obviously screw it up by saving/restoring the arguments with the 
incorrect semantics.

So this bug was really (a) specific to that particular futex restart 
mechanism, and (b) only triggers for the (rather unusual) case where the 
system call gets interrupted by a signal, but no signal handler actually 
happens. In practice, ^Z is the most common case by far (other signals are 
either ignored and don't even cause an interrupt event in the first place, 
or they are "real" signals, and cause a signal handler to be invoked).

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -v2] fix for futex_wait signal stack corruption

2007-12-04 Thread Thomas Gleixner
On Tue, 4 Dec 2007, Linus Torvalds wrote:
> 
> Patch looks fine to me.
> 
> On Tue, 4 Dec 2007, Steven Rostedt wrote:
> > 
> > Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64
> > in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
> > Not sure what that is there for.
> 
> Hmm. I'd not expect user-mode headers to ever include 
> , and if they do, they'd already get get totally 
> invalid namespace pollution ("struct restart_block" at a minimum) along 
> with stuff that simply isn't sensible in user-space at all, so I think 
> this part is fine.
> 
> And I guess somebody will scream if it bites them ;)
> 
> Anyway, my gut feel is that this is potentially a real problem, and we 
> should fix it asap (ie it should go into 2.6.24 even at this late stage in 
> the game), but it would be nice to know if the problem actually hit any 
> actual real program, and not just a test-setup.
> 
> So here's a question for David Holmes:  What caused you to actually notice 
> this behaviour? Can this actually be seen in real life usage?
> 
> Anyway, at a minimum, here's an
> 
>   Acked-by: Linus Torvalds <[EMAIL PROTECTED]>
> 
> and I suspect I should just apply it directly. Any comments from anybody 
> else?

Doh, yes. I completely missed that stack dependency of the pointer
when I looked at the patch back then. The solution looks solid and
probably we should get rid of the unnamed union member and fixup the
other places which use restart_block in a similar way.

Just a minor nit. Can we please use "futex" instead of "fu" ? I'm just
envisioning the next union member named "ba".

Acked-by: Thomas Gleixner <[EMAIL PROTECTED]>

Please apply with the s/fu/futex/ change. This needs to go into stable
.22/.23 as well.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -v2] fix for futex_wait signal stack corruption

2007-12-04 Thread David Holmes - Sun Microsystems

Linus Torvalds said the following on  5/12/07 01:41 PM:
So here's a question for David Holmes:  What caused you to actually notice 
this behaviour? Can this actually be seen in real life usage?


We observed an application "hang" that turned out to be caused by a 
clock mismatch between that used with the pthread_cond_t and that used 
to convert a relative wait time to an absolute one. When the program ran 
in the foreground and hung I used ctrl-Z to suspend it then "bg" to 
background it. As soon as I did that the application became unstuck.


While this was observed with process control signals, my concern was 
that other signals might cause pthread_cond_timedwait to return 
immediately in the same way. The test program allows for SIGUSR1 and 
SIGRTMIN testing as well, but these other signals did not cause the 
immediate return. But it would seem from Steven's analysis that this is 
just a fortuitous result. If I understand things correctly, any 
interruption of pthread_cond_timedwait by a signal, could result in 
waiting until an arbitrary time - depending on how the stack value was 
corrupted. Is that correct?


Thanks,
David Holmes
Senior Java Technologist
Java SE VM Real-time and Embedded Group
---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -v2] fix for futex_wait signal stack corruption

2007-12-04 Thread Steven Rostedt

On Tue, 4 Dec 2007, Randy Dunlap wrote:
>
> Steve/David,
>
> where can I find the test case, please?

David attached it to the RH Bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=408321

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -v2] fix for futex_wait signal stack corruption

2007-12-04 Thread Randy Dunlap
On Tue, 4 Dec 2007 19:41:32 -0800 (PST) Linus Torvalds wrote:

> 
> Patch looks fine to me.
> 
> On Tue, 4 Dec 2007, Steven Rostedt wrote:
> > 
> > Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64
> > in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
> > Not sure what that is there for.
> 
> Hmm. I'd not expect user-mode headers to ever include 
> , and if they do, they'd already get get totally 
> invalid namespace pollution ("struct restart_block" at a minimum) along 
> with stuff that simply isn't sensible in user-space at all, so I think 
> this part is fine.
> 
> And I guess somebody will scream if it bites them ;)
> 
> Anyway, my gut feel is that this is potentially a real problem, and we 
> should fix it asap (ie it should go into 2.6.24 even at this late stage in 
> the game), but it would be nice to know if the problem actually hit any 
> actual real program, and not just a test-setup.

Steve/David,

where can I find the test case, please?

> So here's a question for David Holmes:  What caused you to actually notice 
> this behaviour? Can this actually be seen in real life usage?
> 
> Anyway, at a minimum, here's an
> 
>   Acked-by: Linus Torvalds <[EMAIL PROTECTED]>
> 
> and I suspect I should just apply it directly. Any comments from anybody 
> else?

---
~Randy
Features and documentation: http://lwn.net/Articles/260136/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -v2] fix for futex_wait signal stack corruption

2007-12-04 Thread Linus Torvalds

Patch looks fine to me.

On Tue, 4 Dec 2007, Steven Rostedt wrote:
> 
> Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64
> in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
> Not sure what that is there for.

Hmm. I'd not expect user-mode headers to ever include 
, and if they do, they'd already get get totally 
invalid namespace pollution ("struct restart_block" at a minimum) along 
with stuff that simply isn't sensible in user-space at all, so I think 
this part is fine.

And I guess somebody will scream if it bites them ;)

Anyway, my gut feel is that this is potentially a real problem, and we 
should fix it asap (ie it should go into 2.6.24 even at this late stage in 
the game), but it would be nice to know if the problem actually hit any 
actual real program, and not just a test-setup.

So here's a question for David Holmes:  What caused you to actually notice 
this behaviour? Can this actually be seen in real life usage?

Anyway, at a minimum, here's an

Acked-by: Linus Torvalds <[EMAIL PROTECTED]>

and I suspect I should just apply it directly. Any comments from anybody 
else?

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -v2] fix for futex_wait signal stack corruption

2007-12-04 Thread Steven Rostedt

David Holmes found a bug in the RT patch with respect to
pthread_cond_timedwait. After trying his test program on the latest git
from mainline, I found the bug was there too.  The bug he was seeing
that his test program showed, was that if one were to do a "Ctrl-Z" on a
process that was in the pthread_cond_timedwait, and then did a "bg" on
that process, it would return with a "-ETIMEDOUT" but early. That is,
the timer would go off early.

Looking into this, I found the source of the problem. And it is a rather
nasty bug at that.

Here's the relevant code from kernel/futex.c: (not in order in the file)

[...]
smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
  struct timespec __user *utime, u32 __user *uaddr2,
  u32 val3)
{
struct timespec ts;
ktime_t t, *tp = NULL;
u32 val2 = 0;
int cmd = op & FUTEX_CMD_MASK;

if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
if (!timespec_valid(&ts))
return -EINVAL;

t = timespec_to_ktime(ts);
if (cmd == FUTEX_WAIT)
t = ktime_add(ktime_get(), t);
tp = &t;
}
[...]
return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}

[...]

long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
u32 __user *uaddr2, u32 val2, u32 val3)
{
int ret;
int cmd = op & FUTEX_CMD_MASK;
struct rw_semaphore *fshared = NULL;

if (!(op & FUTEX_PRIVATE_FLAG))
fshared = ¤t->mm->mmap_sem;

switch (cmd) {
case FUTEX_WAIT:
ret = futex_wait(uaddr, fshared, val, timeout);

[...]

static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
  u32 val, ktime_t *abs_time)
{
[...]
   struct restart_block *restart;
restart = ¤t_thread_info()->restart_block;
restart->fn = futex_wait_restart;
restart->arg0 = (unsigned long)uaddr;
restart->arg1 = (unsigned long)val;
restart->arg2 = (unsigned long)abs_time;
restart->arg3 = 0;
if (fshared)
restart->arg3 |= ARG3_SHARED;
return -ERESTART_RESTARTBLOCK;
[...]

static long futex_wait_restart(struct restart_block *restart)
{
u32 __user *uaddr = (u32 __user *)restart->arg0;
u32 val = (u32)restart->arg1;
ktime_t *abs_time = (ktime_t *)restart->arg2;
struct rw_semaphore *fshared = NULL;

restart->fn = do_no_restart_syscall;
if (restart->arg3 & ARG3_SHARED)
fshared = ¤t->mm->mmap_sem;
return (long)futex_wait(uaddr, fshared, val, abs_time);
}


So when the futex_wait is interrupt by a signal we break out of the
hrtimer code and set up or return from signal. This code does not return
back to userspace, so we set up a RESTARTBLOCK.  The bug here is that we
save the "abs_time" which is a pointer to the stack variable "ktime_t t"
from sys_futex.

This returns and unwinds the stack before we get to call our signal. On
return from the signal we go to futex_wait_restart, where we update all
the parameters for futex_wait and call it. But here we have a problem
where abs_time is no longer valid.

I verified this with print statements, and sure enough, what abs_time
was set to ends up being garbage when we get to futex_wait_restart.

The solution I did to solve this (with input from Linus Torvalds)
was to add unions to the restart_block to allow system calls to
use the restart with specific parameters.  This way the futex code now
saves the time in a 64bit value in the restart block instead of storing
it on the stack.

Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64
in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
Not sure what that is there for.  If this turns out to be a problem, I've
tested this with using "unsigned int" for u32 and "unsigned long long" for
u64 and it worked just the same. I'm using u32 and u64 just to be
consistent with what the futex code uses.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

 include/linux/thread_info.h |   15 ++-
 kernel/futex.c  |   25 +
 2 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 1c4eb41..d97c874 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -7,12 +7,25 @@
 #ifndef _LINUX_THREAD_INFO_H
 #define _LINUX_THREAD_INFO_H

+#include 
+
 /*
  * System call restart block.
  */
 struct restart_block {
long (*fn)(struct restart_block *);
-   unsigned long arg0, arg1, arg2, arg3;
+   union {
+   struct {
+