SMP races in proc with thread_struct

2001-05-01 Thread Todd Inglett

Perhaps this is old news...but...

I can easily create a race when reading /proc//stat
(fs/proc/{base.c,array.c}) where a rapidly reading application, such as
"top", starts reading stats for a thread which goes away during the
read.  This is easily reproduced with a program that rapidly forks and
exits while top is running.

On inspection, I don't see how the code can expect the thread_struct to
stay around since it is not holding any lock for the duration of its
use.  The code could hold the thread_struct's lock (after verifying it
still exists while holding tasklist_lock I would imagine), but for
performance I would think a better solution would be to copy the struct
since stale data is probably ok in this case.

Dereferencing a non-existent thread_struct is clearly not ok.

Would anyone familiar with this code care to comment?
-- 
-todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: SMP races in proc with thread_struct

2001-05-03 Thread Todd Inglett

Alexander Viro wrote:
> 
> On Tue, 1 May 2001, Todd Inglett wrote:
> 
> > Perhaps this is old news...but...
> >
> > I can easily create a race when reading /proc//stat
> > (fs/proc/{base.c,array.c}) where a rapidly reading application, such as
> > "top", starts reading stats for a thread which goes away during the
> > read.  This is easily reproduced with a program that rapidly forks and
> > exits while top is running.
> >
> > On inspection, I don't see how the code can expect the thread_struct to
> > stay around since it is not holding any lock for the duration of its
> > use.  The code could hold the thread_struct's lock (after verifying it
> > still exists while holding tasklist_lock I would imagine), but for
> > performance I would think a better solution would be to copy the struct
> > since stale data is probably ok in this case.
> >
> > Dereferencing a non-existent thread_struct is clearly not ok.
> >
> > Would anyone familiar with this code care to comment?
> 
> Code bumps the reference count of couple of pages that hold task_struct
> when it opens the file.

Yes that's right.  [Note that I erroneously wrote thread_struct when I
meant task_struct].

On closer inspection I see it is the *parent* task_struct that is the
problem here.  The task has indeed exited and the memory for the
task_struct is correctly held.  However, the /proc code will grab
tasklist_lock and dereference the parent task_struct to get the parent
pid.  Grabbing tasklist_lock is good, of course, when the task is live
because the parent could be switched at any time.  But in this case the
child is already done.  And so is the parent.  So the child's parent
task_struct is gone, but the stale held task_struct still points to
where it once was.

Does this sound like a possible scenerio?

-- 
-todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: SMP races in proc with thread_struct

2001-05-04 Thread Todd Inglett

Ok, I've got this isolated.  Here's the sequence of events:

1.  Some process T (probably "top") opens /proc/N/stat.
2.  While holding tasklist_lock the proc code does a get_task_struct()
to add a ref count to the page.
3.  Process N exits.
4.  The parent of process N exits.
5.  Process T reads from the open file.  This calls proc_pid_stat()
which dereferences N's task_struct.  This is ok as Alexander points out
because a reference is held.
6.  Using N's task_struct process T attempt to dereference the *parent*
task struct.  It assumes this is ok because:
A) it is holding tasklist_lock so N cannot be reparented in a race.
B) every process *always* has a valid parent.

But this is where hell breaks loose.  Every process has a valid parent
-- unless it is dead and nobody cares.  Process N has already exited and
released from the tasklist while its parent was still alive.  There was
no reason to reparent it.  It just got released.  So N's task_struct has
a dangling ptr to its parent.  Nobody is holding the parent task_struct,
either.  When the parent died memory for its task_struct was released. 
This is ungood.


My opinion here is that this is proc's problem.  When we free a
task_struct it could be "cleaned up" of dangling ptrs, but this is a
hack to cover a bug in proc.

This is not isolated to the parent task_struct, either.  The task_struct
mm is also dereferenced.  It is pretty easy to validate a parent
task_struct ptr (just hold tasklist_lock and run the list to check if it
is still valid -- might not be the *right* task, but it will still be
valid).  However, how do you validate the mm is ok?

It would be nice if there were an easy way to detect when the process
gets into this state.  Certainly it is in this state if the page
reference count is 1.  But multiple processes *could* be reading the
same proc file holding additional artificial ref counts.

Hmm...perhaps if I scan the tasklist for my own task?  If I am not in
the list I either return an error or faked info.  I'll try this

-- 
-todd

BTW, by adding code to validate the parent my test has now run for 18
hours on a 4-way without failure.  It otherwise would fail within the
first 5 minutes.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: SMP races in proc with thread_struct

2001-05-05 Thread Todd Inglett

Andreas Ferber wrote:
> 
> On Fri, May 04, 2001 at 10:46:43PM +1000, Keith Owens wrote:
> 
> > For a read only case, the only important
> > thing is not to die, one occurrence of bad data is tolerable.
> 
> Strong NACK. The pages where the bad data comes from may in some cases
> already be reclaimed for other data, probably something security
> relevant, which should never ever be given even read access by an
> unauthorized user. Even if this event may be a very rare case, one
> single occurrence of this is one to much.

Agreed.  Worse, it is not readonly.  The /proc code task_lock's the task
struct, thus writing to it.

I'll post a patch shortly once I've tested it.  Worse case only if the
task is exiting I sweep the tasklist looking for the parent to see if
the parent is still valid.  I am not verifying if it is the actual
parent (it might be a new task allocated at the same spot).  I could
just report 0 (or 1) for the parent for any process that is exiting, but
then you won't be able to see the ppid for zombies.  Or is there another
state I can look for?  What I really need is PF_EXITED :).

I am a little concerned also about mm, file, tty and sig fields. These
appear to be NULLed in do_exit(), but I haven't tracked down tty and sig
yet.
-- 
-todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/