SMP races in proc with thread_struct
Perhaps this is old news...but... I can easily create a race when reading /proc//stat (fs/proc/{base.c,array.c}) where a rapidly reading application, such as "top", starts reading stats for a thread which goes away during the read. This is easily reproduced with a program that rapidly forks and exits while top is running. On inspection, I don't see how the code can expect the thread_struct to stay around since it is not holding any lock for the duration of its use. The code could hold the thread_struct's lock (after verifying it still exists while holding tasklist_lock I would imagine), but for performance I would think a better solution would be to copy the struct since stale data is probably ok in this case. Dereferencing a non-existent thread_struct is clearly not ok. Would anyone familiar with this code care to comment? -- -todd - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SMP races in proc with thread_struct
Alexander Viro wrote: > > On Tue, 1 May 2001, Todd Inglett wrote: > > > Perhaps this is old news...but... > > > > I can easily create a race when reading /proc//stat > > (fs/proc/{base.c,array.c}) where a rapidly reading application, such as > > "top", starts reading stats for a thread which goes away during the > > read. This is easily reproduced with a program that rapidly forks and > > exits while top is running. > > > > On inspection, I don't see how the code can expect the thread_struct to > > stay around since it is not holding any lock for the duration of its > > use. The code could hold the thread_struct's lock (after verifying it > > still exists while holding tasklist_lock I would imagine), but for > > performance I would think a better solution would be to copy the struct > > since stale data is probably ok in this case. > > > > Dereferencing a non-existent thread_struct is clearly not ok. > > > > Would anyone familiar with this code care to comment? > > Code bumps the reference count of couple of pages that hold task_struct > when it opens the file. Yes that's right. [Note that I erroneously wrote thread_struct when I meant task_struct]. On closer inspection I see it is the *parent* task_struct that is the problem here. The task has indeed exited and the memory for the task_struct is correctly held. However, the /proc code will grab tasklist_lock and dereference the parent task_struct to get the parent pid. Grabbing tasklist_lock is good, of course, when the task is live because the parent could be switched at any time. But in this case the child is already done. And so is the parent. So the child's parent task_struct is gone, but the stale held task_struct still points to where it once was. Does this sound like a possible scenerio? -- -todd - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SMP races in proc with thread_struct
Ok, I've got this isolated. Here's the sequence of events: 1. Some process T (probably "top") opens /proc/N/stat. 2. While holding tasklist_lock the proc code does a get_task_struct() to add a ref count to the page. 3. Process N exits. 4. The parent of process N exits. 5. Process T reads from the open file. This calls proc_pid_stat() which dereferences N's task_struct. This is ok as Alexander points out because a reference is held. 6. Using N's task_struct process T attempt to dereference the *parent* task struct. It assumes this is ok because: A) it is holding tasklist_lock so N cannot be reparented in a race. B) every process *always* has a valid parent. But this is where hell breaks loose. Every process has a valid parent -- unless it is dead and nobody cares. Process N has already exited and released from the tasklist while its parent was still alive. There was no reason to reparent it. It just got released. So N's task_struct has a dangling ptr to its parent. Nobody is holding the parent task_struct, either. When the parent died memory for its task_struct was released. This is ungood. My opinion here is that this is proc's problem. When we free a task_struct it could be "cleaned up" of dangling ptrs, but this is a hack to cover a bug in proc. This is not isolated to the parent task_struct, either. The task_struct mm is also dereferenced. It is pretty easy to validate a parent task_struct ptr (just hold tasklist_lock and run the list to check if it is still valid -- might not be the *right* task, but it will still be valid). However, how do you validate the mm is ok? It would be nice if there were an easy way to detect when the process gets into this state. Certainly it is in this state if the page reference count is 1. But multiple processes *could* be reading the same proc file holding additional artificial ref counts. Hmm...perhaps if I scan the tasklist for my own task? If I am not in the list I either return an error or faked info. I'll try this -- -todd BTW, by adding code to validate the parent my test has now run for 18 hours on a 4-way without failure. It otherwise would fail within the first 5 minutes. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SMP races in proc with thread_struct
Andreas Ferber wrote: > > On Fri, May 04, 2001 at 10:46:43PM +1000, Keith Owens wrote: > > > For a read only case, the only important > > thing is not to die, one occurrence of bad data is tolerable. > > Strong NACK. The pages where the bad data comes from may in some cases > already be reclaimed for other data, probably something security > relevant, which should never ever be given even read access by an > unauthorized user. Even if this event may be a very rare case, one > single occurrence of this is one to much. Agreed. Worse, it is not readonly. The /proc code task_lock's the task struct, thus writing to it. I'll post a patch shortly once I've tested it. Worse case only if the task is exiting I sweep the tasklist looking for the parent to see if the parent is still valid. I am not verifying if it is the actual parent (it might be a new task allocated at the same spot). I could just report 0 (or 1) for the parent for any process that is exiting, but then you won't be able to see the ppid for zombies. Or is there another state I can look for? What I really need is PF_EXITED :). I am a little concerned also about mm, file, tty and sig fields. These appear to be NULLed in do_exit(), but I haven't tracked down tty and sig yet. -- -todd - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/