Re: PATCH [PPC64]: dead processes never reaped

2005-04-19 Thread Benjamin Herrenschmidt
On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote:
> 
> Hi,
> 
> The patch below appears to fix a problem where a number of dead processes
> linger on the system.  On a highly loaded system, dozens of processes 
> were found stuck in do_exit(), calling thier very last schedule(), and
> then being lost forever.  

Ok, we spent some time with Paul decrypting what _switch_to is supposed
to do. Our understanding at this point is that the current code is
correct on both ppc32 and ppc64, that is:

The "prev" passed in is always "current" and we don't see how it can be
anything else. We use a local variable instead of current in the common
code because accessing current can be slow on some architectures. I
don't see any codepath where prev != current before switch_to.

If we didn't do some black magic that I explain below, _switch_to would
switch the entire context, including stack, and thus including the value
of "prev". Which means that we would always come back with prev beeing
current, which is useless for reaping the old task. What we want is that
this "prev" that was passed to _switch_to() is returned so that we can
rip that previous task despite the change of context, that is basically
prev has to be an invariant vs. the change of context in switch_to.

On ppc & ppc64, we implement that by passing that prev (or it's thread
counterpart) to the assembly context switch code in r3. This code will
preserve it and return it as-is (or re-transformed from thread to task).

So your problem must be somewhere else. I've looked at the need_resched
code path and we always reload prev = current from a non-preemptible
region, so it can't be wrong.

This was verified on 2.6.12-rc2, there might be something else wrong in
an older kernel.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH [PPC64]: dead processes never reaped

2005-04-19 Thread Linas Vepstas
On Tue, Apr 19, 2005 at 11:01:25AM +1000, Benjamin Herrenschmidt was heard to 
remark:
> On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote:
> > 
> > Hi,
> > 
> > The patch below appears to fix a problem where a number of dead processes
> > linger on the system.  On a highly loaded system, dozens of processes 
> > were found stuck in do_exit(), calling thier very last schedule(), and
> > then being lost forever.  
> > 
> > Processes that are PF_DEAD are cleaned up *after* the context switch, 
> > in a routine called finish_task_switch(task_t *prev). The "prev" gets 
> > the  value returned by _switch() in entry.S, but this value comes from 
> >   
> > __switch_to (struct task_struct *prev, 
> > struct task_struct *new) 
> > { 
> >old_thread = >thread; ///XXX shouldn't this be prev, not 
> > current? 
> >last = _switch(old_thread, new_thread); 
> >return last; 
> > } 
> >  
> > The way I see it, "prev" and "current" are almost always going to be  
> > pointing at the same thing; however, if a "need resched" happens,  
> > or there's a pre-emept or some-such, then prev and current won't be  
> > the same; in which case, finish_task_switch() will end up cleaning  
> > up the old current, instead of prev.  This will result in dead processes 
> > hanging around, which will never be scheduled again, and will never  
> > get a chance to have put_task_struct() called on them.  
> 
> I wonder why we bother doing all that at all... we could just return
> "prev" from __switch_to() no ? Like x86 does...

Probably.  I assume this funny two-step is left-over from a 2.4 kernel
design point.  Naively, we could rturn "prev", this would save a few
cycles. Cut the "addi  r3,r3,-THREAD" from entry.S as well.  I was being 
conservative with the patch, making the smallest change possible.  
Do you want this larger patch?


--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH [PPC64]: dead processes never reaped

2005-04-19 Thread Linas Vepstas
On Tue, Apr 19, 2005 at 11:07:01AM +1000, Benjamin Herrenschmidt was heard to 
remark:
> On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote:
> > 
> > Hi,
> > 
> > The patch below appears to fix a problem where a number of dead processes
> > linger on the system.  On a highly loaded system, dozens of processes 
> > were found stuck in do_exit(), calling thier very last schedule(), and
> > then being lost forever.  
> > 
> > Processes that are PF_DEAD are cleaned up *after* the context switch, 
> > in a routine called finish_task_switch(task_t *prev). The "prev" gets 
> > the  value returned by _switch() in entry.S, but this value comes from 
> >   
> > __switch_to (struct task_struct *prev, 
> > struct task_struct *new) 
> > { 
> >old_thread = >thread; ///XXX shouldn't this be prev, not 
> > current? 
> >last = _switch(old_thread, new_thread); 
> >return last; 
> > } 
> >  
> > The way I see it, "prev" and "current" are almost always going to be  
> > pointing at the same thing; however, if a "need resched" happens,  
> > or there's a pre-emept or some-such, then prev and current won't be  
> > the same; in which case, finish_task_switch() will end up cleaning  
> > up the old current, instead of prev.  This will result in dead processes 
> > hanging around, which will never be scheduled again, and will never  
> > get a chance to have put_task_struct() called on them.  
> 
> Ok, thinking moer about this ... that will need maybe some help from
> Ingo so I fully understand where schedule's are allowed ... We are
> basically in the middle of the scheduler here, so I wonder how much of
> the scheduler itself can be preempted or so ...
> 
> Basically, under which circumstances can prev and current be different ?

I remember finding a path through void __sched schedule(void) that
took a branch through the goto need_resched; that would result in this.
I takes a bit of mental gymnastics to see how this might happen.

FWIW, I can send you a debug session showing all cpu's idle and 44 dead 
processes sitting in do_exit().  All but two of these were Java threads,
so this seems to be some sort of thread-scheduling subtlty.  (the two
that weren't java threads were a find|grep pair that must have gotten 
tangled in.)

Given that the patch seems to fix the problem, I didn't dig much deeper.

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH [PPC64]: dead processes never reaped

2005-04-19 Thread Linas Vepstas
On Tue, Apr 19, 2005 at 11:07:01AM +1000, Benjamin Herrenschmidt was heard to 
remark:
 On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote:
  
  Hi,
  
  The patch below appears to fix a problem where a number of dead processes
  linger on the system.  On a highly loaded system, dozens of processes 
  were found stuck in do_exit(), calling thier very last schedule(), and
  then being lost forever.  
  
  Processes that are PF_DEAD are cleaned up *after* the context switch, 
  in a routine called finish_task_switch(task_t *prev). The prev gets 
  the  value returned by _switch() in entry.S, but this value comes from 

  __switch_to (struct task_struct *prev, 
  struct task_struct *new) 
  { 
 old_thread = current-thread; ///XXX shouldn't this be prev, not 
  current? 
 last = _switch(old_thread, new_thread); 
 return last; 
  } 
   
  The way I see it, prev and current are almost always going to be  
  pointing at the same thing; however, if a need resched happens,  
  or there's a pre-emept or some-such, then prev and current won't be  
  the same; in which case, finish_task_switch() will end up cleaning  
  up the old current, instead of prev.  This will result in dead processes 
  hanging around, which will never be scheduled again, and will never  
  get a chance to have put_task_struct() called on them.  
 
 Ok, thinking moer about this ... that will need maybe some help from
 Ingo so I fully understand where schedule's are allowed ... We are
 basically in the middle of the scheduler here, so I wonder how much of
 the scheduler itself can be preempted or so ...
 
 Basically, under which circumstances can prev and current be different ?

I remember finding a path through void __sched schedule(void) that
took a branch through the goto need_resched; that would result in this.
I takes a bit of mental gymnastics to see how this might happen.

FWIW, I can send you a debug session showing all cpu's idle and 44 dead 
processes sitting in do_exit().  All but two of these were Java threads,
so this seems to be some sort of thread-scheduling subtlty.  (the two
that weren't java threads were a find|grep pair that must have gotten 
tangled in.)

Given that the patch seems to fix the problem, I didn't dig much deeper.

--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH [PPC64]: dead processes never reaped

2005-04-19 Thread Linas Vepstas
On Tue, Apr 19, 2005 at 11:01:25AM +1000, Benjamin Herrenschmidt was heard to 
remark:
 On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote:
  
  Hi,
  
  The patch below appears to fix a problem where a number of dead processes
  linger on the system.  On a highly loaded system, dozens of processes 
  were found stuck in do_exit(), calling thier very last schedule(), and
  then being lost forever.  
  
  Processes that are PF_DEAD are cleaned up *after* the context switch, 
  in a routine called finish_task_switch(task_t *prev). The prev gets 
  the  value returned by _switch() in entry.S, but this value comes from 

  __switch_to (struct task_struct *prev, 
  struct task_struct *new) 
  { 
 old_thread = current-thread; ///XXX shouldn't this be prev, not 
  current? 
 last = _switch(old_thread, new_thread); 
 return last; 
  } 
   
  The way I see it, prev and current are almost always going to be  
  pointing at the same thing; however, if a need resched happens,  
  or there's a pre-emept or some-such, then prev and current won't be  
  the same; in which case, finish_task_switch() will end up cleaning  
  up the old current, instead of prev.  This will result in dead processes 
  hanging around, which will never be scheduled again, and will never  
  get a chance to have put_task_struct() called on them.  
 
 I wonder why we bother doing all that at all... we could just return
 prev from __switch_to() no ? Like x86 does...

Probably.  I assume this funny two-step is left-over from a 2.4 kernel
design point.  Naively, we could rturn prev, this would save a few
cycles. Cut the addi  r3,r3,-THREAD from entry.S as well.  I was being 
conservative with the patch, making the smallest change possible.  
Do you want this larger patch?


--linas

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH [PPC64]: dead processes never reaped

2005-04-19 Thread Benjamin Herrenschmidt
On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote:
 
 Hi,
 
 The patch below appears to fix a problem where a number of dead processes
 linger on the system.  On a highly loaded system, dozens of processes 
 were found stuck in do_exit(), calling thier very last schedule(), and
 then being lost forever.  

Ok, we spent some time with Paul decrypting what _switch_to is supposed
to do. Our understanding at this point is that the current code is
correct on both ppc32 and ppc64, that is:

The prev passed in is always current and we don't see how it can be
anything else. We use a local variable instead of current in the common
code because accessing current can be slow on some architectures. I
don't see any codepath where prev != current before switch_to.

If we didn't do some black magic that I explain below, _switch_to would
switch the entire context, including stack, and thus including the value
of prev. Which means that we would always come back with prev beeing
current, which is useless for reaping the old task. What we want is that
this prev that was passed to _switch_to() is returned so that we can
rip that previous task despite the change of context, that is basically
prev has to be an invariant vs. the change of context in switch_to.

On ppc  ppc64, we implement that by passing that prev (or it's thread
counterpart) to the assembly context switch code in r3. This code will
preserve it and return it as-is (or re-transformed from thread to task).

So your problem must be somewhere else. I've looked at the need_resched
code path and we always reload prev = current from a non-preemptible
region, so it can't be wrong.

This was verified on 2.6.12-rc2, there might be something else wrong in
an older kernel.

Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH [PPC64]: dead processes never reaped

2005-04-18 Thread Nick Piggin
On Tue, 2005-04-19 at 11:07 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote:
> > 
> > Hi,
> > 
> > The patch below appears to fix a problem where a number of dead processes
> > linger on the system.  On a highly loaded system, dozens of processes 
> > were found stuck in do_exit(), calling thier very last schedule(), and
> > then being lost forever.  
> > 
> > Processes that are PF_DEAD are cleaned up *after* the context switch, 
> > in a routine called finish_task_switch(task_t *prev). The "prev" gets 
> > the  value returned by _switch() in entry.S, but this value comes from 
> >   
> > __switch_to (struct task_struct *prev, 
> > struct task_struct *new) 
> > { 
> >old_thread = >thread; ///XXX shouldn't this be prev, not 
> > current? 
> >last = _switch(old_thread, new_thread); 
> >return last; 
> > } 
> >  
> > The way I see it, "prev" and "current" are almost always going to be  
> > pointing at the same thing; however, if a "need resched" happens,  
> > or there's a pre-emept or some-such, then prev and current won't be  
> > the same; in which case, finish_task_switch() will end up cleaning  
> > up the old current, instead of prev.  This will result in dead processes 
> > hanging around, which will never be scheduled again, and will never  
> > get a chance to have put_task_struct() called on them.  
> 
> Ok, thinking moer about this ... that will need maybe some help from
> Ingo so I fully understand where schedule's are allowed ... We are
> basically in the middle of the scheduler here, so I wonder how much of
> the scheduler itself can be preempted or so ...
> 

Not much. schedule() has a small preempt window at the beginning
and end of the function.

The context switch is of course run with preempt disabled. Ie.
your switch_to should never get preempted.

> Basically, under which circumstances can prev and current be different ?
> 

Depends on your context switch, really. prev == current before you
switch, and when you switch to 'new' it is different. However, I think
the 'new' current has *its* old prev on the stack (which == new
current). You just have to preserve the old 'prev' somehow (ie. the
thread you switched away from).

-- 
SUSE Labs, Novell Inc.



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH [PPC64]: dead processes never reaped

2005-04-18 Thread Benjamin Herrenschmidt
On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote:
> 
> Hi,
> 
> The patch below appears to fix a problem where a number of dead processes
> linger on the system.  On a highly loaded system, dozens of processes 
> were found stuck in do_exit(), calling thier very last schedule(), and
> then being lost forever.  
> 
> Processes that are PF_DEAD are cleaned up *after* the context switch, 
> in a routine called finish_task_switch(task_t *prev). The "prev" gets 
> the  value returned by _switch() in entry.S, but this value comes from 
>   
> __switch_to (struct task_struct *prev, 
> struct task_struct *new) 
> { 
>old_thread = >thread; ///XXX shouldn't this be prev, not current? 
>last = _switch(old_thread, new_thread); 
>return last; 
> } 
>  
> The way I see it, "prev" and "current" are almost always going to be  
> pointing at the same thing; however, if a "need resched" happens,  
> or there's a pre-emept or some-such, then prev and current won't be  
> the same; in which case, finish_task_switch() will end up cleaning  
> up the old current, instead of prev.  This will result in dead processes 
> hanging around, which will never be scheduled again, and will never  
> get a chance to have put_task_struct() called on them.  

Ok, thinking moer about this ... that will need maybe some help from
Ingo so I fully understand where schedule's are allowed ... We are
basically in the middle of the scheduler here, so I wonder how much of
the scheduler itself can be preempted or so ...

Basically, under which circumstances can prev and current be different ?

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH [PPC64]: dead processes never reaped

2005-04-18 Thread Benjamin Herrenschmidt
On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote:
> 
> Hi,
> 
> The patch below appears to fix a problem where a number of dead processes
> linger on the system.  On a highly loaded system, dozens of processes 
> were found stuck in do_exit(), calling thier very last schedule(), and
> then being lost forever.  
> 
> Processes that are PF_DEAD are cleaned up *after* the context switch, 
> in a routine called finish_task_switch(task_t *prev). The "prev" gets 
> the  value returned by _switch() in entry.S, but this value comes from 
>   
> __switch_to (struct task_struct *prev, 
> struct task_struct *new) 
> { 
>old_thread = >thread; ///XXX shouldn't this be prev, not current? 
>last = _switch(old_thread, new_thread); 
>return last; 
> } 
>  
> The way I see it, "prev" and "current" are almost always going to be  
> pointing at the same thing; however, if a "need resched" happens,  
> or there's a pre-emept or some-such, then prev and current won't be  
> the same; in which case, finish_task_switch() will end up cleaning  
> up the old current, instead of prev.  This will result in dead processes 
> hanging around, which will never be scheduled again, and will never  
> get a chance to have put_task_struct() called on them.  

I wonder why we bother doing all that at all... we could just return
"prev" from __switch_to() no ? Like x86 does...

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


PATCH [PPC64]: dead processes never reaped

2005-04-18 Thread Linas Vepstas


Hi,

The patch below appears to fix a problem where a number of dead processes
linger on the system.  On a highly loaded system, dozens of processes 
were found stuck in do_exit(), calling thier very last schedule(), and
then being lost forever.  

Processes that are PF_DEAD are cleaned up *after* the context switch, 
in a routine called finish_task_switch(task_t *prev). The "prev" gets 
the  value returned by _switch() in entry.S, but this value comes from 
  
__switch_to (struct task_struct *prev, 
struct task_struct *new) 
{ 
   old_thread = >thread; ///XXX shouldn't this be prev, not current? 
   last = _switch(old_thread, new_thread); 
   return last; 
} 
 
The way I see it, "prev" and "current" are almost always going to be  
pointing at the same thing; however, if a "need resched" happens,  
or there's a pre-emept or some-such, then prev and current won't be  
the same; in which case, finish_task_switch() will end up cleaning  
up the old current, instead of prev.  This will result in dead processes 
hanging around, which will never be scheduled again, and will never  
get a chance to have put_task_struct() called on them.  

This patch fixes this.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


--- arch/ppc64/kernel/process.c.orig2005-04-18 14:26:42.0 -0500
+++ arch/ppc64/kernel/process.c 2005-04-18 14:27:54.0 -0500
@@ -204,7 +204,7 @@ struct task_struct *__switch_to(struct t
flush_tlb_pending();
 
new_thread = >thread;
-   old_thread = >thread;
+   old_thread = >thread;
 
local_irq_save(flags);
last = _switch(old_thread, new_thread);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


PATCH [PPC64]: dead processes never reaped

2005-04-18 Thread Linas Vepstas


Hi,

The patch below appears to fix a problem where a number of dead processes
linger on the system.  On a highly loaded system, dozens of processes 
were found stuck in do_exit(), calling thier very last schedule(), and
then being lost forever.  

Processes that are PF_DEAD are cleaned up *after* the context switch, 
in a routine called finish_task_switch(task_t *prev). The prev gets 
the  value returned by _switch() in entry.S, but this value comes from 
  
__switch_to (struct task_struct *prev, 
struct task_struct *new) 
{ 
   old_thread = current-thread; ///XXX shouldn't this be prev, not current? 
   last = _switch(old_thread, new_thread); 
   return last; 
} 
 
The way I see it, prev and current are almost always going to be  
pointing at the same thing; however, if a need resched happens,  
or there's a pre-emept or some-such, then prev and current won't be  
the same; in which case, finish_task_switch() will end up cleaning  
up the old current, instead of prev.  This will result in dead processes 
hanging around, which will never be scheduled again, and will never  
get a chance to have put_task_struct() called on them.  

This patch fixes this.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


--- arch/ppc64/kernel/process.c.orig2005-04-18 14:26:42.0 -0500
+++ arch/ppc64/kernel/process.c 2005-04-18 14:27:54.0 -0500
@@ -204,7 +204,7 @@ struct task_struct *__switch_to(struct t
flush_tlb_pending();
 
new_thread = new-thread;
-   old_thread = current-thread;
+   old_thread = prev-thread;
 
local_irq_save(flags);
last = _switch(old_thread, new_thread);
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH [PPC64]: dead processes never reaped

2005-04-18 Thread Benjamin Herrenschmidt
On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote:
 
 Hi,
 
 The patch below appears to fix a problem where a number of dead processes
 linger on the system.  On a highly loaded system, dozens of processes 
 were found stuck in do_exit(), calling thier very last schedule(), and
 then being lost forever.  
 
 Processes that are PF_DEAD are cleaned up *after* the context switch, 
 in a routine called finish_task_switch(task_t *prev). The prev gets 
 the  value returned by _switch() in entry.S, but this value comes from 
   
 __switch_to (struct task_struct *prev, 
 struct task_struct *new) 
 { 
old_thread = current-thread; ///XXX shouldn't this be prev, not current? 
last = _switch(old_thread, new_thread); 
return last; 
 } 
  
 The way I see it, prev and current are almost always going to be  
 pointing at the same thing; however, if a need resched happens,  
 or there's a pre-emept or some-such, then prev and current won't be  
 the same; in which case, finish_task_switch() will end up cleaning  
 up the old current, instead of prev.  This will result in dead processes 
 hanging around, which will never be scheduled again, and will never  
 get a chance to have put_task_struct() called on them.  

I wonder why we bother doing all that at all... we could just return
prev from __switch_to() no ? Like x86 does...

Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH [PPC64]: dead processes never reaped

2005-04-18 Thread Benjamin Herrenschmidt
On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote:
 
 Hi,
 
 The patch below appears to fix a problem where a number of dead processes
 linger on the system.  On a highly loaded system, dozens of processes 
 were found stuck in do_exit(), calling thier very last schedule(), and
 then being lost forever.  
 
 Processes that are PF_DEAD are cleaned up *after* the context switch, 
 in a routine called finish_task_switch(task_t *prev). The prev gets 
 the  value returned by _switch() in entry.S, but this value comes from 
   
 __switch_to (struct task_struct *prev, 
 struct task_struct *new) 
 { 
old_thread = current-thread; ///XXX shouldn't this be prev, not current? 
last = _switch(old_thread, new_thread); 
return last; 
 } 
  
 The way I see it, prev and current are almost always going to be  
 pointing at the same thing; however, if a need resched happens,  
 or there's a pre-emept or some-such, then prev and current won't be  
 the same; in which case, finish_task_switch() will end up cleaning  
 up the old current, instead of prev.  This will result in dead processes 
 hanging around, which will never be scheduled again, and will never  
 get a chance to have put_task_struct() called on them.  

Ok, thinking moer about this ... that will need maybe some help from
Ingo so I fully understand where schedule's are allowed ... We are
basically in the middle of the scheduler here, so I wonder how much of
the scheduler itself can be preempted or so ...

Basically, under which circumstances can prev and current be different ?

Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH [PPC64]: dead processes never reaped

2005-04-18 Thread Nick Piggin
On Tue, 2005-04-19 at 11:07 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote:
  
  Hi,
  
  The patch below appears to fix a problem where a number of dead processes
  linger on the system.  On a highly loaded system, dozens of processes 
  were found stuck in do_exit(), calling thier very last schedule(), and
  then being lost forever.  
  
  Processes that are PF_DEAD are cleaned up *after* the context switch, 
  in a routine called finish_task_switch(task_t *prev). The prev gets 
  the  value returned by _switch() in entry.S, but this value comes from 

  __switch_to (struct task_struct *prev, 
  struct task_struct *new) 
  { 
 old_thread = current-thread; ///XXX shouldn't this be prev, not 
  current? 
 last = _switch(old_thread, new_thread); 
 return last; 
  } 
   
  The way I see it, prev and current are almost always going to be  
  pointing at the same thing; however, if a need resched happens,  
  or there's a pre-emept or some-such, then prev and current won't be  
  the same; in which case, finish_task_switch() will end up cleaning  
  up the old current, instead of prev.  This will result in dead processes 
  hanging around, which will never be scheduled again, and will never  
  get a chance to have put_task_struct() called on them.  
 
 Ok, thinking moer about this ... that will need maybe some help from
 Ingo so I fully understand where schedule's are allowed ... We are
 basically in the middle of the scheduler here, so I wonder how much of
 the scheduler itself can be preempted or so ...
 

Not much. schedule() has a small preempt window at the beginning
and end of the function.

The context switch is of course run with preempt disabled. Ie.
your switch_to should never get preempted.

 Basically, under which circumstances can prev and current be different ?
 

Depends on your context switch, really. prev == current before you
switch, and when you switch to 'new' it is different. However, I think
the 'new' current has *its* old prev on the stack (which == new
current). You just have to preserve the old 'prev' somehow (ie. the
thread you switched away from).

-- 
SUSE Labs, Novell Inc.



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/