Re: [PATCH] mm/memory.c: remove warning from an uninitialized spinlock. was: Re: 2.6.21-rc7-mm2

2007-11-07 Thread Borislav Petkov
On Wed, Nov 07, 2007 at 02:20:03PM -0500, Steven Rostedt wrote:
> > 
> > Introduce a macro for suppressing gcc from generating a warning about a
> > probable uninitialized state of a variable.
> > 
> > Example:
> > 
> > -   spinlock_t *ptl;
> > +   spinlock_t *uninitialized_var(ptl);
> > 
> > Not a happy solution, but those warnings are obnoxious.
> > 
> > - Using the usual pointlessly-set-it-to-zero approach wastes several
> >   bytes of text.
> > 
> > - Using a macro means we can (hopefully) do something else if gcc changes
> >   cause the `x = x' hack to stop working
> > 
> > - Using a macro means that people who are worried about hiding true bugs
> >   can easily turn it off.
> > 
> > Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> 
> I just stumbled across this being in the kernel. Well, I'm finally glad
> it made it in, even though it was suggested one year earlier ;-)
> 
>   http://lkml.org/lkml/2006/5/11/50

yeah, this was Andrew's idea. The version in the kernel, in
contrast to yours, doesn't have a config option so you still
have to make really sure you're not aiding any bugs with it.

-- 
Regards/Gruß,
Boris.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
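
A minimal userspace sketch of the `x = x' hack described in the changelog above. The macro body follows the changelog's description; the CHECK_UNINITIALIZED switch and the lookup() and double_or_default() helpers are hypothetical, added only for illustration (as noted above, the in-kernel version has no such option).

```c
#include <assert.h>

/*
 * Self-assignment trick: "int x = x;" makes gcc treat x as initialized,
 * so the "may be used uninitialized" warning is not emitted.
 * CHECK_UNINITIALIZED is a hypothetical switch, not a kernel option.
 */
#ifdef CHECK_UNINITIALIZED
#define uninitialized_var(x) x
#else
#define uninitialized_var(x) x = x
#endif

/* Hypothetical helper: writes *out only on success (returns 0). */
static int lookup(int key, int *out)
{
	assert(out);
	if (key < 0)
		return -1;
	*out = key * 2;
	return 0;
}

/* value is only read on the path where lookup() has written it. */
static int double_or_default(int key, int fallback)
{
	int uninitialized_var(value);

	if (lookup(key, &value) != 0)
		return fallback;
	return value;
}
```

Building with -DCHECK_UNINITIALIZED turns the declaration back into a plain `int value;`, which is the kind of off switch the third bullet of the changelog has in mind.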


Re: [PATCH] mm/memory.c: remove warning from an uninitialized spinlock. was: Re: 2.6.21-rc7-mm2

2007-11-07 Thread Steven Rostedt
> 
> Introduce a macro for suppressing gcc from generating a warning about a
> probable uninitialized state of a variable.
> 
> Example:
> 
> - spinlock_t *ptl;
> + spinlock_t *uninitialized_var(ptl);
> 
> Not a happy solution, but those warnings are obnoxious.
> 
> - Using the usual pointlessly-set-it-to-zero approach wastes several
>   bytes of text.
> 
> - Using a macro means we can (hopefully) do something else if gcc changes
>   cause the `x = x' hack to stop working
> 
> - Using a macro means that people who are worried about hiding true bugs
>   can easily turn it off.
> 
> Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>

I just stumbled across this being in the kernel. Well, I'm finally glad
it made it in, even though it was suggested one year earlier ;-)

  http://lkml.org/lkml/2006/5/11/50

-- Steve



Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken

2007-05-17 Thread Bernhard Walle
* Vivek Goyal <[EMAIL PROTECTED]> [2007-05-17 15:05]:
> On Mon, May 14, 2007 at 04:05:15PM +0200, Bernhard Walle wrote:
> > * Vivek Goyal <[EMAIL PROTECTED]> [2007-05-08 19:18]:
> > > On Thu, May 03, 2007 at 12:19:32AM +0200, Bernhard Walle wrote:
> > > > * Vivek Goyal <[EMAIL PROTECTED]> [2007-04-30 10:48]:
> > > > > 
> > > > > handle_edge_irq() already makes sure that desc->action is not null, still
> > > > > note_interrupt() is receiving desc->action as null, that's strange. On my
> > > > > system this is happening for irq 4 and /proc/interrupt shows that it is
> > > > > coming from "serial".
> > > > 
> > > > Unfortunately, I couldn't reproduce this here. Vivek, do you have time
> > > > to take a look at this at your site? For the meanwhile, should I
> > > > create a patch that checks for desc->action in note_interrupt(), too?
> > > 
> > > I can reproduce this problem only on one machine. I think there is some
> > > race condition and your code somehow just exposes it.
> > 
> > thanks for finding that out. Could you try/review the patch below?
> > As the lock is only acquired when irqfixup == 2 it shouldn't impact
> > performance of a 'normal' system.
> 
> It does fix up my problem. I have modified your patch a bit. I think
> the new version is a little clearer. What do you think?

Agreed. Thanks for spotting that problem!


Bernhard


Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken

2007-05-17 Thread Vivek Goyal
On Mon, May 14, 2007 at 04:05:15PM +0200, Bernhard Walle wrote:
> * Vivek Goyal <[EMAIL PROTECTED]> [2007-05-08 19:18]:
> > On Thu, May 03, 2007 at 12:19:32AM +0200, Bernhard Walle wrote:
> > > * Vivek Goyal <[EMAIL PROTECTED]> [2007-04-30 10:48]:
> > > > 
> > > > handle_edge_irq() already makes sure that desc->action is not null, still
> > > > note_interrupt() is receiving desc->action as null, that's strange. On my
> > > > system this is happening for irq 4 and /proc/interrupt shows that it is
> > > > coming from "serial".
> > > 
> > > Unfortunately, I couldn't reproduce this here. Vivek, do you have time
> > > to take a look at this at your site? For the meanwhile, should I
> > > create a patch that checks for desc->action in note_interrupt(), too?
> > 
> > I can reproduce this problem only on one machine. I think there is some
> > race condition and your code somehow just exposes it.
> 
> thanks for finding that out. Could you try/review the patch below?
> As the lock is only acquired when irqfixup == 2 it shouldn't impact
> performance of a 'normal' system.
> 

Hi Bernhard,

It does fix up my problem. I have modified your patch a bit. I think
the new version is a little clearer. What do you think?

Thanks
Vivek




o System crashes if booted with the irqpoll command line option.

o The problem happens because inside note_interrupt() we access
  desc->action->flags without taking desc->lock. In the meantime
  somebody can go ahead and unregister the irq handler, so desc->action
  is NULL by the time note_interrupt() dereferences it, and it crashes.

o On my system it is irq 4, which is serving the serial driver.

o Take desc->lock before accessing desc->action->flags.

Signed-off-by: Bernhard Walle <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 linux-2.6.21-git12-root/kernel/irq/spurious.c |   23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff -puN kernel/irq/spurious.c~fix-irqpoll-crash kernel/irq/spurious.c
--- linux-2.6.21-git12/kernel/irq/spurious.c~fix-irqpoll-crash	2007-05-17 17:36:50.0 +0530
+++ linux-2.6.21-git12-root/kernel/irq/spurious.c	2007-05-17 17:53:52.0 +0530
@@ -138,6 +138,8 @@ report_bad_irq(unsigned int irq, struct 
 void note_interrupt(unsigned int irq, struct irq_desc *desc,
 		    irqreturn_t action_ret)
 {
+	int call_misrouted_irq = 0;
+
 	if (unlikely(action_ret != IRQ_HANDLED)) {
 		desc->irqs_unhandled++;
 		if (unlikely(action_ret != IRQ_NONE))
@@ -146,9 +148,24 @@ void note_interrupt(unsigned int irq, st
 
 	if (unlikely(irqfixup)) {
 		/* Don't punish working computers */
-		if ((irqfixup == 2 && ((irq == 0) ||
-				(desc->action->flags & IRQF_IRQPOLL))) ||
-				action_ret == IRQ_NONE) {
+		if (action_ret == IRQ_NONE)
+			/* Nobody handled irq. Possibly a misrouted one. */
+			call_misrouted_irq = 1;
+		else if (irqfixup == 2) {
+			/* irqpoll is enabled. Is this the irq driving
+			 * polling.
+			 */
+			if (irq == 0)
+				call_misrouted_irq = 1;
+			else {
+				spin_lock(&desc->lock);
+				if (desc->action &&
+				    (desc->action->flags & IRQF_IRQPOLL))
+					call_misrouted_irq = 1;
+				spin_unlock(&desc->lock);
+			}
+		}
+		if (call_misrouted_irq) {
 			int ok = misrouted_irq(irq);
 			if (action_ret == IRQ_NONE)
 				desc->irqs_unhandled -= ok;
_
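
The locking rule in the patch above (test desc->action and read its flags inside one critical section) can be modelled in userspace. This is only a sketch under stated assumptions: a pthread mutex stands in for the kernel spinlock, the structures are cut down to the two fields involved, and the IRQF_IRQPOLL value and both function names are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define IRQF_IRQPOLL 0x1u	/* flag value chosen for this sketch only */

struct irqaction {
	unsigned int flags;
};

struct irq_desc {
	pthread_mutex_t lock;		/* stands in for the kernel spinlock */
	struct irqaction *action;	/* NULL once the handler is unregistered */
};

/* Returns 1 if the descriptor still has a handler flagged for polling. */
static int action_wants_irqpoll(struct irq_desc *desc)
{
	int wants = 0;

	assert(desc != NULL);
	pthread_mutex_lock(&desc->lock);
	/* NULL test and flags read happen under the same lock hold. */
	if (desc->action && (desc->action->flags & IRQF_IRQPOLL))
		wants = 1;
	pthread_mutex_unlock(&desc->lock);
	return wants;
}

/* Unregistering takes the same lock, so it cannot interleave with the check. */
static void unregister_action(struct irq_desc *desc)
{
	pthread_mutex_lock(&desc->lock);
	desc->action = NULL;
	pthread_mutex_unlock(&desc->lock);
}
```

Because both paths serialize on the same lock, the reader either sees the action with valid flags or sees NULL; it never dereferences a pointer that was cleared between the test and the read.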


Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken

2007-05-14 Thread Bernhard Walle
* Vivek Goyal <[EMAIL PROTECTED]> [2007-05-08 19:18]:
> On Thu, May 03, 2007 at 12:19:32AM +0200, Bernhard Walle wrote:
> > * Vivek Goyal <[EMAIL PROTECTED]> [2007-04-30 10:48]:
> > > 
> > > handle_edge_irq() already makes sure that desc->action is not null, still
> > > note_interrupt() is receiving desc->action as null, that's strange. On my 
> > > system this is happening for irq 4 and /proc/interrupt shows that it is
> > > coming from "serial".
> > 
> > Unfortunately, I couldn't reproduce this here. Vivek, do you have time
> > to take a look at this at your site? For the meanwhile, should I
> > create a patch that checks for desc->action in note_interrupt(), too?
> 
> I can reproduce this problem only on one machine. I think there is some
> race condition and your code somehow just exposes it.

Thanks for finding that out. Could you try/review the patch below?
As the lock is only acquired when irqfixup == 2, it shouldn't impact
performance of a 'normal' system.

Thanks,
   Bernhard

---
 kernel/irq/spurious.c |   18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -145,10 +145,20 @@ void note_interrupt(unsigned int irq, st
 	}
 
 	if (unlikely(irqfixup)) {
-		/* Don't punish working computers */
-		if ((irqfixup == 2 && ((irq == 0) ||
-				(desc->action->flags & IRQF_IRQPOLL))) ||
-				action_ret == IRQ_NONE) {
+		int call_misrouted_irq = action_ret == IRQ_NONE;
+
+		if (!call_misrouted_irq && irqfixup == 2) {
+			if (irq == 0)
+				call_misrouted_irq = 1;
+			else {
+				spin_lock(&desc->lock);
+				if (desc->action && (desc->action->flags & IRQF_IRQPOLL))
+					call_misrouted_irq = 1;
+				spin_unlock(&desc->lock);
+			}
+		}
+
+		if (call_misrouted_irq) {
 			int ok = misrouted_irq(irq);
 			if (action_ret == IRQ_NONE)
 				desc->irqs_unhandled -= ok;


Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken

2007-05-08 Thread Vivek Goyal
On Thu, May 03, 2007 at 12:19:32AM +0200, Bernhard Walle wrote:
> * Vivek Goyal <[EMAIL PROTECTED]> [2007-04-30 10:48]:
> > 
> > handle_edge_irq() already makes sure that desc->action is not null, still
> > note_interrupt() is receiving desc->action as null, that's strange. On my 
> > system this is happening for irq 4 and /proc/interrupt shows that it is
> > coming from "serial".
> 
> Unfortunately, I couldn't reproduce this here. Vivek, do you have time
> to take a look at this at your site? For the meanwhile, should I
> create a patch that checks for desc->action in note_interrupt(), too?
> 

Hi Bernhard,

I can reproduce this problem only on one machine. I think there is some
race condition and your code somehow just exposes it.

I put a few WARN_ON(!desc->action) checks in handle_edge_irq() and found
that after handle_IRQ_event(), desc->action has become null. That means
in the meantime somebody has gone ahead and modified the desc. This must
have happened because we release desc->lock while running
handle_IRQ_event().

This means there is a race somewhere. It is confirmed by the fact that
this problem does not occur if the same system is booted with only one
cpu (maxcpus=1).

Thanks
Vivek
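
The window described above can be shown with a deliberately single-threaded toy model. Everything here is an assumption made for illustration: the `locked` flag merely marks where desc->lock would be held, and the hypothetical `other_cpu` hook plays the part of a second CPU that runs while the lock is dropped around the handler.

```c
#include <assert.h>
#include <stddef.h>

struct irqaction {
	unsigned int flags;
};

struct irq_desc {
	int locked;			/* toy marker for "desc->lock is held" */
	struct irqaction *action;
};

/* Hypothetical hook that plays the role of another CPU in this model. */
static void (*other_cpu)(struct irq_desc *desc);

/* Models unregistering the handler; only possible while the lock is free. */
static void free_irq_model(struct irq_desc *desc)
{
	assert(!desc->locked);
	desc->action = NULL;
}

/* Returns 1 if the action vanished while the handler ran unlocked. */
static int handle_irq_model(struct irq_desc *desc)
{
	int vanished;

	desc->locked = 1;
	if (!desc->action) {		/* checked under the lock: fine so far */
		desc->locked = 0;
		return 0;
	}
	desc->locked = 0;		/* lock released to run the handler */
	if (other_cpu)
		other_cpu(desc);	/* window in which unregister can run */
	desc->locked = 1;		/* lock taken again afterwards */
	vanished = (desc->action == NULL);	/* the earlier check is stale */
	desc->locked = 0;
	return vanished;
}
```

With `other_cpu` left NULL nothing can change the descriptor inside the window, which mirrors the maxcpus=1 observation; with it set to free_irq_model() the earlier NULL check no longer holds after the handler returns.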


Re: 2.6.21-rc7-mm2 breaks 'lvm vgscan'.

2007-05-05 Thread Valdis . Kletnieks
On Thu, 26 Apr 2007 22:31:15 EDT, [EMAIL PROTECTED] said:
> On Wed, 25 Apr 2007 22:57:16 PDT, Andrew Morton said:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> 
> This addition in -rc7-mm1 breaks my laptop (Dell Latitude D820, x86_64 kernel)
> 
> gregkh-driver-sysfs-fix-i_ino-handling-in-sysfs.patch
> 
> The initrd on my system does an 'lvm vgscan' to get the root filesystem
> accessible.

This is confirmed fixed in 2.6.21-mm1.



Re: [PATCH] Re: 2.6.21-rc7-mm2 -- hvsi console driver registration failure

2007-05-04 Thread Christoph Lameter
On Fri, 4 May 2007, Andrew Morton wrote:

> Better, we should be emitting loud warnings which then disable themselves
> and then succeeding the allocation so that people can proceed with their
> kernel testing.
> 
> When all the loud-warning sites have been fixed, we can take that code out
> again.
> 
> The present situation is maximally tester-hostile.

SLUB: Allocate smallest object size if the user asks for 0 bytes.

Makes SLUB behave like SLAB in this area to avoid issues

Throw a stack dump to alert people.

At some point the behavior should be switched back. NULL is no
memory as far as I can tell, and if the user asked for 0 bytes then
he needs to get no memory.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slub_def.h |    8 ++++++--
 mm/slub.c                |    2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

Index: slub/mm/slub.c
===
--- slub.orig/mm/slub.c	2007-05-04 14:17:22.0 -0700
+++ slub/mm/slub.c	2007-05-04 14:19:36.0 -0700
@@ -2009,7 +2009,7 @@ static struct kmem_cache *get_slab(size_
 {
 	int index = kmalloc_index(size);
 
-	if (!size)
+	if (!index)
 		return NULL;
 
 	/* Allocation too large? */
Index: slub/include/linux/slub_def.h
===
--- slub.orig/include/linux/slub_def.h	2007-05-04 14:13:40.0 -0700
+++ slub/include/linux/slub_def.h	2007-05-04 14:18:25.0 -0700
@@ -81,8 +81,12 @@ extern struct kmem_cache kmalloc_caches[
  */
 static inline int kmalloc_index(int size)
 {
-	if (size == 0)
-		return 0;
+	/*
+	 * We should return 0 if size == 0 but we use the smallest object
+	 * here for SLAB legacy reasons.
+	 */
+	WARN_ON(size == 0);
+
 	if (size > 64 && size <= 96)
 		return 1;
 	if (size > 128 && size <= 192)
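
The effect of the patch above on a zero-byte request can be sketched in userspace: instead of returning early, the lookup warns and falls through to the smallest size class. The class layout below (powers of two from 8 to 2048 bytes) and the names size_class_shift() and class_bytes() are assumptions for this sketch; the 96 and 192 byte classes visible in the patch context are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MIN_SHIFT 3	/* assumed smallest class: 8 bytes */
#define MAX_SHIFT 11	/* assumed largest class here: 2048 bytes */

/* Returns log2 of the class that fits size, or -1 if it is too large. */
static int size_class_shift(unsigned int size)
{
	int shift;

	/* Warn, but do not bail out: size 0 lands in the smallest class. */
	if (size == 0)
		fprintf(stderr, "warning: zero-byte allocation requested\n");

	for (shift = MIN_SHIFT; shift <= MAX_SHIFT; shift++)
		if (size <= (1u << shift))
			return shift;
	return -1;
}

/* Object size in bytes for a valid class shift. */
static size_t class_bytes(int shift)
{
	assert(shift >= MIN_SHIFT && shift <= MAX_SHIFT);
	return (size_t)1 << shift;
}
```

A caller that asks for 0 bytes therefore gets an 8-byte object plus a warning, which is the SLAB-compatible behaviour the changelog describes as a temporary measure.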


Re: [PATCH] Re: 2.6.21-rc7-mm2 -- hvsi console driver registration failure

2007-05-04 Thread Andrew Morton
On Fri, 04 May 2007 12:38:58 +0100
Andy Whitcroft <[EMAIL PROTECTED]> wrote:

> 
> Trying to get 2.6.21-rc7-mm2 to boot on large PPC64 seems to be a
> bit of a challenge.  We have been seeing panics on boot from the
> hvsi driver:
> 
>   Couldn't register hvsi console driver
> 
> Tracking this back, this seems to come from hvsi driver trying to
> register itself via tty_register_driver() with a zero units.
> 
> The failure is triggered by a change in semantics for kmalloc()
> between SLAB and SLUB; kmalloc(0) now returns NULL rather than an
> allocation at the smallest size.  Looking at the code in question
> even when the allocation succeeds we will not actually use the
> memory when device->num is zero.

OK, thanks for working that out.

Christoph, we should be emitting loud warnings so that this problem is easy
to debug.

Better, we should be emitting loud warnings which then disable themselves
and then succeeding the allocation so that people can proceed with their
kernel testing.

When all the loud-warning sites have been fixed, we can take that code out
again.

The present situation is maximally tester-hostile.

> It is not clear to me if this is a bug in the hvsi driver in that
> it should specify some units.  It seems we will try and reserve zero
> devices in this case, which seems pointless.
> 
> I have tested with the patch below which seems safe to me and stops
> the errors and even seems to make the console work.  But perhaps
> someone with more driver fu, could verify if driver->num of zero
> has any meaning and kick this to the hvsi people if not.
> 
> -apw
> 
> === 8< ===
> tty_register_driver: only allocate tty instances when defined
> 
> If device->num is zero we attempt to kmalloc() zero bytes.
> When SLUB is enabled this returns a null pointer and take that as
> an allocation failure and fail the device register.  Check for no
> devices and avoid the allocation.
> 
> Signed-off-by: Andy Whitcroft <[EMAIL PROTECTED]>
> ---
> diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
> index 959a616..71c4579 100644
> --- a/drivers/char/tty_io.c
> +++ b/drivers/char/tty_io.c
> @@ -3724,7 +3724,7 @@ int tty_register_driver(struct tty_driver *driver)
>  	if (driver->flags & TTY_DRIVER_INSTALLED)
>  		return 0;
>  
> -	if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM)) {
> +	if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM) && driver->num) {
>  		p = kmalloc(driver->num * 3 * sizeof(void *), GFP_KERNEL);
>  		if (!p)
>  			return -ENOMEM;
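
The "loud warnings which then disable themselves" idea from this message can be sketched as a warn-once wrapper around a userspace allocator. The names alloc_tolerant(), warn_zero_alloc_once() and the zero_alloc_warnings counter are hypothetical and exist only for this example; malloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int zero_alloc_warnings;	/* number of warnings actually printed */

/* Print a warning the first time only; later calls stay silent. */
static void warn_zero_alloc_once(const char *caller)
{
	static int warned;

	if (warned)
		return;
	warned = 1;
	zero_alloc_warnings++;
	fprintf(stderr, "WARNING: %s requested a zero-byte allocation\n",
		caller);
}

/* Hypothetical allocator wrapper: warn once, then succeed anyway. */
static void *alloc_tolerant(size_t size, const char *caller)
{
	assert(caller != NULL);
	if (size == 0) {
		warn_zero_alloc_once(caller);
		size = 1;	/* hand back a small real object */
	}
	return malloc(size);
}
```

The tester still sees one loud message pointing at the offending call site, but the allocation succeeds, so the rest of the boot can proceed.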


Re: [PATCH] Re: 2.6.21-rc7-mm2 -- hvsi console driver registration failure

2007-05-04 Thread Linas Vepstas
On Fri, May 04, 2007 at 12:38:58PM +0100, Andy Whitcroft wrote:
> 
> Trying to get 2.6.21-rc7-mm2 to boot on large PPC64 seems to be a
> bit of a challenge.  We have been seeing panics on boot from the
> hvsi driver:
> 
>   Couldn't register hvsi console driver
> 
> Tracking this back, this seems to come from hvsi driver trying to
> register itself via tty_register_driver() with a zero units.
> 
> The failure is triggered by a change in semantics for kmalloc()
> between SLAB and SLUB; kmalloc(0) now returns NULL rather than an
> allocation at the smallest size.  Looking at the code in question
> even when the allocation succeeds we will not actually use the
> memory when device->num is zero.
> 
> It is not clear to me if this is a bug in the hvsi driver in that
> it should specify some units.  It seems we will try and reserve zero
> devices in this case, which seems pointless.

Yes, it seems pointless to me ... 

> I have tested with the patch below which seems safe to me and stops
> the errors and even seems to make the console work.  But perhaps
> someone with more driver fu, could verify if driver->num of zero
> has any meaning and kick this to the hvsi people if not.

Hollis nominated me to be "hvsi people", although I'm near-totally
ignorant of the thing.

If hvsi_count is zero, then the device tree did not have any
"serial" nodes that speak "hvterm-protocol". The hvsi should not 
have even tried to register anything. The attached patch seems more 
to the point.

--linas


The hvsi driver is used whenever the device-tree contains
nodes for serial ports, and those serial ports speak the hvterm
protocol. However, if no such nodes are found, then the hvsi
driver should not even register. 

This patch avoids a kernel panic with "Couldn't register hvsi 
console driver". 

In addition, this patch makes tty_register_driver refuse
to do anything, if there are no actual tty ports to be 
registered.

Utterly & completely untested.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>

----
 drivers/char/hvsi.c   |    4 ++++
 drivers/char/tty_io.c |    3 +++
 2 files changed, 7 insertions(+)

Index: linux-2.6.21-rc7-mm2/drivers/char/hvsi.c
===============
--- linux-2.6.21-rc7-mm2.orig/drivers/char/hvsi.c	2007-04-26 15:37:33.0 -0500
+++ linux-2.6.21-rc7-mm2/drivers/char/hvsi.c	2007-05-04 13:55:56.0 -0500
@@ -1148,6 +1148,10 @@ static int __init hvsi_init(void)
 {
 	int i;
 
+	/* No serial hvterm-protocol device-tree nodes found. */
+	if (hvsi_count == 0)
+		return 0;
+
 	hvsi_driver = alloc_tty_driver(hvsi_count);
 	if (!hvsi_driver)
 		return -ENOMEM;
Index: linux-2.6.21-rc7-mm2/drivers/char/tty_io.c
===============
--- linux-2.6.21-rc7-mm2.orig/drivers/char/tty_io.c	2007-04-26 15:37:33.0 -0500
+++ linux-2.6.21-rc7-mm2/drivers/char/tty_io.c	2007-05-04 13:54:14.0 -0500
@@ -3724,6 +3724,9 @@ int tty_register_driver(struct tty_drive
 	if (driver->flags & TTY_DRIVER_INSTALLED)
 		return 0;
 
+	if (driver->num == 0)
+		return -ENODEV;
+
 	if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM)) {
 		p = kmalloc(driver->num * 3 * sizeof(void *), GFP_KERNEL);
 		if (!p)
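
The two guards in the patch above can be sketched in plain userspace C.
This is only an illustration of the control flow, not the kernel code:
struct demo_driver, demo_init() and demo_register() are hypothetical
names standing in for the tty driver structure, hvsi_init() and
tty_register_driver().

```c
#include <errno.h>

/* Hypothetical stand-in for a tty driver with "num" ports. */
struct demo_driver {
	int num;
	int registered;
};

/* Mirrors the second hunk: refuse to register a driver with no ports. */
static int demo_register(struct demo_driver *drv)
{
	if (drv->num == 0)
		return -ENODEV;
	drv->registered = 1;
	return 0;
}

/* Mirrors the first hunk: no matching nodes means nothing to do. */
static int demo_init(struct demo_driver *drv, int count)
{
	if (count == 0)
		return 0;	/* not an error, just no devices */
	drv->num = count;
	return demo_register(drv);
}
```

With both guards in place the init path returns success without
touching the registration code at all, and the registration code
still rejects a zero-port driver from any other caller.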


[PATCH] Re: 2.6.21-rc7-mm2 -- hvsi console driver registration failure

2007-05-04 Thread Andy Whitcroft

Trying to get 2.6.21-rc7-mm2 to boot on large PPC64 seems to be a
bit of a challenge.  We have been seeing panics on boot from the
hvsi driver:

Couldn't register hvsi console driver

Tracking this back, this seems to come from the hvsi driver trying to
register itself via tty_register_driver() with zero units.

The failure is triggered by a change in semantics for kmalloc()
between SLAB and SLUB; kmalloc(0) now returns NULL rather than an
allocation at the smallest size.  Looking at the code in question,
even when the allocation succeeds we will not actually use the
memory when driver->num is zero.

It is not clear to me if this is a bug in the hvsi driver in that
it should specify some units.  It seems we will try and reserve zero
devices in this case, which seems pointless.

I have tested with the patch below which seems safe to me and stops
the errors and even seems to make the console work.  But perhaps
someone with more driver fu, could verify if driver->num of zero
has any meaning and kick this to the hvsi people if not.

-apw

=== 8< ===
tty_register_driver: only allocate tty instances when defined

If driver->num is zero we attempt to kmalloc() zero bytes.
When SLUB is enabled this returns a null pointer, which we take as
an allocation failure, and we fail the device register.  Check for no
devices and avoid the allocation.

Signed-off-by: Andy Whitcroft <[EMAIL PROTECTED]>
---
diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index 959a616..71c4579 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -3724,7 +3724,7 @@ int tty_register_driver(struct tty_driver *driver)
 	if (driver->flags & TTY_DRIVER_INSTALLED)
 		return 0;
 
-	if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM)) {
+	if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM) && driver->num) {
 		p = kmalloc(driver->num * 3 * sizeof(void *), GFP_KERNEL);
 		if (!p)
 			return -ENOMEM;
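
The ambiguity described above (a NULL return for a zero-byte request
being read as out-of-memory) can be reproduced in userspace. This is a
rough sketch under stated assumptions: alloc_strict() is a hypothetical
allocator that returns NULL for size 0, and the two register functions
are made-up stand-ins for the code before and after the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator that, like SLUB at the time, gives NULL for 0 bytes. */
static void *alloc_strict(size_t size)
{
	if (size == 0)
		return NULL;
	return malloc(size);
}

/* Before: a zero-unit request is misreported as out-of-memory. */
static int register_unguarded(unsigned int num)
{
	void *p = alloc_strict(num * 3 * sizeof(void *));

	if (!p)
		return -ENOMEM;
	free(p);
	return 0;
}

/* After: skip the allocation entirely when there are no units. */
static int register_guarded(unsigned int num)
{
	void *p = NULL;

	if (num) {
		p = alloc_strict(num * 3 * sizeof(void *));
		if (!p)
			return -ENOMEM;
	}
	free(p);	/* free(NULL) is a no-op */
	return 0;
}
```

The guarded variant never asks the allocator for zero bytes, so the
result no longer depends on which allocator semantics are in effect.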


Re: [PATCH] Re: 2.6.21-rc7-mm2 -- hvsi console driver registration failure

2007-05-04 Thread Andrew Morton
On Fri, 04 May 2007 12:38:58 +0100
Andy Whitcroft <[EMAIL PROTECTED]> wrote:

> 
> Trying to get 2.6.21-rc7-mm2 to boot on large PPC64 seems to be a
> bit of a challenge.  We have been seeing panics on boot from the
> hvsi driver:
> 
>   Couldn't register hvsi console driver
> 
> Tracking this back, this seems to come from hvsi driver trying to
> register itself via tty_register_driver() with a zero units.
> 
> The failure is triggered by a change in semantics for kmalloc()
> between SLAB and SLUB; kmalloc(0) now returns NULL rather than an
> allocation at the smallest size.  Looking at the code in question
> even when the allocation succeeds we will not actually use the
> memory when device->num is zero.

OK, thanks for working that out.

Christoph, we should be emitting loud warnings so that this problem is easy
to debug.

Better, we should be emitting loud warnings which then disable themselves
and then succeeding the allocation so that people can proceed with their
kernel testing.

When all the loud-warning sites have been fixed, we can take that code out
again.

The present situation is maximally tester-hostile.

> It is not clear to me if this is a bug in the hvsi driver in that
> it should specify some units.  It seems we will try and reserve zero
> devices in this case, which seems pointless.
> 
> I have tested with the patch below which seems safe to me and stops
> the errors and even seems to make the console work.  But perhaps
> someone with more driver fu, could verify if driver->num of zero
> has any meaning and kick this to the hvsi people if not.
> 
> -apw
> 
> === 8< ===
> tty_register_driver: only allocate tty instances when defined
> 
> If device->num is zero we attempt to kmalloc() zero bytes.
> When SLUB is enabled this returns a null pointer and take that as
> an allocation failure and fail the device register.  Check for no
> devices and avoid the allocation.
> 
> Signed-off-by: Andy Whitcroft <[EMAIL PROTECTED]>
> ---
> diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
> index 959a616..71c4579 100644
> --- a/drivers/char/tty_io.c
> +++ b/drivers/char/tty_io.c
> @@ -3724,7 +3724,7 @@ int tty_register_driver(struct tty_driver *driver)
>  	if (driver->flags & TTY_DRIVER_INSTALLED)
>  		return 0;
>  
> -	if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM)) {
> +	if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM) && driver->num) {
>  		p = kmalloc(driver->num * 3 * sizeof(void *), GFP_KERNEL);
>  		if (!p)
>  			return -ENOMEM;


Re: [PATCH] Re: 2.6.21-rc7-mm2 -- hvsi console driver registration failure

2007-05-04 Thread Christoph Lameter
On Fri, 4 May 2007, Andrew Morton wrote:

> Better, we should be emitting loud warnings which then disable themselves
> and then succeeding the allocation so that people can proceed with their
> kernel testing.
> 
> When all the loud-warning sites have been fixed, we can take that code out
> again.
> 
> The present situation is maximally tester-hostile.

SLUB: Allocate smallest object size if the user asks for 0 bytes.

Makes SLUB behave like SLAB in this area to avoid issues.

Throw a stack dump to alert people.

At some point the behavior should be switched back. NULL is no
memory as far as I can tell and if the user asked for 0 bytes then
he needs to get no memory.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slub_def.h |    8 ++++++--
 mm/slub.c                |    2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

Index: slub/mm/slub.c
===
--- slub.orig/mm/slub.c	2007-05-04 14:17:22.0 -0700
+++ slub/mm/slub.c	2007-05-04 14:19:36.0 -0700
@@ -2009,7 +2009,7 @@ static struct kmem_cache *get_slab(size_
 {
 	int index = kmalloc_index(size);
 
-	if (!size)
+	if (!index)
 		return NULL;
 
 	/* Allocation too large? */
Index: slub/include/linux/slub_def.h
===
--- slub.orig/include/linux/slub_def.h	2007-05-04 14:13:40.0 -0700
+++ slub/include/linux/slub_def.h	2007-05-04 14:18:25.0 -0700
@@ -81,8 +81,12 @@ extern struct kmem_cache kmalloc_caches[
  */
 static inline int kmalloc_index(int size)
 {
-	if (size == 0)
-		return 0;
+	/*
+	 * We should return 0 if size == 0 but we use the smallest object
+	 * here for SLAB legacy reasons.
+	 */
+	WARN_ON(size == 0);
+
 	if (size > 64 && size <= 96)
 		return 1;
 	if (size > 128 && size <= 192)
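
The behaviour requested earlier in the thread (warn loudly, let the
warning disable itself, and still satisfy the allocation) can be
sketched in userspace as below. This is an illustration only, not the
SLUB code: demo_size_class(), the 8-byte smallest class and the
power-of-two classes are assumptions, and the patch above uses a plain
WARN_ON() rather than a self-disabling warning.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Counts how often the warning actually fired (for demonstration). */
static int zero_size_warnings;

/*
 * Map a request size to a power-of-two size class. A zero-byte request
 * warns once, then falls through and gets the smallest class.
 */
static size_t demo_size_class(size_t size)
{
	static int warned;
	size_t cls = 8;		/* assumed smallest class */

	if (size == 0 && !warned) {
		warned = 1;	/* the warning disables itself */
		zero_size_warnings++;
		fprintf(stderr, "warning: zero-size allocation requested\n");
	}
	/* Stop doubling before cls could overflow. */
	while (cls < size && cls <= SIZE_MAX / 2)
		cls <<= 1;
	return cls;
}
```

Repeated zero-byte requests keep succeeding with the smallest class,
but only the first one produces output.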


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Greg KH
On Wed, May 02, 2007 at 11:41:22AM +0200, Tilman Schmidt wrote:
> On Wed, 2 May 2007 00:43:05 -0700, "Greg KH" <[EMAIL PROTECTED]> said:
> 
> > > > And the winner is:
> > > > 
> > > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> > > > 
> > > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> > > > again.
> > 
> > Wait, even though this isn't good, it shouldn't have been hit by anyone,
> > that file used to not be readable, so I doubt userspace would have been
> > trying to read it...
> > 
> > Tilman, what version of HAL and udev do you have on your machine?
> 
> The ones that came with SuSE 10.0:
> 
> hal-0.5.4-6.4
> udev-068git20050831-9

Ah, ok, that explains it, the really old libsysfs walks and opens all
files in sysfs for some odd, strange, and broken reason.  This has been
fixed in newer versions, and explains why you are seeing this happen.

I'll send my fix for this to Linus in a few hours.

thanks for testing and tracking this down, I really appreciate it.

greg k-h


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Tilman Schmidt
Am 02.05.2007 22:07 schrieb Andrew Morton:
>> Started to git-bisect mainline now, but that will take some time.
[...]
> I don't think there's much point in you doing that.  We know what the bug is.

Good. Saves me some work. :-)

If you'd like me to test anything, just let me know.

Thanks,
Tilman

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -





Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Andrew Morton
On Wed, 02 May 2007 19:36:03 +0200
Tilman Schmidt <[EMAIL PROTECTED]> wrote:

> Am 02.05.2007 09:52 schrieb Greg KH:
> > Tilman, here's a patch, can you try this on top of your tree that dies?
> 
> 2.6.21-git3 plus that patch comes up fine.
> 
> (Except for a UDP problem I seem to remember I already saw reported
> on lkml and which I'll ignore for now in order not to blur the
> picture.)

Thanks.

> Started to git-bisect mainline now, but that will take some time.
> It's more than 800 patches to check and I don't get more than 2-3
> iterations per day out of that machine.

I don't think there's much point in you doing that.  We know what the bug is.

Switching to 8k stacks will probably fix things up too.


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Tilman Schmidt
Am 02.05.2007 09:52 schrieb Greg KH:
> Tilman, here's a patch, can you try this on top of your tree that dies?

2.6.21-git3 plus that patch comes up fine.

(Except for a UDP problem I seem to remember I already saw reported
on lkml and which I'll ignore for now in order not to blur the
picture.)

Started to git-bisect mainline now, but that will take some time.
It's more than 800 patches to check and I don't get more than 2-3
iterations per day out of that machine.
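
As a rough worked estimate of the time involved: a bisection halves the
candidate range on every build-and-boot cycle, so about 800 patches
need roughly 10 iterations, or 4 to 5 days at 2-3 iterations per day.
The helper below is a hypothetical sketch of that arithmetic, not
anything git provides.

```c
/*
 * Worst-case number of test iterations needed to isolate one
 * culprit among n candidates by repeated halving.
 */
static unsigned int demo_bisect_steps(unsigned int n)
{
	unsigned int steps = 0;

	while (n > 1) {
		n = (n + 1) / 2;	/* the larger half may remain */
		steps++;
	}
	return steps;
}
```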

HTH
T.

> ---
>  drivers/base/core.c |7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -252,7 +252,7 @@ static ssize_t show_uevent(struct device
>   struct kobject *top_kobj;
>   struct kset *kset;
>   char *envp[32];
> - char data[PAGE_SIZE];
> + char *data = NULL;
>   char *pos;
>   int i;
>   size_t count = 0;
> @@ -276,6 +276,10 @@ static ssize_t show_uevent(struct device
>   if (!kset->uevent_ops->filter(kset, &dev->kobj))
>   goto out;
>  
> + data = (char *)get_zeroed_page(GFP_KERNEL);
> + if (!data)
> + return -ENOMEM;
> +
>   /* let the kset specific function add its keys */
>   pos = data;
>   retval = kset->uevent_ops->uevent(kset, &dev->kobj,
> @@ -290,6 +294,7 @@ static ssize_t show_uevent(struct device
>   count += sprintf(pos, "%s\n", envp[i]);
>   }
>  out:
> + free_page((unsigned long)data);
>   return count;
>  }
>  

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -





Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Kay Sievers

On 5/2/07, Greg KH <[EMAIL PROTECTED]> wrote:
> On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote:
> > On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:
> >
> > > Am 30.04.2007 21:46 schrieb Andrew Morton:
> > > > Not really - everything's tangled up.  A bisection search on the
> > > > 2.6.21-rc7-mm2 driver tree would be the best bet.
> > >
> > > And the winner is:
> > >
> > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> > >
> > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> > > again.
> >
> > cripes.
> >
> > +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
> > +  char *buf)
> > +{
> > +   struct kobject *top_kobj;
> > +   struct kset *kset;
> > +   char *envp[32];
> > +   char data[PAGE_SIZE];
> >
> > That won't work too well with 4k stacks.

Yeah, sorry.

> Wait, even though this isn't good, it shouldn't have been hit by anyone,
> that file used to not be readable, so I doubt userspace would have been
> trying to read it...
>
> Tilman, what version of HAL and udev do you have on your machine?
>
> Kay, did you get the 'read the uevent file' code already into udev
> and/or HAL?


Only udevtest uses this at the moment, but that is only used for debugging.
It's probably the brain-dead libsysfs, which opens and reads every
file in /sys, even when nobody is interested in the data.

Thanks,
Kay


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Tilman Schmidt
On Wed, 2 May 2007 00:43:05 -0700, "Greg KH" <[EMAIL PROTECTED]> said:

> > > And the winner is:
> > > 
> > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> > > 
> > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> > > again.
> 
> Wait, even though this isn't good, it shouldn't have been hit by anyone,
> that file used to not be readable, so I doubt userspace would have been
> trying to read it...
> 
> Tilman, what version of HAL and udev do you have on your machine?

The ones that came with SuSE 10.0:

hal-0.5.4-6.4
udev-068git20050831-9

HTH
Tilman

PS: I'll test your patch and git-bisect when I'm back at the machine.
-- 
  Tilman Schmidt
  [EMAIL PROTECTED]



[PATCH] Build break on ppc64 for 2.6.21-rc7-mm2

2007-05-02 Thread Srinivasa Ds
Hi

When compiling 2.6.21-rc7-mm2 on ppc64, I encountered this error.
=====
  CC [M]  drivers/net/e1000/e1000_ethtool.o
  CC [M]  drivers/net/e1000/e1000_main.o
  LD [M]  drivers/net/e1000/e1000.o
  LD  drivers/net/ehea/built-in.o
  CC [M]  drivers/net/ehea/ehea_main.o
drivers/net/ehea/ehea_main.c: In function 'ehea_hash_skb':
drivers/net/ehea/ehea_main.c:1806: error: 'struct sk_buff' has no member named 'nh'
drivers/net/ehea/ehea_main.c:1807: error: 'struct sk_buff' has no member named 'nh'
drivers/net/ehea/ehea_main.c:1807: error: 'struct sk_buff' has no member named 'nh'
drivers/net/ehea/ehea_main.c:1809: error: 'struct sk_buff' has no member named 'nh'
make[3]: *** [drivers/net/ehea/ehea_main.o] Error 1
make[2]: *** [drivers/net/ehea] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2
=====

The ehea driver still accesses skb->nh directly, which no longer exists
after the struct sk_buff change, hence the error. The patch below should
fix the problem by switching to the ip_hdr() and skb_network_header()
accessors. Please let me know your comments on this.

Signed-off-by: Srinivasa Ds <[EMAIL PROTECTED]>
---
 drivers/net/ehea/ehea_main.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6.21-rc7/drivers/net/ehea/ehea_main.c
===
--- linux-2.6.21-rc7.orig/drivers/net/ehea/ehea_main.c
+++ linux-2.6.21-rc7/drivers/net/ehea/ehea_main.c
@@ -1803,10 +1803,10 @@ static inline int ehea_hash_skb(struct s
 	u32 tmp;
 
 	if ((skb->protocol == htons(ETH_P_IP)) &&
-	    (skb->nh.iph->protocol == IPPROTO_TCP)) {
-		tcp = (struct tcphdr*)(skb->nh.raw + (skb->nh.iph->ihl * 4));
+	    (ip_hdr(skb)->protocol == IPPROTO_TCP)) {
+		tcp = (struct tcphdr*)(skb_network_header(skb) + (ip_hdr(skb)->ihl * 4));
 		tmp = (tcp->source + (tcp->dest << 16)) % 31;
-		tmp += skb->nh.iph->daddr % 31;
+		tmp += ip_hdr(skb)->daddr % 31;
 		return tmp % num_qps;
 	}
 	else
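
For readers unfamiliar with the arithmetic in this hunk, here is a
userspace sketch of the same hash over a raw IPv4 packet. It is not
the driver code: demo_hash() and demo_ihl() are hypothetical names,
the byte offsets are the standard IPv4 and TCP header layout, and the
return value of 0 for non-TCP packets is an assumption because the
"else" branch is cut off in the hunk.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_IPPROTO_TCP 6

/* Header length in 32-bit words: low nibble of the first IPv4 byte. */
static inline unsigned int demo_ihl(const uint8_t *net)
{
	return net[0] & 0x0f;
}

/* net points at the start of the IPv4 header (the "network header"). */
static int demo_hash(const uint8_t *net, int num_qps)
{
	uint16_t source, dest;
	uint32_t daddr, tmp;
	const uint8_t *tcp;

	if (net[9] != DEMO_IPPROTO_TCP)	/* byte 9 is the protocol field */
		return 0;		/* assumed fallback */

	tcp = net + demo_ihl(net) * 4;	/* TCP header follows the IP header */
	memcpy(&source, tcp, 2);	/* fields used as stored, no byte swap */
	memcpy(&dest, tcp + 2, 2);
	memcpy(&daddr, net + 16, 4);	/* bytes 16..19: destination address */

	tmp = (source + ((uint32_t)dest << 16)) % 31;
	tmp += daddr % 31;
	return tmp % num_qps;
}
```

The point of the accessor change is only where the IP header pointer
comes from; the hash itself is unchanged.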




Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Greg KH
On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote:
> On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:
> 
> > Am 30.04.2007 21:46 schrieb Andrew Morton:
> > > Not really - everything's tangled up.  A bisection search on the
> > > 2.6.21-rc7-mm2 driver tree would be the best bet.
> > 
> > And the winner is:
> > 
> > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> > 
> > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> > again.
> 
> cripes.
> 
> +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
> +  char *buf)
> +{
> +   struct kobject *top_kobj;
> +   struct kset *kset;
> +   char *envp[32];
> +   char data[PAGE_SIZE];
> 
> That won't work too well with 4k stacks.

Tilman, here's a patch, can you try this on top of your tree that dies?

thanks,

greg k-h

---
 drivers/base/core.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -252,7 +252,7 @@ static ssize_t show_uevent(struct device
 	struct kobject *top_kobj;
 	struct kset *kset;
 	char *envp[32];
-	char data[PAGE_SIZE];
+	char *data = NULL;
 	char *pos;
 	int i;
 	size_t count = 0;
@@ -276,6 +276,10 @@ static ssize_t show_uevent(struct device
 		if (!kset->uevent_ops->filter(kset, &dev->kobj))
 			goto out;
 
+	data = (char *)get_zeroed_page(GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
 	/* let the kset specific function add its keys */
 	pos = data;
 	retval = kset->uevent_ops->uevent(kset, &dev->kobj,
@@ -290,6 +294,7 @@ static ssize_t show_uevent(struct device
 		count += sprintf(pos, "%s\n", envp[i]);
 	}
 out:
+	free_page((unsigned long)data);
 	return count;
 }
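
The shape of this fix (move a page-sized scratch buffer off the stack,
allocate it zeroed, and free it on a single exit path) can be sketched
in userspace as follows. This is an illustration only: demo_show_env()
and DEMO_PAGE_SIZE are hypothetical, calloc()/free() stand in for
get_zeroed_page()/free_page(), and the bounded snprintf() is an
addition that the patch itself does not make.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096

/*
 * Write each string in envp on its own line into out.
 * Returns the number of bytes written (excluding the NUL), or -1.
 */
static long demo_show_env(const char *const *envp, size_t n,
			  char *out, size_t out_len)
{
	char *data = calloc(1, DEMO_PAGE_SIZE);	/* heap, not stack */
	size_t count = 0;
	long ret = -1;

	if (!data)
		return -1;

	for (size_t i = 0; i < n; i++) {
		size_t room = DEMO_PAGE_SIZE - count;
		int w = snprintf(data + count, room, "%s\n", envp[i]);

		if (w < 0 || (size_t)w >= room)
			goto out;	/* would not fit in one page */
		count += (size_t)w;
	}
	if (count < out_len) {
		memcpy(out, data, count + 1);	/* include the NUL */
		ret = (long)count;
	}
out:
	free(data);	/* single exit path releases the buffer */
	return ret;
}
```

A 4096-byte array as a local variable would consume an entire 4k
kernel stack on its own, which is why the buffer has to live elsewhere.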
 


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Greg KH
On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote:
> On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:
> 
> > Am 30.04.2007 21:46 schrieb Andrew Morton:
> > > Not really - everything's tangled up.  A bisection search on the
> > > 2.6.21-rc7-mm2 driver tree would be the best bet.
> > 
> > And the winner is:
> > 
> > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> > 
> > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> > again.
> 
> cripes.
> 
> +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
> +  char *buf)
> +{
> +   struct kobject *top_kobj;
> +   struct kset *kset;
> +   char *envp[32];
> +   char data[PAGE_SIZE];
> 
> That won't work too well with 4k stacks.

Wait, even though this isn't good, it shouldn't have been hit by anyone,
that file used to not be readable, so I doubt userspace would have been
trying to read it...

Tilman, what version of HAL and udev do you have on your machine?

Kay, did you get the 'read the uevent file' code already into udev
and/or HAL?

thanks,

greg k-h


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Greg KH
On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote:
> On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:
> 
> > Am 30.04.2007 21:46 schrieb Andrew Morton:
> > > Not really - everything's tangled up.  A bisection search on the
> > > 2.6.21-rc7-mm2 driver tree would be the best bet.
> > 
> > And the winner is:
> > 
> > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> > 
> > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> > again.
> 
> cripes.
> 
> +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
> +  char *buf)
> +{
> +   struct kobject *top_kobj;
> +   struct kset *kset;
> +   char *envp[32];
> +   char data[PAGE_SIZE];
> 
> That won't work too well with 4k stacks.

Oh crap.  Yeah, that's not nice.

> Who's reviewing this stuff?  The patch headers indicate that no mailing list was
> cc'ed?

Kay and I did this, sorry, it should have been cc:ed to lkml.

I'll go fix it up now...

thanks,

greg k-h


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Andrew Morton
On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:

> Am 30.04.2007 21:46 schrieb Andrew Morton:
> > Not really - everything's tangled up.  A bisection search on the
> > 2.6.21-rc7-mm2 driver tree would be the best bet.
> 
> And the winner is:
> 
> gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> 
> Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> again.

cripes.

+static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
+  char *buf)
+{
+   struct kobject *top_kobj;
+   struct kset *kset;
+   char *envp[32];
+   char data[PAGE_SIZE];

That won't work too well with 4k stacks.

Who's reviewing this stuff?  The patch headers indicate that no mailing list was
cc'ed?


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Nick Piggin

Tilman Schmidt wrote:
> Am 30.04.2007 21:46 schrieb Andrew Morton:
> > Not really - everything's tangled up.  A bisection search on the
> > 2.6.21-rc7-mm2 driver tree would be the best bet.
>
> And the winner is:
>
> gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch


+   struct kobject *top_kobj;
+   struct kset *kset;
+   char *envp[32];
+   char data[PAGE_SIZE];
+   char *pos;
+   int i;
+   size_t count = 0;
+   int retval;

... that seems like a lot of stack to be using.

--
SUSE Labs, Novell Inc.


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Greg KH
On Wed, May 02, 2007 at 09:01:22AM +0200, Tilman Schmidt wrote:
> Am 30.04.2007 21:46 schrieb Andrew Morton:
> > Not really - everything's tangled up.  A bisection search on the
> > 2.6.21-rc7-mm2 driver tree would be the best bet.
> 
> And the winner is:
> 
> gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> 
> Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> again.
> 
> I'll try building 2.6.21-git3 minus that one next, but I'll have
> to revert it manually, because my naive attempt to "patch -R" it
> failed 1 out of 2 hunks.

Ok, that's just weird, it only adds a new feature, it doesn't touch any
existing code to cause things to go wrong.

Can you try using 'git bisect' on Linus's tree instead?  That should
show the real problem much easier.

thanks,

greg k-h


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Tilman Schmidt
Am 30.04.2007 21:46 schrieb Andrew Morton:
> Not really - everything's tangled up.  A bisection search on the
> 2.6.21-rc7-mm2 driver tree would be the best bet.

And the winner is:

gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch

Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
again.

I'll try building 2.6.21-git3 minus that one next, but I'll have
to revert it manually, because my naive attempt to "patch -R" it
failed 1 out of 2 hunks.

HTH
T.

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -





Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Tilman Schmidt
Am 30.04.2007 21:46 schrieb Andrew Morton:
 Not really - everything's tangled up.  A bisection search on the
 2.6.21-rc7-mm2 driver tree would be the best bet.

And the winner is:

gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch

Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
again.

I'll try building 2.6.21-git3 minus that one next, but I'll have
to revert it manually, because my naive attempt to patch -R it
failed 1 out of 2 hunks.

HTH
T.

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -



signature.asc
Description: OpenPGP digital signature


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Andrew Morton
On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:

> Am 30.04.2007 21:46 schrieb Andrew Morton:
> > Not really - everything's tangled up.  A bisection search on the
> > 2.6.21-rc7-mm2 driver tree would be the best bet.
>
> And the winner is:
>
> gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
>
> Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> again.

cripes.

+static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
+  char *buf)
+{
+   struct kobject *top_kobj;
+   struct kset *kset;
+   char *envp[32];
+   char data[PAGE_SIZE];

That won't work too well with 4k stacks.

Who's reviewing this stuff?  The patch headers indicate that no mailing list was
cc'ed?


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Nick Piggin

Tilman Schmidt wrote:
> Am 30.04.2007 21:46 schrieb Andrew Morton:
>
>> Not really - everything's tangled up.  A bisection search on the
>> 2.6.21-rc7-mm2 driver tree would be the best bet.
>
> And the winner is:
>
> gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch

+   struct kobject *top_kobj;
+   struct kset *kset;
+   char *envp[32];
+   char data[PAGE_SIZE];
+   char *pos;
+   int i;
+   size_t count = 0;
+   int retval;

... that seems like a lot of stack to be using.

--
SUSE Labs, Novell Inc.


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Greg KH
On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote:
> On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:
>
> > Am 30.04.2007 21:46 schrieb Andrew Morton:
> > > Not really - everything's tangled up.  A bisection search on the
> > > 2.6.21-rc7-mm2 driver tree would be the best bet.
> >
> > And the winner is:
> >
> > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> >
> > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> > again.
>
> cripes.
>
> +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
> +  char *buf)
> +{
> +   struct kobject *top_kobj;
> +   struct kset *kset;
> +   char *envp[32];
> +   char data[PAGE_SIZE];
>
> That won't work too well with 4k stacks.

Oh crap.  Yeah, that's not nice.

> Who's reviewing this stuff?  The patch headers indicate that no mailing list was
> cc'ed?

Kay and I did this, sorry, it should have been cc:ed to lkml.

I'll go fix it up now...

thanks,

greg k-h


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Greg KH
On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote:
> On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:
>
> > Am 30.04.2007 21:46 schrieb Andrew Morton:
> > > Not really - everything's tangled up.  A bisection search on the
> > > 2.6.21-rc7-mm2 driver tree would be the best bet.
> >
> > And the winner is:
> >
> > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> >
> > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> > again.
>
> cripes.
>
> +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
> +  char *buf)
> +{
> +   struct kobject *top_kobj;
> +   struct kset *kset;
> +   char *envp[32];
> +   char data[PAGE_SIZE];
>
> That won't work too well with 4k stacks.

Wait, even though this isn't good, it shouldn't have been hit by anyone,
that file used to not be readable, so I doubt userspace would have been
trying to read it...

Tilman, what version of HAL and udev do you have on your machine?

Kay, did you get the 'read the uevent file' code already into udev
and/or HAL?

thanks,

greg k-h


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Greg KH
On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote:
> On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:
>
> > Am 30.04.2007 21:46 schrieb Andrew Morton:
> > > Not really - everything's tangled up.  A bisection search on the
> > > 2.6.21-rc7-mm2 driver tree would be the best bet.
> >
> > And the winner is:
> >
> > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> >
> > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> > again.
>
> cripes.
>
> +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
> +  char *buf)
> +{
> +   struct kobject *top_kobj;
> +   struct kset *kset;
> +   char *envp[32];
> +   char data[PAGE_SIZE];
>
> That won't work too well with 4k stacks.

Tilman, here's a patch, can you try this on top of your tree that dies?

thanks,

greg k-h

---
 drivers/base/core.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -252,7 +252,7 @@ static ssize_t show_uevent(struct device
 	struct kobject *top_kobj;
 	struct kset *kset;
 	char *envp[32];
-	char data[PAGE_SIZE];
+	char *data = NULL;
 	char *pos;
 	int i;
 	size_t count = 0;
@@ -276,6 +276,10 @@ static ssize_t show_uevent(struct device
 		if (!kset->uevent_ops->filter(kset, &dev->kobj))
 			goto out;
 
+	data = (char *)get_zeroed_page(GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
 	/* let the kset specific function add its keys */
 	pos = data;
 	retval = kset->uevent_ops->uevent(kset, &dev->kobj,
@@ -290,6 +294,7 @@ static ssize_t show_uevent(struct device
 		count += sprintf(pos, "%s\n", envp[i]);
 	}
 out:
+	free_page((unsigned long)data);
 	return count;
 }
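
For readers following the fix: the idea in the patch above is to move the
PAGE_SIZE scratch buffer off the stack and onto the heap, and to release it
on the way out. A minimal userspace sketch of the same pattern follows. It
is not the kernel code: format_env() and BUF_SIZE are hypothetical names
chosen for this illustration, and calloc()/free() merely stand in for
get_zeroed_page()/free_page().

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 4096	/* stand-in for PAGE_SIZE; an assumption of this sketch */

/*
 * Hypothetical userspace analogue of show_uevent(): format each string
 * into a scratch buffer, then append it to the caller's output buffer.
 * The scratch buffer is heap-allocated, so the stack frame stays small.
 * Returns the number of bytes written, or -ENOMEM on allocation failure.
 */
long format_env(char *out, size_t out_len, const char *const *keys)
{
	char *data;
	long count = 0;
	size_t i;

	if (out_len > 0)
		out[0] = '\0';

	data = calloc(1, BUF_SIZE);	/* analogue of get_zeroed_page() */
	if (!data)
		return -ENOMEM;

	for (i = 0; keys[i]; i++) {
		int n = snprintf(data, BUF_SIZE, "%s\n", keys[i]);

		/* stop rather than overflow either buffer */
		if (n < 0 || (size_t)n >= BUF_SIZE ||
		    (size_t)count + (size_t)n >= out_len)
			break;
		memcpy(out + count, data, (size_t)n + 1);
		count += n;
	}

	free(data);	/* the scratch buffer is released before returning */
	return count;
}
```

Called with a NULL-terminated array of strings, format_env() returns the
number of bytes placed in the output buffer; only a pointer and a few
integers live on the stack, regardless of BUF_SIZE.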
 


[PATCH] Build break on ppc64 for 2.6.21-rc7-mm2

2007-05-02 Thread Srinivasa Ds
Hi

 When compiling 2.6.21-rc7-mm2, I encountered this error.
 =
  CC [M]  drivers/net/e1000/e1000_ethtool.o
  CC [M]  drivers/net/e1000/e1000_main.o
  LD [M]  drivers/net/e1000/e1000.o
  LD  drivers/net/ehea/built-in.o
  CC [M]  drivers/net/ehea/ehea_main.o
drivers/net/ehea/ehea_main.c: In function 'ehea_hash_skb':
drivers/net/ehea/ehea_main.c:1806: error: 'struct sk_buff' has no member named 'nh'
drivers/net/ehea/ehea_main.c:1807: error: 'struct sk_buff' has no member named 'nh'
drivers/net/ehea/ehea_main.c:1807: error: 'struct sk_buff' has no member named 'nh'
drivers/net/ehea/ehea_main.c:1809: error: 'struct sk_buff' has no member named 'nh'
make[3]: *** [drivers/net/ehea/ehea_main.o] Error 1
make[2]: *** [drivers/net/ehea] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2
=

The ehea driver has not been updated for the struct sk_buff change, which
causes this error. The patch below should fix the problem. Please let me
know your comments.

Signed-off-by: Srinivasa Ds <[EMAIL PROTECTED]>
---
 drivers/net/ehea/ehea_main.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6.21-rc7/drivers/net/ehea/ehea_main.c
===
--- linux-2.6.21-rc7.orig/drivers/net/ehea/ehea_main.c
+++ linux-2.6.21-rc7/drivers/net/ehea/ehea_main.c
@@ -1803,10 +1803,10 @@ static inline int ehea_hash_skb(struct s
 	u32 tmp;
 
 	if ((skb->protocol == htons(ETH_P_IP)) &&
-	    (skb->nh.iph->protocol == IPPROTO_TCP)) {
-		tcp = (struct tcphdr*)(skb->nh.raw + (skb->nh.iph->ihl * 4));
+	    (ip_hdr(skb)->protocol == IPPROTO_TCP)) {
+		tcp = (struct tcphdr*)(skb_network_header(skb) + (ip_hdr(skb)->ihl * 4));
 		tmp = (tcp->source + (tcp->dest << 16)) % 31;
-		tmp += skb->nh.iph->daddr % 31;
+		tmp += ip_hdr(skb)->daddr % 31;
 		return tmp % num_qps;
 	}
 	else
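
The build break comes from code reaching into struct sk_buff members
directly, and the patch above switches to accessor helpers instead. A
self-contained sketch of that pattern is below. All mock_* types and
functions are hypothetical stand-ins invented for this illustration (the
real definitions live in the kernel headers and differ in layout), and the
sketch leaves out the ETH_P_IP check that the original function performs.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel types; the field layout is an
 * assumption made for this sketch only. */
struct mock_iphdr {
	uint8_t  ihl;		/* header length in 32-bit words */
	uint8_t  protocol;
	uint32_t daddr;
};

struct mock_tcphdr {
	uint16_t source;
	uint16_t dest;
};

struct mock_skb {
	unsigned char *network_header;	/* start of the IP header */
};

#define MOCK_IPPROTO_TCP 6

/* Accessors in the spirit of skb_network_header() and ip_hdr(): callers
 * no longer depend on how the buffer structure stores the pointer. */
unsigned char *mock_network_header(const struct mock_skb *skb)
{
	return skb->network_header;
}

struct mock_iphdr *mock_ip_hdr(const struct mock_skb *skb)
{
	return (struct mock_iphdr *)mock_network_header(skb);
}

/* Same arithmetic as the patched ehea_hash_skb(), on the mock types. */
int mock_hash_skb(const struct mock_skb *skb, int num_qps)
{
	const struct mock_iphdr *iph = mock_ip_hdr(skb);
	const struct mock_tcphdr *tcp;
	uint32_t tmp;

	if (iph->protocol != MOCK_IPPROTO_TCP)
		return 0;

	/* the TCP header starts ihl 32-bit words after the IP header */
	tcp = (const struct mock_tcphdr *)(mock_network_header(skb) +
					   iph->ihl * 4);
	tmp = (tcp->source + ((uint32_t)tcp->dest << 16)) % 31;
	tmp += iph->daddr % 31;
	return (int)(tmp % (uint32_t)num_qps);
}
```

With accessors like these, a later change to how the structure stores the
header position only needs to touch the two helper functions, not every
driver that reads the headers.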




Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Tilman Schmidt
On Wed, 2 May 2007 00:43:05 -0700, Greg KH <[EMAIL PROTECTED]> said:

> > > And the winner is:
> > >
> > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> > >
> > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> > > again.
>
> Wait, even though this isn't good, it shouldn't have been hit by anyone,
> that file used to not be readable, so I doubt userspace would have been
> trying to read it...
>
> Tilman, what version of HAL and udev do you have on your machine?

The ones that came with SuSE 10.0:

hal-0.5.4-6.4
udev-068git20050831-9

HTH
Tilman

PS: I'll test your patch and git-bisect when I'm back at the machine.
-- 
  Tilman Schmidt
  [EMAIL PROTECTED]



Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Kay Sievers

On 5/2/07, Greg KH <[EMAIL PROTECTED]> wrote:
> On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote:
> > On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:
> >
> > > Am 30.04.2007 21:46 schrieb Andrew Morton:
> > > > Not really - everything's tangled up.  A bisection search on the
> > > > 2.6.21-rc7-mm2 driver tree would be the best bet.
> > >
> > > And the winner is:
> > >
> > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> > >
> > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> > > again.
> >
> > cripes.
> >
> > +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
> > +  char *buf)
> > +{
> > +   struct kobject *top_kobj;
> > +   struct kset *kset;
> > +   char *envp[32];
> > +   char data[PAGE_SIZE];
> >
> > That won't work too well with 4k stacks.

Yeah, sorry.

> Wait, even though this isn't good, it shouldn't have been hit by anyone,
> that file used to not be readable, so I doubt userspace would have been
> trying to read it...
>
> Tilman, what version of HAL and udev do you have on your machine?
>
> Kay, did you get the 'read the uevent file' code already into udev
> and/or HAL?


Only udevtest uses this at the moment, but that is only used for debugging.
It's probably the brain-dead libsysfs, which opens and reads every
file in /sys, even when nobody is interested in the data.

Thanks,
Kay


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Tilman Schmidt
Am 02.05.2007 09:52 schrieb Greg KH:
> Tilman, here's a patch, can you try this on top of your tree that dies?

2.6.21-git3 plus that patch comes up fine.

(Except for a UDP problem I seem to remember I already saw reported
on lkml and which I'll ignore for now in order not to blur the
picture.)

Started to git-bisect mainline now, but that will take some time.
It's more than 800 patches to check and I don't get more than 2-3
iterations per day out of that machine.
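
As a rough check on the time estimate above: a bisection halves the
candidate range on each test, so the number of builds is about log2 of the
number of patches. The helper names below are made up for this sketch, and
git bisect's actual step count can differ slightly depending on the shape
of the history.

```c
/*
 * Number of test builds a bisection needs to isolate one bad change
 * among n candidates: the smallest k such that 2^k >= n.
 */
unsigned int bisect_steps(unsigned int n)
{
	unsigned int steps = 0;
	unsigned long long span = 1;	/* wide enough for any unsigned int n */

	while (span < n) {
		span *= 2;
		steps++;
	}
	return steps;
}

/* Days needed at a given number of iterations per day, rounded up. */
unsigned int bisect_days(unsigned int steps, unsigned int per_day)
{
	return (steps + per_day - 1) / per_day;
}
```

With 800 candidates bisect_steps() gives 10, which at 2-3 iterations per
day works out to roughly 4-5 days on that machine.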

HTH
T.

> ---
>  drivers/base/core.c |7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -252,7 +252,7 @@ static ssize_t show_uevent(struct device
>  	struct kobject *top_kobj;
>  	struct kset *kset;
>  	char *envp[32];
> -	char data[PAGE_SIZE];
> +	char *data = NULL;
>  	char *pos;
>  	int i;
>  	size_t count = 0;
> @@ -276,6 +276,10 @@ static ssize_t show_uevent(struct device
>  		if (!kset->uevent_ops->filter(kset, &dev->kobj))
>  			goto out;
>
> +	data = (char *)get_zeroed_page(GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
>  	/* let the kset specific function add its keys */
>  	pos = data;
>  	retval = kset->uevent_ops->uevent(kset, &dev->kobj,
> @@ -290,6 +294,7 @@ static ssize_t show_uevent(struct device
>  		count += sprintf(pos, "%s\n", envp[i]);
>  	}
>  out:
> +	free_page((unsigned long)data);
>  	return count;
>  }
>

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -





Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Andrew Morton
On Wed, 02 May 2007 19:36:03 +0200
Tilman Schmidt <[EMAIL PROTECTED]> wrote:

> Am 02.05.2007 09:52 schrieb Greg KH:
> > Tilman, here's a patch, can you try this on top of your tree that dies?
>
> 2.6.21-git3 plus that patch comes up fine.
>
> (Except for a UDP problem I seem to remember I already saw reported
> on lkml and which I'll ignore for now in order not to blur the
> picture.)

Thanks.

> Started to git-bisect mainline now, but that will take some time.
> It's more than 800 patches to check and I don't get more than 2-3
> iterations per day out of that machine.

I don't think there's much point in you doing that.  We know what the bug is.

Switching to 8k stacks will probably fix things up too.


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Tilman Schmidt
Am 02.05.2007 22:07 schrieb Andrew Morton:
>> Started to git-bisect mainline now, but that will take some time.
[...]
> I don't think there's much point in you doing that.  We know what the bug is.

Good. Saves me some work. :-)

If you'd like me to test anything, just let me know.

Thanks,
Tilman

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -





Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-02 Thread Greg KH
On Wed, May 02, 2007 at 11:41:22AM +0200, Tilman Schmidt wrote:
> On Wed, 2 May 2007 00:43:05 -0700, Greg KH <[EMAIL PROTECTED]> said:
>
> > > > And the winner is:
> > > >
> > > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> > > >
> > > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel
> > > > again.
> >
> > Wait, even though this isn't good, it shouldn't have been hit by anyone,
> > that file used to not be readable, so I doubt userspace would have been
> > trying to read it...
> >
> > Tilman, what version of HAL and udev do you have on your machine?
>
> The ones that came with SuSE 10.0:
>
> hal-0.5.4-6.4
> udev-068git20050831-9

Ah, ok, that explains it, the really old libsysfs walks and opens all
files in sysfs for some odd, strange, and broken reason.  This has been
fixed in newer versions, and explains why you are seeing this happen.

I'll send my fix for this to Linus in a few hours.

thanks for testing and tracking this down, I really appreciate it.

greg k-h


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-01 Thread Greg KH
On Tue, May 01, 2007 at 01:26:44PM +0200, Tilman Schmidt wrote:
> Am 30.04.2007 21:46 schrieb Andrew Morton:
> > Sure, but what about 2.6.21-git3 (or, better, current -git)?
> 
> 2.6.21-git3 crashed with panic blink at "scanning usb: .."
> (Nothing in the log this time.)

Eeek, that's not good.

Can you keep bisecting Linus's tree?  'git bisect' makes this very easy
to do.  We need to track this down as soon as possible if we can.

> Will continue bisecting -rc7-mm2.

Can you focus on Linus's tree now, as we know that it is the part
causing problems?

thanks,

greg k-h


Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken

2007-05-01 Thread Bernhard Walle
Hello Vivek,

* Vivek Goyal <[EMAIL PROTECTED]> [2007-04-30 10:48]:
> > 
> handle_edge_irq() already makes sure that desc->action is not null, still
> note_interrupt() is receiving desc->action as null, that's strange. On my
> system this is happening for irq 4 and /proc/interrupts shows that it is
> coming from "serial".

From reading the code I cannot explain this either. However, I'm trying to
reproduce the problem here. I hope I find a machine where this also
happens.


Thanks,
   Bernhard
-- 
SUSE LINUX Products GmbH  Tel. +49 (911) 74053-0
Maxfeldstr. 5 GF: Markus Rex
90409 Nürnberg, Germany   HRB 16746 (AG Nürnberg)
OpenPGP DDAF6454: F61F 34CC 09CA FB82 C9F6  BA4B 8865 3696 DDAF 6454


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-05-01 Thread Randy Dunlap
On Tue, 1 May 2007 09:22:33 -0700 Randy Dunlap wrote:

> On Tue, 1 May 2007 08:22:58 +0200 Andi Kleen wrote:
> 
> > On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> > > On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
> > > 
> > > > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > > > randomish times (presumably in the timer irq handler) when netconsole
> > > > > and printk-time are enabled.
> > > > 
> > > > A backtrace would be good. Does nmi_watchdog=2 show anything
> > > > interesting or if not sysrq-t?
> > > 
> > > I can't get anything from sysrq or nmi_watchdog.
> > 
> > Hmm, ok when the console locks up those likely don't work.
> > 
> > > 
> > > > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > > > fixed.
> > > > 
> > > > Yes.
> > > 
> > > Fixed where?  Merged into mainline or in your firstfloor patches?
> > 
> > None of the sched-clock changes are in mainline yet.
> > 
> > Can you perhaps test latest firstfloor alone (without rest of -mm)?
> 
> OK.  so your 2.6.21-rc7-git5 patch, applied to 2.6.21-git4 or
> applied to 2.6.21-rc7-git5 ?

Applied cleanly to 2.6.21-rc7-git5, but it has build errors:


arch/x86_64/mm/built-in.o: In function `mark_rodata_ro':
(.text+0x180): undefined reference to `_stext'
arch/x86_64/mm/built-in.o: In function `mem_init':
(.init.text+0x2cf): undefined reference to `_stext'
arch/x86_64/mm/built-in.o: In function `do_page_fault':
(.kprobes.text+0x59c): undefined reference to `_stext'
arch/x86_64/vdso/built-in.o: In function `arch_setup_additional_pages':
(.text+0x40): undefined reference to `vdso_end'
arch/x86_64/vdso/built-in.o: In function `arch_setup_additional_pages':
(.text+0x58): undefined reference to `vdso_start'
arch/x86_64/vdso/built-in.o: In function `init_vdso_vars':
vma.c:(.init.text+0x1b): undefined reference to `vdso_end'
vma.c:(.init.text+0x26): undefined reference to `vdso_start'
vma.c:(.init.text+0x3c): undefined reference to `vdso_start'
kernel/built-in.o: In function `profile_hits':
(.text+0x9609): undefined reference to `_stext'
kernel/built-in.o: In function `core_kernel_text':
(.text+0x197c4): undefined reference to `_stext'
kernel/built-in.o: In function `is_ksym_addr':
kallsyms.c:(.text+0x27042): undefined reference to `_stext'
kernel/built-in.o: In function `profile_init':
(.init.text+0xc57): undefined reference to `_stext'
make: *** [.tmp_vmlinux1] Error 1

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-05-01 Thread Randy Dunlap
On Tue, 1 May 2007 08:22:58 +0200 Andi Kleen wrote:

> On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> > On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
> > 
> > > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > > randomish times (presumably in the timer irq handler) when netconsole
> > > > and printk-time are enabled.
> > > 
> > > A backtrace would be good. Does nmi_watchdog=2 show anything
> > > interesting or if not sysrq-t?
> > 
> > I can't get anything from sysrq or nmi_watchdog.
> 
> Hmm, ok when the console locks up those likely don't work.
> 
> > 
> > > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > > fixed.
> > > 
> > > Yes.
> > 
> > Fixed where?  Merged into mainline or in your firstfloor patches?
> 
> None of the sched-clock changes are in mainline yet.
> 
> Can you perhaps test latest firstfloor alone (without rest of -mm)?

OK.  so your 2.6.21-rc7-git5 patch, applied to 2.6.21-git4 or
applied to 2.6.21-rc7-git5 ?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-05-01 Thread Randy Dunlap
On Mon, 30 Apr 2007 22:38:59 -0700 Andrew Morton wrote:

> On Tue, 1 May 2007 08:24:56 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > > The bug is in firstfloor only, and the fix (if present) will be there too.
> > > 
> > > <checks>
> > > 
> > > Nope,
> > > 
> > > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> > > 
> > > is identical to
> > > 
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch
> > 
> > > Or perhaps the deadlock is in the cpufrequency handler. Does it happen
> > > without CONFIG_CPUFREQ too?
> > 
> > [cpufreq handler calls ktime_get which might take xtime lock for reading] 
> > 
> 
> Sounds right.  That's what was happening to me for a while.
> 
> Randy, it'd be interesting to try:
> 
> --- a/arch/x86_64/kernel/tsc.c~a
> +++ a/arch/x86_64/kernel/tsc.c
> @@ -84,8 +84,8 @@ static int time_cpufreq_notifier(struct 
>   cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
>  
>   tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
> - if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> - mark_tsc_unstable("cpufreq changes");
> +//   if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> +//   mark_tsc_unstable("cpufreq changes");
>   }
>  
>   return 0;
> _

I don't have CPU_FREQ enabled, so that didn't change anything.


> and if that "fixes" it, disable netconsole and do
> 
> --- a/arch/x86_64/kernel/tsc.c~a
> +++ a/arch/x86_64/kernel/tsc.c
> @@ -85,7 +85,7 @@ static int time_cpufreq_notifier(struct 
>  
>   tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
>   if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> - mark_tsc_unstable("cpufreq changes");
> + dump_stack();
>   }
>  
>   return 0;


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-01 Thread Tilman Schmidt
Am 30.04.2007 21:46 schrieb Andrew Morton:
> Sure, but what about 2.6.21-git3 (or, better, current -git)?

2.6.21-git3 crashed with panic blink at "scanning usb: .."
(Nothing in the log this time.)

Will continue bisecting -rc7-mm2.

HTH
T.

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -



Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-05-01 Thread Randy Dunlap
On Tue, 1 May 2007 09:22:33 -0700 Randy Dunlap wrote:

 On Tue, 1 May 2007 08:22:58 +0200 Andi Kleen wrote:
 
  On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
   On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
   
 Andi: unprocessor x86_64 running rc7-mm2 is hanging early in boot at
 randomish times (presumably in the timer irq handler) when netconsole 
 and
 printk-time are enabled.

A backtrace would be good. Does nmi_watchdog=2 show anything
interesting or if not sysrq-t?
   
   I can't get anything from sysrq or nmi_watchdog.
  
  Hmm, ok when the console locks up those likely don't work.
  
   
 I was hitting the same thing on i386 uniprocessor, but I thought it 
 got
 fixed.

Yes.
   
   Fixed where?  Merged into mainline or in your firstfloor patches?
  
  None of the sched-clock changes are in mainline yet.
  
  Can you perhaps test latest firstfloor alone (without rest of -mm)?
 
 OK.  so your 2.6.21-rc7-git5 patch, applied to 2.6.21-git4 or
 applied to 2.6.21-rc7-git5 ?

Applied cleanly to 2.6.21-rc7-git5, but it has build errors:


arch/x86_64/mm/built-in.o: In function `mark_rodata_ro':
(.text+0x180): undefined reference to `_stext'
arch/x86_64/mm/built-in.o: In function `mem_init':
(.init.text+0x2cf): undefined reference to `_stext'
arch/x86_64/mm/built-in.o: In function `do_page_fault':
(.kprobes.text+0x59c): undefined reference to `_stext'
arch/x86_64/vdso/built-in.o: In function `arch_setup_additional_pages':
(.text+0x40): undefined reference to `vdso_end'
arch/x86_64/vdso/built-in.o: In function `arch_setup_additional_pages':
(.text+0x58): undefined reference to `vdso_start'
arch/x86_64/vdso/built-in.o: In function `init_vdso_vars':
vma.c:(.init.text+0x1b): undefined reference to `vdso_end'
vma.c:(.init.text+0x26): undefined reference to `vdso_start'
vma.c:(.init.text+0x3c): undefined reference to `vdso_start'
kernel/built-in.o: In function `profile_hits':
(.text+0x9609): undefined reference to `_stext'
kernel/built-in.o: In function `core_kernel_text':
(.text+0x197c4): undefined reference to `_stext'
kernel/built-in.o: In function `is_ksym_addr':
kallsyms.c:(.text+0x27042): undefined reference to `_stext'
kernel/built-in.o: In function `profile_init':
(.init.text+0xc57): undefined reference to `_stext'
make: *** [.tmp_vmlinux1] Error 1

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 irqpoll seems to be broken

2007-05-01 Thread Bernhard Walle
Hello Vivek,

* Vivek Goyal <[EMAIL PROTECTED]> [2007-04-30 10:48]:
> 
> handle_edge_irq() already makes sure that desc->action is not null, still
> note_interrupt() is receiving desc->action as null, that's strange. On my
> system this is happening for irq 4 and /proc/interrupts shows that it is
> coming from serial.

From reading the code, I cannot explain this either. However, I'm trying to
reproduce the problem here. I hope I find a machine where this also
happens.


Thanks,
   Bernhard
-- 
SUSE LINUX Products GmbH  Tel. +49 (911) 74053-0
Maxfeldstr. 5 GF: Markus Rex
90409 Nürnberg, Germany   HRB 16746 (AG Nürnberg)
OpenPGP DDAF6454: F61F 34CC 09CA FB82 C9F6  BA4B 8865 3696 DDAF 6454


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-05-01 Thread Greg KH
On Tue, May 01, 2007 at 01:26:44PM +0200, Tilman Schmidt wrote:
> On 30.04.2007 21:46, Andrew Morton wrote:
> > Sure, but what about 2.6.21-git3 (or, better, current -git)?
> 
> 2.6.21-git3 crashed with panic blink at scanning usb: ..
> (Nothing in the log this time.)

Eeek, that's not good.

Can you keep bisecting Linus's tree?  'git bisect' makes this very easy
to do.  We need to track this down as soon as possible if we can.

> Will continue bisecting -rc7-mm2.

Can you focus on Linus's tree now, as we know that it is the part
causing problems?

thanks,

greg k-h


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andrew Morton
On Tue, 1 May 2007 08:24:56 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote:

> > The bug is in firstfloor only, and the fix (if present) will be there too.
> > 
> > 
> > 
> > Nope,
> > 
> > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> > 
> > is identical to
> > 
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch
> 
> Or perhaps the deadlock is in the cpufrequency handler. Does it happen
> without CONFIG_CPUFREQ too?
> 
> [cpufreq handler calls ktime_get which might take xtime lock for reading] 
> 

Sounds right.  That's what was happening to me for a while.

Randy, it'd be interesting to try:

--- a/arch/x86_64/kernel/tsc.c~a
+++ a/arch/x86_64/kernel/tsc.c
@@ -84,8 +84,8 @@ static int time_cpufreq_notifier(struct 
 		cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
 
 		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
-		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-			mark_tsc_unstable("cpufreq changes");
+//		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
+//			mark_tsc_unstable("cpufreq changes");
 	}
 
 	return 0;
_

and if that "fixes" it, disable netconsole and do

--- a/arch/x86_64/kernel/tsc.c~a
+++ a/arch/x86_64/kernel/tsc.c
@@ -85,7 +85,7 @@ static int time_cpufreq_notifier(struct 
 
 		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
 		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-			mark_tsc_unstable("cpufreq changes");
+			dump_stack();
 	}
 
 	return 0;
_



Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andi Kleen
> The bug is in firstfloor only, and the fix (if present) will be there too.
> 
> 
> 
> Nope,
> 
> ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> 
> is identical to
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch

Or perhaps the deadlock is in the cpufrequency handler. Does it happen
without CONFIG_CPUFREQ too?

[cpufreq handler calls ktime_get which might take xtime lock for reading] 

-Andi


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andrew Morton
On Mon, 30 Apr 2007 22:16:24 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > fixed.
> > 
> > Yes.
> 
> Fixed where?  Merged into mainline or in your firstfloor patches?

The bug is in firstfloor only, and the fix (if present) will be there too.



Nope,

ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share

is identical to

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch




Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andi Kleen
On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
> 
> > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > randomish times (presumably in the timer irq handler) when netconsole and
> > > printk-time are enabled.
> > 
> > A backtrace would be good. Does nmi_watchdog=2 show anything
> > interesting or if not sysrq-t?
> 
> I can't get anything from sysrq or nmi_watchdog.

Hmm, ok when the console locks up those likely don't work.

> 
> > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > fixed.
> > 
> > Yes.
> 
> Fixed where?  Merged into mainline or in your firstfloor patches?

None of the sched-clock changes are in mainline yet.

Can you perhaps test latest firstfloor alone (without rest of -mm)?

-Andi


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Randy Dunlap
On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:

> > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > randomish times (presumably in the timer irq handler) when netconsole and
> > printk-time are enabled.
> 
> A backtrace would be good. Does nmi_watchdog=2 show anything
> interesting or if not sysrq-t?

I can't get anything from sysrq or nmi_watchdog.

> > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > fixed.
> 
> Yes.

Fixed where?  Merged into mainline or in your firstfloor patches?

> My current sched_clock does not take any locks anymore and it was removed
> from the cpufreq handler too.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andi Kleen
> Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> randomish times (presumably in the timer irq handler) when netconsole and
> printk-time are enabled.

A backtrace would be good. Does nmi_watchdog=2 show anything
interesting or if not sysrq-t?

> 
> I was hitting the same thing on i386 uniprocessor, but I thought it got
> fixed.

Yes.

My current sched_clock does not take any locks anymore and it was removed
from the cpufreq handler too.

-Andi


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andrew Morton
On Mon, 30 Apr 2007 17:45:55 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Mon, 30 Apr 2007 16:51:01 -0700
> > Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > 
> >> On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:
> >>
> >>> On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
> >>>
> >>>> On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> >>>>>
> >>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> >>>>> I'm getting a hang near the end of booting on x86_64 UP.
> >>>>> The last initcall_debug function varies.  E.g.:
> >>>>>
> >>>>> 1/
> >>>>> [0.140257] Calling initcall 0x806f2fa8: 
> >>>>> init_misc_binfmt+0x0/0x3f()
> >>>>> [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() 
> >>>>> returned 0.
> >>>>> [0.140275] initcall 0x806f2fa8 ran for 0 msecs: 
> >>>>> init_misc_binfmt+0x0/0x3f()
> >>>>> [0.140284] Calling initcall 0x806f2fe7: 
> >>>>> init_script_binfmt+0x0/0x12()
> >>>>> [0.140293] initcall 0x806f2fe7: 
> >>>>> init_script_binfmt+0x0/0x12() returned 0.
> >>>>> [0.140302] initcall 0x806f2fe7 ran for 0 msecs: 
> >>>>> init_script_binfmt+0x0/0x12()
> >>>>> [0.140310] Calling initcall 0x806f2ff9: 
> >>>>> init_elf_binfmt+0x0/0x12()
> >>>>> [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() 
> >>>>> returned 0.
> >>>>> [0.140326] initcall 0x806f2ff9 ran for 0 msecs: 
> >>>>> init_elf_binfmt+0x0/0x12()
> >>>>> [0.140335] Calling initcall 0x806f3de9: 
> >>>>> debugfs_init+0x0/0x4a()
> >>>>> [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() 
> >>>>> returned 0.
> >>>>> [0.140351] initcall 0x806f3de9 ran for 0 msecs: 
> >>>>> debugfs_init+0x0/0x4a()
> >>>>>
> >>>>> 2/
> >>>>> [0.140206] Calling initcall 0x806efeb1: 
> >>>>> ksysfs_init+0x0/0x29()
> >>>>> [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() 
> >>>>> returned 0.
> >>>>> [0.140222] initcall 0x806efeb1 ran for 0 msecs: 
> >>>>> ksysfs_init+0x0/0x29()
> >>>>> [0.140230] Calling initcall 0x806f25be: 
> >>>>> filelock_init+0x0/0x31()
> >>>>> [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() 
> >>>>> returned 0.
> >>>>> [0.140249] initcall 0x806f25be ran for 0 msecs: 
> >>>>> filelock_init+0x0/0x31()
> >>>>> [0.140258] Calling initcall 0x806f2fa8: 
> >>>>> init_misc_binfmt+0x0/0x3f()
> >>>>> [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() 
> >>>>> returned 0.
> >>>>> [0.140276] initcall 0x806f2fa8 ran for 0 msecs: 
> >>>>> init_misc_binfmt+0x0/0x3f()
> >>>>> [0.140284] Calling initcall 0x806f2fe7: 
> >>>>> init_script_binfmt+0x0/0x12()
> >>>>> [0.140293] initcall 0x806f2fe7: 
> >>>>> init_script_binfmt+0x0/0x12() returned 0.
> >>>>>
> >>>> So perhaps it locks during a timer interrupt.
> >>>>
> >>>>> .config is attached.
> >>>>>
> >>>>> Any ideas/suggestions?
> >>>> Just the usual: nothing from sysrq or NMI watchdog?
> >>> Nothing from either of those.  I'll jiggle some config options.
> >> config option changes didn't help, but removing
> >>netconsole=
> >> from the kernel command line makes it all happy.  :(
> > 
> > argh.
> > 
> >> Do we know of netconsole hang problems?  (anyone?)
> > 
> > You have "time" as well?  I found on i386 uniproc that time+netconsole
> > caused hangs because the printk timestamping code was taking
> > xtime_lock for reading inside a write_seqlock.  But I thought that Andi
> > fixed that.  Perhaps i386 got fixed but x86_64 did not.
> 
> Yes, I have CONFIG_PRINTK_TIME=y and disabling it allows it to boot.  Thanks.
> 
> Maybe the patch isn't merged yet?

Could be.  I don't recall whether Andi's statement was before or after
2.6.21-rc7-mm2 actually.

> Now if I can just remember this until the next time that I hit it...

Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
randomish times (presumably in the timer irq handler) when netconsole and
printk-time are enabled.

I was hitting the same thing on i386 uniprocessor, but I thought it got
fixed.

The problem was that the printable string which is newly passed to
mark_tsc_unstable() is printed out inside write_seqlock(xtime_lock) but
printk timestamping (and perhaps netconsole tx?) want to take xtime_lock
for reading, which will hang.
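
The hang described above can be modelled in a few lines of userspace C. This
is only a sketch under stated assumptions: toy_seqlock, toy_read_time() and
toy_update_and_log() are made-up names, not kernel APIs, and the retry budget
exists purely so that the demonstration returns instead of spinning forever
the way a real seqlock reader would. It shows a reader that is called while
the same CPU is still inside the write section and therefore can never see an
even sequence count.

```c
#include <stdbool.h>

/* Toy sequence lock, loosely modelled on the kernel's seqlock_t.
 * Illustrative only; none of these names exist in the kernel. */
struct toy_seqlock {
	unsigned int sequence;	/* odd while a writer is inside */
};

static void toy_write_begin(struct toy_seqlock *sl) { sl->sequence++; }
static void toy_write_end(struct toy_seqlock *sl)   { sl->sequence++; }

/*
 * Reader side: retry until an even, unchanged sequence is observed.
 * A real reader has no retry budget; max_retries is an assumption
 * added so the demo terminates and can report the failure.
 */
static bool toy_read_time(const struct toy_seqlock *sl, const long *xtime,
			  long *out, int max_retries)
{
	for (int i = 0; i < max_retries; i++) {
		unsigned int start = sl->sequence;
		long snapshot;

		if (start & 1)
			continue;	/* writer active: try again */
		snapshot = *xtime;
		if (sl->sequence == start) {
			*out = snapshot;
			return true;
		}
	}
	return false;	/* a real reader would still be spinning here */
}

/*
 * Stand-in for the timer update path: it takes the write side and then
 * "prints" a timestamped message, which needs the read side.
 */
static bool toy_update_and_log(struct toy_seqlock *sl, long *xtime, long *stamp)
{
	bool got_stamp;

	toy_write_begin(sl);
	(*xtime)++;
	/* Same CPU, write section still open: the reader cannot succeed. */
	got_stamp = toy_read_time(sl, xtime, stamp, 1000);
	toy_write_end(sl);
	return got_stamp;
}
```

Calling toy_read_time() outside the write section succeeds on the first pass;
calling it from toy_update_and_log() exhausts the budget, which corresponds to
the uniprocessor hang, where nothing else can run to finish the write.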



Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Randy Dunlap

Andrew Morton wrote:
> On Mon, 30 Apr 2007 16:51:01 -0700
> Randy Dunlap <[EMAIL PROTECTED]> wrote:
> 
>> On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:
>>
>>> On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
>>>
>>>> On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
>>>>>
>>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
>>>>> I'm getting a hang near the end of booting on x86_64 UP.
>>>>> The last initcall_debug function varies.  E.g.:
>>>>>
>>>>> 1/
>>>>> [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
>>>>> [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
>>>>> [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
>>>>> [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
>>>>> [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
>>>>> [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
>>>>> [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
>>>>> [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
>>>>> [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
>>>>> [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
>>>>> [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
>>>>> [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
>>>>>
>>>>> 2/
>>>>> [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
>>>>> [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
>>>>> [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
>>>>> [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
>>>>> [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
>>>>> [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
>>>>> [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
>>>>> [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
>>>>> [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
>>>>> [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
>>>>> [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
>>>>>
>>>> So perhaps it locks during a timer interrupt.
>>>>
>>>>> .config is attached.
>>>>>
>>>>> Any ideas/suggestions?
>>>> Just the usual: nothing from sysrq or NMI watchdog?
>>> Nothing from either of those.  I'll jiggle some config options.
>> config option changes didn't help, but removing
>>	netconsole=
>> from the kernel command line makes it all happy.  :(
> 
> argh.
> 
>> Do we know of netconsole hang problems?  (anyone?)
> 
> You have "time" as well?  I found on i386 uniproc that time+netconsole
> caused hangs because the printk timestamping code was taking
> xtime_lock for reading inside a write_seqlock.  But I thought that Andi
> fixed that.  Perhaps i386 got fixed but x86_64 did not.

Yes, I have CONFIG_PRINTK_TIME=y and disabling it allows it to boot.  Thanks.

Maybe the patch isn't merged yet?

Now if I can just remember this until the next time that I hit it...

--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Andrew Morton
On Mon, 30 Apr 2007 16:51:01 -0700
Randy Dunlap <[EMAIL PROTECTED]> wrote:

> On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:
> 
> > On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
> > 
> > > On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> > > > 
> > > > > 
> > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> > > > 
> > > > I'm getting a hang near the end of booting on x86_64 UP.
> > > > The last initcall_debug function varies.  E.g.:
> > > > 
> > > > 1/
> > > > [0.140257] Calling initcall 0x806f2fa8: 
> > > > init_misc_binfmt+0x0/0x3f()
> > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() 
> > > > returned 0.
> > > > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: 
> > > > init_misc_binfmt+0x0/0x3f()
> > > > [0.140284] Calling initcall 0x806f2fe7: 
> > > > init_script_binfmt+0x0/0x12()
> > > > [0.140293] initcall 0x806f2fe7: 
> > > > init_script_binfmt+0x0/0x12() returned 0.
> > > > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: 
> > > > init_script_binfmt+0x0/0x12()
> > > > [0.140310] Calling initcall 0x806f2ff9: 
> > > > init_elf_binfmt+0x0/0x12()
> > > > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() 
> > > > returned 0.
> > > > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: 
> > > > init_elf_binfmt+0x0/0x12()
> > > > [0.140335] Calling initcall 0x806f3de9: 
> > > > debugfs_init+0x0/0x4a()
> > > > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() 
> > > > returned 0.
> > > > [0.140351] initcall 0x806f3de9 ran for 0 msecs: 
> > > > debugfs_init+0x0/0x4a()
> > > > 
> > > > 2/
> > > > [0.140206] Calling initcall 0x806efeb1: 
> > > > ksysfs_init+0x0/0x29()
> > > > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() 
> > > > returned 0.
> > > > [0.140222] initcall 0x806efeb1 ran for 0 msecs: 
> > > > ksysfs_init+0x0/0x29()
> > > > [0.140230] Calling initcall 0x806f25be: 
> > > > filelock_init+0x0/0x31()
> > > > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() 
> > > > returned 0.
> > > > [0.140249] initcall 0x806f25be ran for 0 msecs: 
> > > > filelock_init+0x0/0x31()
> > > > [0.140258] Calling initcall 0x806f2fa8: 
> > > > init_misc_binfmt+0x0/0x3f()
> > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() 
> > > > returned 0.
> > > > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: 
> > > > init_misc_binfmt+0x0/0x3f()
> > > > [0.140284] Calling initcall 0x806f2fe7: 
> > > > init_script_binfmt+0x0/0x12()
> > > > [0.140293] initcall 0x806f2fe7: 
> > > > init_script_binfmt+0x0/0x12() returned 0.
> > > > 
> > > 
> > > So perhaps it locks during a timer interrupt.
> > > 
> > > > .config is attached.
> > > > 
> > > > Any ideas/suggestions?
> > > 
> > > Just the usual: nothing from sysrq or NMI watchdog?
> > 
> > Nothing from either of those.  I'll jiggle some config options.
> 
> config option changes didn't help, but removing
>   netconsole=
> from the kernel command line makes it all happy.  :(

argh.

> Do we know of netconsole hang problems?  (anyone?)

You have "time" as well?  I found on i386 uniproc that time+netconsole
caused hangs because the printk timestamping code was taking
xtime_lock for reading inside a write_seqlock.  But I thought that Andi
fixed that.  Perhaps i386 got fixed but x86_64 did not.



Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)

2007-04-30 Thread Randy Dunlap
On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:

> On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
> 
> > On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > 
> > > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> > > 
> > > > 
> > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> > > 
> > > I'm getting a hang near the end of booting on x86_64 UP.
> > > The last initcall_debug function varies.  E.g.:
> > > 
> > > 1/
> > > [0.140257] Calling initcall 0x806f2fa8: 
> > > init_misc_binfmt+0x0/0x3f()
> > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() 
> > > returned 0.
> > > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: 
> > > init_misc_binfmt+0x0/0x3f()
> > > [0.140284] Calling initcall 0x806f2fe7: 
> > > init_script_binfmt+0x0/0x12()
> > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() 
> > > returned 0.
> > > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: 
> > > init_script_binfmt+0x0/0x12()
> > > [0.140310] Calling initcall 0x806f2ff9: 
> > > init_elf_binfmt+0x0/0x12()
> > > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() 
> > > returned 0.
> > > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: 
> > > init_elf_binfmt+0x0/0x12()
> > > [0.140335] Calling initcall 0x806f3de9: 
> > > debugfs_init+0x0/0x4a()
> > > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() 
> > > returned 0.
> > > [0.140351] initcall 0x806f3de9 ran for 0 msecs: 
> > > debugfs_init+0x0/0x4a()
> > > 
> > > 2/
> > > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> > > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() 
> > > returned 0.
> > > [0.140222] initcall 0x806efeb1 ran for 0 msecs: 
> > > ksysfs_init+0x0/0x29()
> > > [0.140230] Calling initcall 0x806f25be: 
> > > filelock_init+0x0/0x31()
> > > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() 
> > > returned 0.
> > > [0.140249] initcall 0x806f25be ran for 0 msecs: 
> > > filelock_init+0x0/0x31()
> > > [0.140258] Calling initcall 0x806f2fa8: 
> > > init_misc_binfmt+0x0/0x3f()
> > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() 
> > > returned 0.
> > > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: 
> > > init_misc_binfmt+0x0/0x3f()
> > > [0.140284] Calling initcall 0x806f2fe7: 
> > > init_script_binfmt+0x0/0x12()
> > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() 
> > > returned 0.
> > > 
> > 
> > So perhaps it locks during a timer interrupt.
> > 
> > > .config is attached.
> > > 
> > > Any ideas/suggestions?
> > 
> > Just the usual: nothing from sysrq or NMI watchdog?
> 
> Nothing from either of those.  I'll jiggle some config options.

config option changes didn't help, but removing
netconsole=
from the kernel command line makes it all happy.  :(

Do we know of netconsole hang problems?  (anyone?)

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]

2007-04-30 Thread Rafael J. Wysocki
On Monday, 30 April 2007 22:52, Dan Kruchinin wrote:
> On 4/30/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > [Please don't drop addresses from the CC list]
> >
> > On Sunday, 29 April 2007 22:46, Dan Kruchinin wrote:
> > > On 4/30/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > > > Hi,
> > > >
> > > > On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
> > > > > Hi all.
> > > > >
> > > > > There is a problem on my macbook core duo with suspend.
> > > > > after suspending when i'm trying to 'wake up' my notebook, it seems
> > > > > that it works, but i don't see anything at my monitor. So i have to
> > > > > reboot it to continue my work.
> > > >
> > > > What exactly do you do to suspend?
> > > >
> > > > Rafael
> > > >
> > > >
> > > > > ---
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at
> > > > > kernel/kthread.c:166 kthread_bind()
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900870]  []
> > > > > _cpu_down+0x16b/0x250
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900893]  []
> > > > > disable_nonboot_cpus+0x60/0xf0
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900903]  []
> > > > > enter_state+0x22a/0x240
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900913]  []
> > > > > state_store+0xbd/0xd0
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900920]  []
> > > > > state_store+0x0/0xd0
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900927]  []
> > > > > subsys_attr_store+0x29/0x40
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900937]  []
> > > > > sysfs_write_file+0xd4/0x160
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900948]  []
> > > > > vfs_write+0xa6/0x160
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900958]  []
> > > > > sysfs_write_file+0x0/0x160
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900966]  []
> > > > > sys_write+0x41/0x70
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900974]  []
> > > > > sys_dup2+0xeb/0x120
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900984]  []
> > > > > sysenter_past_esp+0x5f/0x85
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900999]  
> > > > > ===
> > > > > ---
> > > > >
> > > > > dmesg output:
> > > > > 
> > > > > 
> > > > > Apr 29 23:31:16 midgard kernel: [140594.788697] Suspending device 
> > > > > vtcon0
> > > > > Apr 29 23:31:16 midgard kernel: [140594.788700] Suspending device 
> > > > > platform
> > > > > Apr 29 23:31:16 midgard kernel: [140594.788704] Disabling non-boot 
> > > > > CPUs ...
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900464] CPU 1 is now offline
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900469] SMP alternatives:
> > > > > switching to UP code
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at
> > > > > kernel/kthread.c:166 kthread_bind()
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900870]  []
> > > > > _cpu_down+0x16b/0x250
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900893]  []
> > > > > disable_nonboot_cpus+0x60/0xf0
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900903]  []
> > > > > enter_state+0x22a/0x240
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900913]  []
> > > > > state_store+0xbd/0xd0
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900920]  []
> > > > > state_store+0x0/0xd0
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900927]  []
> > > > > subsys_attr_store+0x29/0x40
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900937]  []
> > > > > sysfs_write_file+0xd4/0x160
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900948]  []
> > > > > vfs_write+0xa6/0x160
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900958]  []
> > > > > sysfs_write_file+0x0/0x160
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900966]  []
> > > > > sys_write+0x41/0x70
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900974]  []
> > > > > sys_dup2+0xeb/0x120
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900984]  []
> > > > > sysenter_past_esp+0x5f/0x85
> > > > > Apr 29 23:31:16 midgard kernel: [140594.900999]  
> > > > > ===
> > > > > Apr 29 23:31:16 midgard kernel: [140594.902843] CPU1 is down
> > > > > Apr 29 23:31:16 midgard kernel: [18014366.415769] Enabling non-boot 
> > > > > CPUs ...
> > > > > Apr 29 23:31:16 midgard kernel: [18014366.426999] SMP alternatives:
> > > > > switching to SMP code
> > > > > Apr 29 23:31:16 midgard kernel: [18014366.427165] Booting processor 
> > > > > 1/1 eip 3000
> > > > > Apr 29 23:31:16 midgard kernel: [18014366.436913] Initializing CPU#1
> > > > > Apr 29 23:31:16 midgard kernel: [18014366.509141] Calibrating delay
> > > > > using timer specific routine.. 3994.69 BogoMIPS (lpj=7989390)
> > > > > Apr 29 23:31:16 midgard kernel: [18014366.509152] monitor/mwait 
> > > > > feature present.
> > > > > Apr 29 23:31:16 midgard kernel: [18014366.509156] CPU: L1 I cache:
> > > > > 32K, L1 D cache: 32K
> > > > > Apr 29 23:31:16 midgard kernel: [18014366.509158] CPU: L2 cache: 2048K
> > > > > Apr 29 23:31:16 midgard kernel: [18014366.509160] 

Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-04-30 Thread Tilman Schmidt
On 30.04.2007 21:46, Andrew Morton wrote:

>> 2.6.21-final is fine.
> 
> Sure, but what about 2.6.21-git3 (or, better, current -git)?

OIC. Sorry for being dense. Will check.

>>>  If that's OK then we need to pick through the difference between
>>> 2.6.21-rc7-mm2's driver tree and the patches which went into mainline.  And
>>> that's a pretty small set.
>> I'm not quite sure how to determine that difference. Can you just provide
>> me with a list of patches you'd like me to test?
> 
> Not really - everything's tangled up.  A bisection search on the
> 2.6.21-rc7-mm2 driver tree would be the best bet.

Ok. No prob. It'll just take a bit of time. (Compiling a kernel on
that machine takes about 4 hours.)
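
For a rough idea of the cost, here is a small sketch of the arithmetic. It
assumes a plain halving search over an ordered patch series, one kernel build
per round; bisect_rounds() and bisect_hours() are made-up helper names, and
any patch count passed in is a placeholder, since the real size of the driver
tree is not stated here. Only the 4 hours per build comes from the figure
above.

```c
/*
 * Worst-case number of build-and-boot rounds needed to bisect
 * n_patches candidates: each round discards about half of the
 * remaining range, so this works out to ceil(log2(n_patches)).
 */
static unsigned int bisect_rounds(unsigned int n_patches)
{
	unsigned int rounds = 0;
	unsigned int remaining = n_patches;

	while (remaining > 1) {
		remaining = (remaining + 1) / 2;	/* larger half survives */
		rounds++;
	}
	return rounds;
}

/* Total wall-clock estimate, assuming one full build per round. */
static unsigned int bisect_hours(unsigned int n_patches,
				 unsigned int hours_per_build)
{
	return bisect_rounds(n_patches) * hours_per_build;
}
```

With a hypothetical series of 64 patches, bisect_hours(64, 4) comes to 6
rounds and about a day of compiling, so the search stays tractable even on a
slow machine.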

I'll be back. :-)

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -





Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]

2007-04-30 Thread Dan Kruchinin

On 4/30/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:

[Please don't drop addresses from the CC list]

On Sunday, 29 April 2007 22:46, Dan Kruchinin wrote:
> On 4/30/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
> > > Hi all.
> > >
> > > There is a problem on my macbook core duo with suspend.
> > > after suspending when i'm trying to 'wake up' my notebook, it seems
> > > that it works, but i don't see anything at my monitor. So i have to
> > > reboot it to continue my work.
> >
> > What exactly do you do to suspend?
> >
> > Rafael
> >
> >
> > > ---
> > > Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at
> > > kernel/kthread.c:166 kthread_bind()
> > > Apr 29 23:31:16 midgard kernel: [140594.900870]  []
> > > _cpu_down+0x16b/0x250
> > > Apr 29 23:31:16 midgard kernel: [140594.900893]  []
> > > disable_nonboot_cpus+0x60/0xf0
> > > Apr 29 23:31:16 midgard kernel: [140594.900903]  []
> > > enter_state+0x22a/0x240
> > > Apr 29 23:31:16 midgard kernel: [140594.900913]  []
> > > state_store+0xbd/0xd0
> > > Apr 29 23:31:16 midgard kernel: [140594.900920]  []
> > > state_store+0x0/0xd0
> > > Apr 29 23:31:16 midgard kernel: [140594.900927]  []
> > > subsys_attr_store+0x29/0x40
> > > Apr 29 23:31:16 midgard kernel: [140594.900937]  []
> > > sysfs_write_file+0xd4/0x160
> > > Apr 29 23:31:16 midgard kernel: [140594.900948]  []
> > > vfs_write+0xa6/0x160
> > > Apr 29 23:31:16 midgard kernel: [140594.900958]  []
> > > sysfs_write_file+0x0/0x160
> > > Apr 29 23:31:16 midgard kernel: [140594.900966]  []
> > > sys_write+0x41/0x70
> > > Apr 29 23:31:16 midgard kernel: [140594.900974]  []
> > > sys_dup2+0xeb/0x120
> > > Apr 29 23:31:16 midgard kernel: [140594.900984]  []
> > > sysenter_past_esp+0x5f/0x85
> > > Apr 29 23:31:16 midgard kernel: [140594.900999]  ===
> > > ---
> > >
> > > dmesg output:
> > > 
> > > 
> > > Apr 29 23:31:16 midgard kernel: [140594.788697] Suspending device vtcon0
> > > Apr 29 23:31:16 midgard kernel: [140594.788700] Suspending device platform
> > > > Apr 29 23:31:16 midgard kernel: [140594.788704] Disabling non-boot CPUs ...
> > > Apr 29 23:31:16 midgard kernel: [140594.900464] CPU 1 is now offline
> > > Apr 29 23:31:16 midgard kernel: [140594.900469] SMP alternatives:
> > > switching to UP code
> > > Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at
> > > kernel/kthread.c:166 kthread_bind()
> > > Apr 29 23:31:16 midgard kernel: [140594.900870]  []
> > > _cpu_down+0x16b/0x250
> > > Apr 29 23:31:16 midgard kernel: [140594.900893]  []
> > > disable_nonboot_cpus+0x60/0xf0
> > > Apr 29 23:31:16 midgard kernel: [140594.900903]  []
> > > enter_state+0x22a/0x240
> > > Apr 29 23:31:16 midgard kernel: [140594.900913]  []
> > > state_store+0xbd/0xd0
> > > Apr 29 23:31:16 midgard kernel: [140594.900920]  []
> > > state_store+0x0/0xd0
> > > Apr 29 23:31:16 midgard kernel: [140594.900927]  []
> > > subsys_attr_store+0x29/0x40
> > > Apr 29 23:31:16 midgard kernel: [140594.900937]  []
> > > sysfs_write_file+0xd4/0x160
> > > Apr 29 23:31:16 midgard kernel: [140594.900948]  []
> > > vfs_write+0xa6/0x160
> > > Apr 29 23:31:16 midgard kernel: [140594.900958]  []
> > > sysfs_write_file+0x0/0x160
> > > Apr 29 23:31:16 midgard kernel: [140594.900966]  []
> > > sys_write+0x41/0x70
> > > Apr 29 23:31:16 midgard kernel: [140594.900974]  []
> > > sys_dup2+0xeb/0x120
> > > Apr 29 23:31:16 midgard kernel: [140594.900984]  []
> > > sysenter_past_esp+0x5f/0x85
> > > Apr 29 23:31:16 midgard kernel: [140594.900999]  ===
> > > Apr 29 23:31:16 midgard kernel: [140594.902843] CPU1 is down
> > > > Apr 29 23:31:16 midgard kernel: [18014366.415769] Enabling non-boot CPUs ...
> > > Apr 29 23:31:16 midgard kernel: [18014366.426999] SMP alternatives:
> > > switching to SMP code
> > > > Apr 29 23:31:16 midgard kernel: [18014366.427165] Booting processor 1/1 eip 3000
> > > Apr 29 23:31:16 midgard kernel: [18014366.436913] Initializing CPU#1
> > > Apr 29 23:31:16 midgard kernel: [18014366.509141] Calibrating delay
> > > using timer specific routine.. 3994.69 BogoMIPS (lpj=7989390)
> > > > Apr 29 23:31:16 midgard kernel: [18014366.509152] monitor/mwait feature present.
> > > Apr 29 23:31:16 midgard kernel: [18014366.509156] CPU: L1 I cache:
> > > 32K, L1 D cache: 32K
> > > Apr 29 23:31:16 midgard kernel: [18014366.509158] CPU: L2 cache: 2048K
> > > > Apr 29 23:31:16 midgard kernel: [18014366.509160] CPU: Physical Processor ID: 0
> > > > Apr 29 23:31:16 midgard kernel: [18014366.509161] CPU: Processor Core ID: 1
> > > Apr 29 23:31:16 midgard kernel: [18014366.509637] CPU1: Intel Genuine
> > > Intel(R) CPU1500  @ 2.00GHz stepping 08
> > > Apr 29 23:31:16 midgard kernel: [18014366.509659] checking TSC
> > > synchronization [CPU#0 -> CPU#1]:
> > > Apr 29 23:31:16 midgard kernel: [18014366.529627] Measured 68812018716
> > > cycles TSC warp between CPUs, turning off TSC clock.
> > > Apr 29 23:31:16 midgard 

Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-04-30 Thread Andrew Morton
On Mon, 30 Apr 2007 21:28:06 +0200
Tilman Schmidt <[EMAIL PROTECTED]> wrote:

> On 30.04.2007 20:21, Andrew Morton wrote:
> > A lot of Greg's driver tree has gone upstream, so please check current
> > mainline.
> 
> 2.6.21-final is fine.

Sure, but what about 2.6.21-git3 (or, better, current -git)?

> >  If that's OK then we need to pick through the difference between
> > 2.6.21-rc7-mm2's driver tree and the patches which went into mainline.  And
> > that's a pretty small set.
> 
> I'm not quite sure how to determine that difference. Can you just provide
> me with a list of patches you'd like me to test?

Not really - everything's tangled up.  A bisection search on the
2.6.21-rc7-mm2 driver tree would be the best bet.

See, 2.6.21-rc7-mm2 had:

gregkh-driver-driver-core-fix-device_add-error-path.patch
gregkh-driver-driver-core-fix-namespace-issue-with-devices-assigned-to-classes.patch
gregkh-driver-dev_printk-and-new-style-class-devices.patch
gregkh-driver-driver-core-udev-triggered-device-driver-binding.patch
gregkh-driver-driver-core-use-attribute-groups-in-struct-device_type.patch
gregkh-driver-named-device_type.patch
gregkh-driver-kobject-kobject_shadow_add-cleanup.patch
gregkh-driver-driver-core-per-subsystem-multithreaded-probing.patch
gregkh-driver-powerpc-make-it-compile-for-multithread-change.patch
gregkh-driver-driver-core-don-t-fail-attaching-the-device-if-it-cannot-be-bound.patch
gregkh-driver-driver-no-more-wait.patch
gregkh-driver-kref-fix-cpu-ordering-with-respect-to-krefs.patch
gregkh-driver-driver-core-notify-userspace-of-network-device-renames.patch
gregkh-driver-driver-core-suppress-uevents-via-filter.patch
gregkh-driver-driver-core-switch-firmware_class-to-uevent_suppress.patch
gregkh-driver-uevent-use-add_uevent_var-instead-of-open-coding-it.patch
gregkh-driver-driver-core-add-suspend-and-resume-to-struct-device_type.patch
gregkh-driver-kobject-kobject_ueventc-collapse-unnecessary-loop-nesting.patch
gregkh-driver-kobject-kobject_add-reference-leak.patch
gregkh-driver-devices_subsys-rwsem-removal.patch
gregkh-driver-scsi-hosts-rwsem-removal.patch
gregkh-driver-usb-bus-mutex.patch
gregkh-driver-pnp-remove-rwsem-usage.patch
gregkh-driver-input-serio-do-not-touch-bus-s-rwsem.patch
gregkh-driver-input-gameport-do-not-touch-bus-s-rwsem.patch
gregkh-driver-ide-proc-remove-rwsem.patch
gregkh-driver-ieee1394-rwsem-removal.patch
gregkh-driver-phy-rwsem-removal.patch
gregkh-driver-qeth-remove-usage-of-subsys_rwsem.patch
gregkh-driver-subsys-rwsem-removal.patch
gregkh-driver-sysfs-fix-i_ino-handling-in-sysfs.patch
gregkh-driver-sysfs-fix-error-handling-in-binattr-write.patch
gregkh-driver-sysfs-move-release_sysfs_dirent-to-dirc.patch
gregkh-driver-sysfs-flatten-cleanup-paths-in-sysfs_add_link-and-create_dir.patch
gregkh-driver-sysfs-consolidate-sysfs_dirent-creation-functions.patch
gregkh-driver-sysfs-add-sysfs_dirent-s_parent.patch
gregkh-driver-sysfs-add-sysfs_dirent-s_name.patch
gregkh-driver-sysfs-make-sysfs_dirent-s_element-a-union.patch
gregkh-driver-sysfs-implement-kobj_sysfs_assoc_lock.patch
gregkh-driver-sysfs-reimplement-symlink-using-sysfs_dirent-tree.patch
gregkh-driver-sysfs-implement-bin_buffer.patch
gregkh-driver-sysfs-implement-sysfs_dirent-active-reference-and-immediate-disconnect.patch
gregkh-driver-sysfs-kill-attribute-file-orphaning.patch
gregkh-driver-sysfs-kill-unnecessary-attribute-owner.patch
gregkh-driver-sysfs-make-lockdep-ignore-s_active.patch
gregkh-driver-sysfs-make-sysfs_put-ignore-null-sd.patch
gregkh-driver-sysfs-rename-object_depth-to-sysfs_path_depth-and-make-it-global.patch
gregkh-driver-sysfs-reimplement-sysfs_drop_dentry.patch
gregkh-driver-sysfs-kill-sysfs_dirent-s_dentry.patch
gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
gregkh-driver-driver-core-warn-for-odd-store-uevent-usage.patch
gregkh-driver-kobject-comment-and-warning-fixes-to-kobjectc.patch
gregkh-driver-the-overdue-removal-of-the-mount-umount-uevents.patch
gregkh-driver-debugfs-add-debugfs_create_u64.patch
gregkh-driver-bus_add_driver-return-error-for-no-bus.patch
gregkh-driver-uio.patch
gregkh-driver-uio-documentation.patch
gregkh-driver-uio-dummy.patch
gregkh-driver-uio-hilscher-cif-card-driver.patch
gregkh-driver-remove-struct-subsystem-as-it-is-no-longer-needed.patch
gregkh-driver-put_device-might_sleep.patch
gregkh-driver-kobject-warn.patch
gregkh-driver-warn-when-statically-allocated-kobjects-are-used.patch
gregkh-driver-nozomi.patch


and Greg's driver tree (as of yesterday, I think) had

gregkh-driver-uio.patch
gregkh-driver-uio-documentation.patch
gregkh-driver-uio-dummy.patch
gregkh-driver-uio-hilscher-cif-card-driver.patch
gregkh-driver-remove-struct-subsystem-as-it-is-no-longer-needed.patch
gregkh-driver-put_device-might_sleep.patch
gregkh-driver-kobject-warn.patch
gregkh-driver-warn-when-statically-allocated-kobjects-are-used.patch
gregkh-driver-nozomi.patch

So what has happened (approximately)
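
Illustration (not from the thread): comparing the two patch lists above
amounts to a set difference over patch names. The following is a minimal
sketch in C, assuming both lists are held in memory as string arrays. The
names patch_in_list() and patches_only_in_first() are hypothetical, and
the two sample arrays are short excerpts of the lists above, not the full
series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if name occurs in list[0..n), else 0. */
int patch_in_list(const char *name, const char *const *list, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(name, list[i]) == 0)
			return 1;
	return 0;
}

/*
 * Copy into out[] every entry of a[] that does not appear in b[],
 * preserving the order of a[].  out[] must have room for na entries.
 * Returns the number of entries written.
 */
size_t patches_only_in_first(const char *const *a, size_t na,
			     const char *const *b, size_t nb,
			     const char **out)
{
	size_t i, count = 0;

	assert(out != NULL);
	for (i = 0; i < na; i++)
		if (!patch_in_list(a[i], b, nb))
			out[count++] = a[i];
	return count;
}

/* Excerpt of the 2.6.21-rc7-mm2 list (illustration only). */
const char *const mm2_excerpt[] = {
	"gregkh-driver-sysfs-kill-sysfs_dirent-s_dentry.patch",
	"gregkh-driver-debugfs-add-debugfs_create_u64.patch",
	"gregkh-driver-uio.patch",
	"gregkh-driver-nozomi.patch",
};

/* Excerpt of the later driver-tree list (illustration only). */
const char *const greg_excerpt[] = {
	"gregkh-driver-uio.patch",
	"gregkh-driver-nozomi.patch",
};
```

Calling patches_only_in_first(mm2_excerpt, 4, greg_excerpt, 2, out) leaves
in out[] the entries that are in the first list but absent from the second,
in series order.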

Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-04-30 Thread Tilman Schmidt
On 30.04.2007 20:21, Andrew Morton wrote:
> A lot of Greg's driver tree has gone upstream, so please check current
> mainline.

2.6.21-final is fine.

>  If that's OK then we need to pick through the difference between
> 2.6.21-rc7-mm2's driver tree and the patches which went into mainline.  And
> that's a pretty small set.

I'm not quite sure how to determine that difference. Can you just provide
me with a list of patches you'd like me to test?

Thanks,
Tilman

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Wehrhausweg 66  Fax: +49 228 4299019
53227 Bonn
Germany





Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-04-30 Thread Andrew Morton
On Mon, 30 Apr 2007 19:17:02 +0200
Tilman Schmidt <[EMAIL PROTECTED]> wrote:

> >> With kernel 2.6.21-rc7-mm2, my Dell Optiplex GX110 (P3/933) regularly
> >> crashes during the SuSE 10.1 startup sequence. When booting to RL5,
> >> it panicblinks shortly after the graphical login screen appears.
> >> Booting to RL3, it hangs after the startup message:
> 
> I have now bisected this down to the section in the series file between
> #GREGKH-DRIVER-START and #GREGKH-DRIVER-END, and therefore added GregKH
> to the CC list.

This is rather good news.  I was staring at about 200-300 MM patches
wondering which one was buggy.  Thanks heaps for doing the bisect.  Now the
main worry is Randy's dead box.

A lot of Greg's driver tree has gone upstream, so please check current
mainline.  If that's OK then we need to pick through the difference between
2.6.21-rc7-mm2's driver tree and the patches which went into mainline.  And
that's a pretty small set.

> I'll try bisecting further inside that section (unless
> you tell me not to), but it may take some time.
> 
> The exact point during the startup sequence when the crash occurred and
> the amount of BUG messages produced varied somewhat during these tests.
> The common denominator, and my criterion for the good/bad decisions
> during the bisect, was the crash (panic blink) just before completion
> of the system startup.
> Sometimes there weren't any BUG messages in the log (or perhaps they
> just didn't make it to the disk.) Sometimes I just had a couple of the
> "sleeping function called from invalid context at mm/slab.c:3054"
> ones but no "Eeek! page_mapcount(page) went negative!" one before them.
> However, whenever the "Eeek!" did appear it announced "getcfg-interfac"
> as the current process and was followed by a few of the "mm/slab.c:3054"
> ones.

hm, big mess.  Could be it was some glitch from Tejun's sysfs changes which
are all being extensively redone, so perhaps we'll never hear from it
again.  Or perhaps we just merged it into mainline.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-04-30 Thread Tilman Schmidt
>> With kernel 2.6.21-rc7-mm2, my Dell Optiplex GX110 (P3/933) regularly
>> crashes during the SuSE 10.1 startup sequence. When booting to RL5,
>> it panicblinks shortly after the graphical login screen appears.
>> Booting to RL3, it hangs after the startup message:

I have now bisected this down to the section in the series file between
#GREGKH-DRIVER-START and #GREGKH-DRIVER-END, and therefore added GregKH
to the CC list. I'll try bisecting further inside that section (unless
you tell me not to), but it may take some time.

The exact point during the startup sequence when the crash occurred and
the amount of BUG messages produced varied somewhat during these tests.
The common denominator, and my criterion for the good/bad decisions
during the bisect, was the crash (panic blink) just before completion
of the system startup.
Sometimes there weren't any BUG messages in the log (or perhaps they
just didn't make it to the disk.) Sometimes I just had a couple of the
"sleeping function called from invalid context at mm/slab.c:3054"
ones but no "Eeek! page_mapcount(page) went negative!" one before them.
However, whenever the "Eeek!" did appear it announced "getcfg-interfac"
as the current process and was followed by a few of the "mm/slab.c:3054"
ones.
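
Illustration (not from the thread): bisecting a section of the series
file is a binary search over "apply the first n patches, build, boot".
The sketch below is a model in C under two stated assumptions: the tree
with none of the section applied boots, and once the culprit is applied
every longer prefix crashes. bisect_series() and fake_crash_test() are
hypothetical names; the fake test stands in for a real build-and-boot
cycle.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical callback: build and boot with the first n_applied patches
 * of the section, return nonzero if that kernel crashes.
 */
typedef int (*crash_test_fn)(size_t n_applied, void *ctx);

/*
 * Return the 1-based position of the first patch whose addition makes
 * the test fail.  Assumes crashes(0) is false and crashes(total) is true.
 */
size_t bisect_series(size_t total, crash_test_fn crashes, void *ctx)
{
	size_t good = 0;	/* longest prefix known to boot */
	size_t bad = total;	/* shortest prefix known to crash */
	size_t mid;

	assert(total > 0);
	while (bad - good > 1) {
		mid = good + (bad - good) / 2;
		if (crashes(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Stand-in for a real test: pretends patch number *ctx is the culprit. */
int fake_crash_test(size_t n_applied, void *ctx)
{
	size_t culprit = *(const size_t *)ctx;

	return n_applied >= culprit;
}
```

Each round halves the suspect range, so a section of 64 patches needs six
build-and-boot cycles. An intermittent crash breaks the second assumption,
so a "good" result may need more than one boot before it is trusted.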

HTH
Tilman

-- 
In the long run, we'll all be dead.





Re: 2.6.21-rc7-mm2 hangs in boot

2007-04-30 Thread Randy Dunlap
On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:

> On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
> 
> > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> > 
> > > 
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> > 
> > I'm getting a hang near the end of booting on x86_64 UP.
> > The last initcall_debug function varies.  E.g.:
> > 
> > 1/
> > [0.140257] Calling initcall 0x806f2fa8: 
> > init_misc_binfmt+0x0/0x3f()
> > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() 
> > returned 0.
> > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: 
> > init_misc_binfmt+0x0/0x3f()
> > [0.140284] Calling initcall 0x806f2fe7: 
> > init_script_binfmt+0x0/0x12()
> > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() 
> > returned 0.
> > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: 
> > init_script_binfmt+0x0/0x12()
> > [0.140310] Calling initcall 0x806f2ff9: 
> > init_elf_binfmt+0x0/0x12()
> > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() 
> > returned 0.
> > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: 
> > init_elf_binfmt+0x0/0x12()
> > [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() 
> > returned 0.
> > [0.140351] initcall 0x806f3de9 ran for 0 msecs: 
> > debugfs_init+0x0/0x4a()
> > 
> > 2/
> > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 
> > 0.
> > [0.140222] initcall 0x806efeb1 ran for 0 msecs: 
> > ksysfs_init+0x0/0x29()
> > [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() 
> > returned 0.
> > [0.140249] initcall 0x806f25be ran for 0 msecs: 
> > filelock_init+0x0/0x31()
> > [0.140258] Calling initcall 0x806f2fa8: 
> > init_misc_binfmt+0x0/0x3f()
> > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() 
> > returned 0.
> > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: 
> > init_misc_binfmt+0x0/0x3f()
> > [0.140284] Calling initcall 0x806f2fe7: 
> > init_script_binfmt+0x0/0x12()
> > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() 
> > returned 0.
> > 
> 
> So perhaps it locks during a timer interrupt.
> 
> > .config is attached.
> > 
> > Any ideas/suggestions?
> 
> Just the usual: nothing from sysrq or NMI watchdog?

Nothing from either of those.  I'll jiggle some config options.
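
Illustration (not from the thread): the initcall_debug output quoted above
brackets every init function with a line before the call and a line after
it returns. The following is a userspace model of that pattern in C, not
the kernel's code; model_run_initcalls() and the demo table are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>

typedef int (*model_initcall_t)(void);

struct model_initcall {
	const char *name;
	model_initcall_t fn;
};

/*
 * Run each init function with a log line before and after the call.
 * Returns the number of calls that completed.
 */
size_t model_run_initcalls(const struct model_initcall *calls, size_t n,
			   FILE *log)
{
	size_t i, completed = 0;
	int ret;

	assert(log != NULL);
	for (i = 0; i < n; i++) {
		fprintf(log, "Calling initcall: %s()\n", calls[i].name);
		fflush(log);	/* get the line out before a possible hang */
		ret = calls[i].fn();
		fprintf(log, "initcall %s() returned %d.\n",
			calls[i].name, ret);
		completed++;
	}
	return completed;
}

/* Demo init function that always succeeds. */
int model_init_ok(void)
{
	return 0;
}

const struct model_initcall model_calls[] = {
	{ "init_misc_binfmt", model_init_ok },
	{ "init_script_binfmt", model_init_ok },
};
```

With this pattern, a hang inside an init function leaves a "Calling" line
with no matching "returned" line. In both logs above the last function has
already returned, and the last function differs between runs, which fits
the guess that the hang is not tied to one particular initcall.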


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [linux-pm] Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]

2007-04-30 Thread Rafael J. Wysocki
On Monday, 30 April 2007 12:05, Gautham R Shenoy wrote:
> On Mon, Apr 30, 2007 at 12:39:46AM -0700, Andrew Morton wrote:
> > On Sun, 29 Apr 2007 22:27:44 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> 
> > wrote:
> > 
> > > On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
> > > > Hi all.
> > > > 
> > > > There is a problem on my macbook core duo with suspend.
> > > > after suspending when i'm trying to 'wake up' my notebook, it seems
> > > > that it works, but i don't see anything at my monitor. So i have to
> > > > reboot it to continue my work.
> > > 
> > > What exactly do you do to suspend?
> > > 
> > 
> > This is due to _cpu_down() calling kthread_bind() in state TASK_RUNNING.
> 
> The state should be TASK_INTERRUPTIBLE. That's the state the thread
> 'p' should be in when we do a kthread_bind(p) in _cpu_down().
> 
> Are you sure about the TASK_RUNNING part ?

Well, the WARN_ON() in kernel/kthread.c, line 166, is triggering here, so it
may be TASK_INTERRUPTIBLE too (should the WARN_ON() trigger in that case?).
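
Illustration (not from the thread): the question here is which task state
the check at kernel/kthread.c:166 expects. The sketch below is a userspace
model in C, not the kernel source. Because the thread does not settle which
state the real check compares against, the model takes the required state
as a parameter. All names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

enum model_task_state {
	MODEL_TASK_RUNNING,
	MODEL_TASK_INTERRUPTIBLE,
	MODEL_TASK_UNINTERRUPTIBLE
};

struct model_task {
	enum model_task_state state;
	int cpu;
};

/*
 * Bind a task to a CPU only if it is in the required (sleeping) state.
 * Returns 0 on success, -1 if the state check fires and the bind is
 * refused.
 */
int model_kthread_bind(struct model_task *t, int cpu,
		       enum model_task_state required)
{
	assert(t != NULL);
	if (t->state != required) {
		fprintf(stderr, "WARNING: bind refused, task state %d\n",
			(int)t->state);
		return -1;
	}
	t->cpu = cpu;
	return 0;
}
```

In this model a task that is still in MODEL_TASK_RUNNING trips the warning
whichever sleeping state is required, while a task in
MODEL_TASK_INTERRUPTIBLE trips it only if the check expects a different
sleeping state.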

Rafael


Re: [linux-pm] Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]

2007-04-30 Thread Gautham R Shenoy
On Mon, Apr 30, 2007 at 12:39:46AM -0700, Andrew Morton wrote:
> On Sun, 29 Apr 2007 22:27:44 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> 
> wrote:
> 
> > On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
> > > Hi all.
> > > 
> > > There is a problem on my macbook core duo with suspend.
> > > after suspending when i'm trying to 'wake up' my notebook, it seems
> > > that it works, but i don't see anything at my monitor. So i have to
> > > reboot it to continue my work.
> > 
> > What exactly do you do to suspend?
> > 
> 
> This is due to _cpu_down() calling kthread_bind() in state TASK_RUNNING.

The state should be TASK_INTERRUPTIBLE. That's the state the thread
'p' should be in when we do a kthread_bind(p) in _cpu_down().

Are you sure about the TASK_RUNNING part ?

> 
> So I was sent the below, including worrisome changelog.
> 

Ok, it should not be that worrisome!
By the time we would be doing kthread_stop(p) in _cpu_down(), 'p' would have
been moved over to some other online cpu, due to the migrate_dead_tasks() 
called in CPU_DEAD handling of migration_call (kernel/sched.c).

So we are safe. Anyway, I apologise for causing any worry :-)

Thanks and Regards
gautham.
> 
> 
> 
> From: Gautham R Shenoy <[EMAIL PROTECTED]>
> 
> We are anyway kthread_stop()ping other per-cpu kernel threads after
> move_task_off_dead_cpu(), so we can do it with the stop_machine_run thread
> as well.
> 
> I just checked with Vatsa if there was any subtle reason why they
> had put in the kthread_bind() in cpu.c. Vatsa cannot seem to recollect
> any and I can't see any. So let us just remove the kthread_bind.
> 
> Signed-off-by: Gautham R Shenoy <[EMAIL PROTECTED]>
> Cc: Oleg Nesterov <[EMAIL PROTECTED]>
> Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
> Cc: "Rafael J. Wysocki" <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
> 
>  kernel/cpu.c |4 
>  1 files changed, 4 deletions(-)
> 
> diff -puN kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down kernel/cpu.c
> --- a/kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down
> +++ a/kernel/cpu.c
> @@ -175,10 +175,6 @@ static int _cpu_down(unsigned int cpu)
>   /* This actually kills the CPU. */
>   __cpu_die(cpu);
> 
> - /* Move it here so it can run. */
> - kthread_bind(p, get_cpu());
> - put_cpu();
> -
>   /* CPU is completely dead: tell everyone.  Too late to complain. */
>   if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD, hcpu) == NOTIFY_BAD)
>   BUG();
> _
> 
> ___
> linux-pm mailing list
> [EMAIL PROTECTED]
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken

2007-04-30 Thread Vivek Goyal
On Thu, Apr 26, 2007 at 08:24:05AM -0700, Andrew Morton wrote:
> On Thu, 26 Apr 2007 15:06:20 +0530 Vivek Goyal <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> > 
> > I am booting 2.6.21-rc7-mm2 on x86_64 box with "irqpoll" command line option
> > and it panics. I can reproduce this problem easily on this box. Please
> > let me know if serial console output is required.
> > 
> > 2.6.21-rc7 works just fine. So problem seems to be in some -mm patch.
> > 
> > Unable to handle kernel NULL pointer dereference at 0009 RIP:
> >  [] note_interrupt+0x5d/0x21b
> > PGD 1032c5067 PUD 1032c4067 PMD 0
> > Oops: 0000 [1] SMP
> > CPU 1
> > Modules linked in:
> > Pid: 0, comm: swapper Not tainted 2.6.21-rc7-mm2 #1
> > RIP: 0010:[]  [] 
> > note_interrupt+0x5d/0x21b
> > RSP: 0018:810100cbff08  EFLAGS: 00010002
> > RAX:  RBX: 807e2d40 RCX: 
> > RDX:  RSI: 807e2d40 RDI: 0004
> > RBP: 807e2d40 R08:  R09: 
> > R10: 0010 R11: 00a0 R12: 810104192f40
> > R13: 807e2d84 R14:  R15: 
> > FS:  () GS:810100854140() knlGS:
> > CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> > CR2: 0009 CR3: 0001032c6000 CR4: 06e0
> > Process swapper (pid: 0, threadinfo 810100cb8000, task 810100cb7500)
> > Stack:  0004 0004  807e2d40
> >  0004 810104192f40 807e2d84 
> >   8025c7c5 810100cb9e98 810100cb9e98
> > Call Trace:
> >  [] handle_edge_irq+0xf9/0x127
> >  [] do_IRQ+0xf1/0x160
> >  [] ret_from_intr+0x0/0xa
> >  [] mwait_idle+0x42/0x45
> >  [] cpu_idle+0xbd/0xe0
> > 
> > 
> > Code: f6 40 09 10 75 09 45 85 ff 0f 85 3d 01 00 00 49 c7 c4 c0 2b
> > RIP  [] note_interrupt+0x5d/0x21b
> >  RSP 
> > CR2: 0009
> > Kernel panic - not syncing: Aiee, killing interrupt handler!
> > 
> 
> hm.  I'd be suspecting
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-common-code.patch
> and
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-on-x86_64.patch
> 
> But because x86_64 doesn't implement IRQ_PER_CPU it's hard to see how we
> got into note_interrupt as a result of that patch.
> 
> Adding the `noirqdebug' boot option would be interesting, perhaps.

"noirqdebug" gets rid of the problem. But that also effectively nullifies
"irqpoll" parameter.

Interestingly on another x86_64 machine this problem does not occur. So
something is dependent on hardware.

I put some debug statements in note_interrupt() and found that desc->action
is a NULL pointer, which is the cause of the problem. The above patch accesses
desc->action->flags, hence it ends up dereferencing a NULL pointer.

handle_edge_irq() already makes sure that desc->action is not NULL, yet
note_interrupt() still sees desc->action as NULL, which is strange. On my
system this is happening for irq 4, and /proc/interrupts shows that it is
coming from "serial".
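
Illustration (not from the thread): a minimal model in C of the failure
described above and of a NULL guard in front of the flags test. The struct
layout, the names and the flag value 0x1000 are assumptions made for the
sketch, not taken from the patch. The Code: bytes in the oops begin
f6 40 09 10, which decodes on x86-64 as testb $0x10,0x9(%rax), a one-byte
test at offset 9 from the pointer in RAX.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: a handler pointer followed by a flags word. */
struct model_irqaction {
	void (*handler)(void);
	unsigned long flags;
};

struct model_irq_desc {
	struct model_irqaction *action;	/* NULL if nothing is registered */
};

#define MODEL_IRQF_IRQPOLL 0x1000UL	/* assumed value, bit 12 */

/* Guarded flags test: a NULL action simply reports "flag not set". */
int model_action_has_irqpoll(const struct model_irq_desc *desc)
{
	assert(desc != NULL);
	if (!desc->action)
		return 0;
	return (desc->action->flags & MODEL_IRQF_IRQPOLL) != 0;
}

/*
 * Address touched when bits 8-15 of flags are read through a NULL action
 * pointer on a little-endian machine: the offset of flags plus one byte.
 */
size_t model_null_flags_byte1_addr(void)
{
	return offsetof(struct model_irqaction, flags) + 1;
}
```

On an LP64 target the flags word in this layout sits at offset 8, so the
byte holding bit 12 is at offset 9, and mask 0x10 within that byte is
0x1000 in the whole word. That would match the fault address CR2: 0009 in
the oops if RAX was zero; the RAX value is blank in this archive copy, so
treat the match as a plausible reading rather than a confirmed one.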

Thanks
Vivek



Re: [2.6.21-rc7-mm2] BUG while suspend to ram

2007-04-30 Thread Maciej Rutecki
Maciej Rutecki wrote:
> BUG: at kernel/kthread.c:166 kthread_bind()
>  [<c01465ac>] _cpu_down+0x16c/0x250
>  [<c0146890>] disable_nonboot_cpus+0x60/0xf0
>  [<c014cd67>] pm_suspend_disk+0x177/0x2c0
>  [<c014b645>] enter_state+0xb5/0x200
>  [<c014b84d>] state_store+0xbd/0xd0
>  [<c014b790>] state_store+0x0/0xd0
>  [<c01be189>] subsys_attr_store+0x29/0x40
>  [<c01be3a4>] sysfs_write_file+0xd4/0x160
>  [<c017b701>] vfs_write+0xc1/0x160
>  [<c01be2d0>] sysfs_write_file+0x0/0x160
>  [<c017be11>] sys_write+0x41/0x70
>  [<c0187355>] sys_dup2+0xd5/0x100
>  [<c01040f6>] sysenter_past_esp+0x5f/0x85
>  [] xfrm_policy_insert+0x210/0x400
>  ===
> 
> dmesg:
> http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/dmesg.txt.gz
> lsmod:
> http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/lsmod.txt.gz
> ver_linux:
> http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/ver_linux.txt.gz
> lspci:
> http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/lspci.txt.gz
> config:
> http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/config-2.6.21-rc7-mm2.gz
> 

I use this script:

http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/suspend_to_disk.sh

-- 
Maciej Rutecki <[EMAIL PROTECTED]>
http://www.maciek.unixy.pl




Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]

2007-04-30 Thread Andrew Morton
On Sun, 29 Apr 2007 22:27:44 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> 
wrote:

> On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
> > Hi all.
> > 
> > There is a problem on my macbook core duo with suspend.
> > after suspending when i'm trying to 'wake up' my notebook, it seems
> > that it works, but i don't see anything at my monitor. So i have to
> > reboot it to continue my work.
> 
> What exactly do you do to suspend?
> 

This is due to _cpu_down() calling kthread_bind() in state TASK_RUNNING.

So I was sent the below, including worrisome changelog.




From: Gautham R Shenoy <[EMAIL PROTECTED]>

We are anyway kthread_stop()ping other per-cpu kernel threads after
move_task_off_dead_cpu(), so we can do it with the stop_machine_run thread
as well.

I just checked with Vatsa if there was any subtle reason why they
had put in the kthread_bind() in cpu.c. Vatsa cannot seem to recollect
any and I can't see any. So let us just remove the kthread_bind.

Signed-off-by: Gautham R Shenoy <[EMAIL PROTECTED]>
Cc: Oleg Nesterov <[EMAIL PROTECTED]>
Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
Cc: "Rafael J. Wysocki" <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 kernel/cpu.c |4 
 1 files changed, 4 deletions(-)

diff -puN kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down kernel/cpu.c
--- a/kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down
+++ a/kernel/cpu.c
@@ -175,10 +175,6 @@ static int _cpu_down(unsigned int cpu)
 	/* This actually kills the CPU. */
 	__cpu_die(cpu);
 
-	/* Move it here so it can run. */
-	kthread_bind(p, get_cpu());
-	put_cpu();
-
 	/* CPU is completely dead: tell everyone.  Too late to complain. */
 	if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD, hcpu) == NOTIFY_BAD)
 		BUG();
_



Re: [2.6.21-rc7-mm2] BUG while suspend to ram

2007-04-30 Thread Andrew Morton
On Sun, 29 Apr 2007 12:42:43 +0200 Maciej Rutecki <[EMAIL PROTECTED]> wrote:

> BUG: at kernel/kthread.c:166 kthread_bind()
>  [<c01465ac>] _cpu_down+0x16c/0x250
>  [<c0146890>] disable_nonboot_cpus+0x60/0xf0
>  [<c014cd67>] pm_suspend_disk+0x177/0x2c0
>  [<c014b645>] enter_state+0xb5/0x200
>  [<c014b84d>] state_store+0xbd/0xd0
>  [<c014b790>] state_store+0x0/0xd0
>  [<c01be189>] subsys_attr_store+0x29/0x40
>  [<c01be3a4>] sysfs_write_file+0xd4/0x160
>  [<c017b701>] vfs_write+0xc1/0x160
>  [<c01be2d0>] sysfs_write_file+0x0/0x160
>  [<c017be11>] sys_write+0x41/0x70
>  [<c0187355>] sys_dup2+0xd5/0x100
>  [<c01040f6>] sysenter_past_esp+0x5f/0x85
>  [] xfrm_policy_insert+0x210/0x400
>  ===

yup, thanks - the present plan is to remove the kthread_bind() call from
_cpu_down().  Although we don't appear to fully understand why we're
removing it, nor why it was added in the first place, which has me worried.



Re: [2.6.21-rc7-mm2] BUG while suspend to ram

2007-04-30 Thread Andrew Morton
On Sun, 29 Apr 2007 12:42:43 +0200 Maciej Rutecki [EMAIL PROTECTED] wrote:

 BUG: at kernel/kthread.c:166 kthread_bind()
  [c01465ac] _cpu_down+0x16c/0x250
  [c0146890] disable_nonboot_cpus+0x60/0xf0
  [c014cd67] pm_suspend_disk+0x177/0x2c0
  [c014b645] enter_state+0xb5/0x200
  [c014b84d] state_store+0xbd/0xd0
  [c014b790] state_store+0x0/0xd0
  [c01be189] subsys_attr_store+0x29/0x40
  [c01be3a4] sysfs_write_file+0xd4/0x160
  [c017b701] vfs_write+0xc1/0x160
  [c01be2d0] sysfs_write_file+0x0/0x160
  [c017be11] sys_write+0x41/0x70
  [c0187355] sys_dup2+0xd5/0x100
  [c01040f6] sysenter_past_esp+0x5f/0x85
  [c033] xfrm_policy_insert+0x210/0x400
  ===

yup, thanks - the present plan is to remove the kthread_bind() call from
_cpu_down().  Although we don't appear to fully undersand why we're
removing it, nor why it was added in the first place, which has me worried.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]

2007-04-30 Thread Andrew Morton
On Sun, 29 Apr 2007 22:27:44 +0200 Rafael J. Wysocki [EMAIL PROTECTED] 
wrote:

 On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
  Hi all.
  
  There is a problem on my macbook core duo with suspend.
  after suspending when i'm trying to 'wake up' my notebook, it seems
  that it works, but i don't see anything at my monitor. So i have to
  reboot it to continue my work.
 
 What exactly do you do to suspend?
 

This is due to _cpu_down() calling kthread_bind() in state TASK_RUNNING.

So I was sent the below, including worrisome changelog.




From: Gautham R Shenoy [EMAIL PROTECTED]

We are anyway kthread_stop()ping other per-cpu kernel threads after
move_task_off_dead_cpu(), so we can do it with the stop_machine_run thread
as well.

I just checked with Vatsa if there was any subtle reason why they
had put in the kthread_bind() in cpu.c. Vatsa cannot seem to recollect
any and I can't see any. So let us just remove the kthread_bind.

Signed-off-by: Gautham R Shenoy [EMAIL PROTECTED]
Cc: Oleg Nesterov [EMAIL PROTECTED]
Cc: Eric W. Biederman [EMAIL PROTECTED]
Cc: Rafael J. Wysocki [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 kernel/cpu.c |4 
 1 files changed, 4 deletions(-)

diff -puN kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down kernel/cpu.c
--- a/kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down
+++ a/kernel/cpu.c
@@ -175,10 +175,6 @@ static int _cpu_down(unsigned int cpu)
/* This actually kills the CPU. */
__cpu_die(cpu);
 
-   /* Move it here so it can run. */
-   kthread_bind(p, get_cpu());
-   put_cpu();
-
/* CPU is completely dead: tell everyone.  Too late to complain. */
if (raw_notifier_call_chain(cpu_chain, CPU_DEAD, hcpu) == NOTIFY_BAD)
BUG();
_

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6.21-rc7-mm2] BUG while suspend to ram

2007-04-30 Thread Maciej Rutecki
Maciej Rutecki pisze:
 BUG: at kernel/kthread.c:166 kthread_bind()
  [c01465ac] _cpu_down+0x16c/0x250
  [c0146890] disable_nonboot_cpus+0x60/0xf0
  [c014cd67] pm_suspend_disk+0x177/0x2c0
  [c014b645] enter_state+0xb5/0x200
  [c014b84d] state_store+0xbd/0xd0
  [c014b790] state_store+0x0/0xd0
  [c01be189] subsys_attr_store+0x29/0x40
  [c01be3a4] sysfs_write_file+0xd4/0x160
  [c017b701] vfs_write+0xc1/0x160
  [c01be2d0] sysfs_write_file+0x0/0x160
  [c017be11] sys_write+0x41/0x70
  [c0187355] sys_dup2+0xd5/0x100
  [c01040f6] sysenter_past_esp+0x5f/0x85
  [c033] xfrm_policy_insert+0x210/0x400
  ===
 
 dmesg:
 http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/dmesg.txt.gz
 lsmod:
 http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/lsmod.txt.gz
 ver_linux:
 http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/ver_linux.txt.gz
 lspci:
 http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/lspci.txt.gz
 config:
 http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/config-2.6.21-rc7-mm2.gz
 

I use this script:

http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/suspend_to_disk.sh

-- 
Maciej Rutecki [EMAIL PROTECTED]
http://www.maciek.unixy.pl




Re: 2.6.21-rc7-mm2 irqpoll seems to be broken

2007-04-30 Thread Vivek Goyal
On Thu, Apr 26, 2007 at 08:24:05AM -0700, Andrew Morton wrote:
 On Thu, 26 Apr 2007 15:06:20 +0530 Vivek Goyal [EMAIL PROTECTED] wrote:
 
  Hi,
  
  I am booting 2.6.21-rc7-mm2 on x86_64 box with irqpoll command line option
  and it panics. I can reproduce this problem easily on this box. Please
  let me know if serial console output is required.
  
  2.6.21-rc7 works just fine. So problem seems to be in some -mm patch.
  
  Unable to handle kernel NULL pointer dereference at 0009 RIP:
   [8025bc5e] note_interrupt+0x5d/0x21b
  PGD 1032c5067 PUD 1032c4067 PMD 0
  Oops:  [1] SMP
  CPU 1
  Modules linked in:
  Pid: 0, comm: swapper Not tainted 2.6.21-rc7-mm2 #1
  RIP: 0010:[8025bc5e]  [8025bc5e] 
  note_interrupt+0x5d/0x21b
  RSP: 0018:810100cbff08  EFLAGS: 00010002
  RAX:  RBX: 807e2d40 RCX: 
  RDX:  RSI: 807e2d40 RDI: 0004
  RBP: 807e2d40 R08:  R09: 
  R10: 0010 R11: 00a0 R12: 810104192f40
  R13: 807e2d84 R14:  R15: 
  FS:  () GS:810100854140() knlGS:
  CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
  CR2: 0009 CR3: 0001032c6000 CR4: 06e0
  Process swapper (pid: 0, threadinfo 810100cb8000, task 810100cb7500)
  Stack:  0004 0004  807e2d40
   0004 810104192f40 807e2d84 
    8025c7c5 810100cb9e98 810100cb9e98
  Call Trace:
   [8025c7c5] handle_edge_irq+0xf9/0x127
   [8020c2f9] do_IRQ+0xf1/0x160
   [8020a141] ret_from_intr+0x0/0xa
   [80208fd9] mwait_idle+0x42/0x45
   [80208f2f] cpu_idle+0xbd/0xe0
  
  
  Code: f6 40 09 10 75 09 45 85 ff 0f 85 3d 01 00 00 49 c7 c4 c0 2b
  RIP  [8025bc5e] note_interrupt+0x5d/0x21b
   RSP 810100cbff08
  CR2: 0009
  Kernel panic - not syncing: Aiee, killing interrupt handler!
  
 
 hm.  I'd be suspecting
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-common-code.patch
 and
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-on-x86_64.patch
 
 But because x86_64 doesn't implement IRQ_PER_CPU it's hard to see how we
 got into note_interrupt as a result of that patch.
 
 Adding the `noirqdebug' boot option would be interesting, perhaps.

noirqdebug gets rid of the problem. But that also effectively nullifies
the irqpoll parameter.

Interestingly on another x86_64 machine this problem does not occur. So
something is dependent on hardware.

I put some debug statements in note_interrupt() and found that desc->action
is a NULL pointer, which is what causes the problem. The above patch accesses
desc->action->flags, hence it ends up dereferencing a NULL pointer.

handle_edge_irq() already makes sure that desc->action is not NULL, yet
note_interrupt() still receives desc->action as NULL, which is strange. On my
system this is happening for irq 4, and /proc/interrupts shows that it is
coming from serial.
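
The failure mode described above can be sketched in a few lines of userspace
C. This is only an illustration of guarding a desc->action->flags style
access against a NULL action, not the actual patch or the eventual fix. The
struct names, the helper action_has_flag() and the flag value
IRQF_IRQPOLL_SKETCH are all hypothetical stand-ins, not the kernel's
definitions.

```c
#include <stddef.h>

/* Hypothetical flag value, chosen only for this sketch. */
#define IRQF_IRQPOLL_SKETCH 0x1000UL

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct irqaction_sketch {
	unsigned long flags;
};

struct irq_desc_sketch {
	struct irqaction_sketch *action;	/* may be NULL if no handler is installed */
};

/*
 * Return 1 if the descriptor has an action with the given flag set,
 * 0 otherwise. The NULL test comes first, so a descriptor without an
 * action is never dereferenced.
 */
static int action_has_flag(const struct irq_desc_sketch *desc,
			   unsigned long flag)
{
	const struct irqaction_sketch *action = desc->action;

	if (action == NULL)
		return 0;
	return (action->flags & flag) != 0;
}
```

Whether a check like this belongs in note_interrupt() itself, or whether the
caller should never pass a descriptor without an action, is exactly the open
question in this message.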

Thanks
Vivek



Re: [linux-pm] Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]

2007-04-30 Thread Gautham R Shenoy
On Mon, Apr 30, 2007 at 12:39:46AM -0700, Andrew Morton wrote:
 On Sun, 29 Apr 2007 22:27:44 +0200 Rafael J. Wysocki [EMAIL PROTECTED] 
 wrote:
 
  On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
   Hi all.
   
   There is a problem on my macbook core duo with suspend.
   after suspending when i'm trying to 'wake up' my notebook, it seems
   that it works, but i don't see anything at my monitor. So i have to
   reboot it to continue my work.
  
  What exactly do you do to suspend?
  
 
 This is due to _cpu_down() calling kthread_bind() in state TASK_RUNNING.

The state should be TASK_INTERRUPTIBLE. That's the state the thread
'p' should be in when we do a kthread_bind(p) in _cpu_down().

Are you sure about the TASK_RUNNING part ?
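
The disagreement here is about a precondition: the bind is only valid while
the target thread is in a particular sleeping state. A minimal userspace
sketch of such a check follows. It is not the code at kernel/kthread.c:166;
the type task_sketch, the function bind_sketch() and the idea of passing the
expected state as a parameter are assumptions made for illustration. The
numeric state values match the kernel's TASK_RUNNING, TASK_INTERRUPTIBLE and
TASK_UNINTERRUPTIBLE.

```c
#include <stdio.h>

/* Same numeric values as the kernel's task states; the names are local. */
enum task_state_sketch {
	SK_TASK_RUNNING = 0,
	SK_TASK_INTERRUPTIBLE = 1,
	SK_TASK_UNINTERRUPTIBLE = 2
};

/* Hypothetical, heavily reduced stand-in for struct task_struct. */
struct task_sketch {
	long state;
	unsigned int cpu;
};

/*
 * Bind the task to a CPU only if it is in the expected state.
 * Returns 0 on success, -1 (after printing a warning) otherwise,
 * leaving the task untouched in the failure case.
 */
static int bind_sketch(struct task_sketch *k, unsigned int cpu,
		       long expected_state)
{
	if (k->state != expected_state) {
		fprintf(stderr, "warning: task state %ld, expected %ld\n",
			k->state, expected_state);
		return -1;
	}
	k->cpu = cpu;
	return 0;
}
```

With a check of this shape, a warning fires for any state other than the
expected one, so the warning alone does not say which state the thread was
actually in.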

 
 So I was sent the below, including worrisome changelog.
 

Ok, it should not be that worrisome!
By the time we would be doing kthread_stop(p) in _cpu_down(), 'p' would have
been moved over to some other online cpu, due to the migrate_dead_tasks() 
called in CPU_DEAD handling of migration_call (kernel/sched.c).

So we are safe. Anyway, I apologise for causing any worry :-)

Thanks and Regards
gautham.
 
 
 
 From: Gautham R Shenoy [EMAIL PROTECTED]
 
 We are anyway kthread_stop()ping other per-cpu kernel threads after
 move_task_off_dead_cpu(), so we can do it with the stop_machine_run thread
 as well.
 
 I just checked with Vatsa if there was any subtle reason why they
 had put in the kthread_bind() in cpu.c. Vatsa cannot seem to recollect
 any and I can't see any. So let us just remove the kthread_bind.
 
 Signed-off-by: Gautham R Shenoy [EMAIL PROTECTED]
 Cc: Oleg Nesterov [EMAIL PROTECTED]
 Cc: Eric W. Biederman [EMAIL PROTECTED]
 Cc: Rafael J. Wysocki [EMAIL PROTECTED]
 Signed-off-by: Andrew Morton [EMAIL PROTECTED]
 ---
 
  kernel/cpu.c |4 
  1 files changed, 4 deletions(-)
 
 diff -puN kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down kernel/cpu.c
 --- a/kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down
 +++ a/kernel/cpu.c
 @@ -175,10 +175,6 @@ static int _cpu_down(unsigned int cpu)
   /* This actually kills the CPU. */
   __cpu_die(cpu);
 
 - /* Move it here so it can run. */
 - kthread_bind(p, get_cpu());
 - put_cpu();
 -
   /* CPU is completely dead: tell everyone.  Too late to complain. */
   if (raw_notifier_call_chain(cpu_chain, CPU_DEAD, hcpu) == NOTIFY_BAD)
   BUG();
 _
 
 ___
 linux-pm mailing list
 [EMAIL PROTECTED]
 https://lists.linux-foundation.org/mailman/listinfo/linux-pm

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!


Re: [linux-pm] Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]

2007-04-30 Thread Rafael J. Wysocki
On Monday, 30 April 2007 12:05, Gautham R Shenoy wrote:
 On Mon, Apr 30, 2007 at 12:39:46AM -0700, Andrew Morton wrote:
  On Sun, 29 Apr 2007 22:27:44 +0200 Rafael J. Wysocki [EMAIL PROTECTED] 
  wrote:
  
   On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
Hi all.

There is a problem on my macbook core duo with suspend.
after suspending when i'm trying to 'wake up' my notebook, it seems
that it works, but i don't see anything at my monitor. So i have to
reboot it to continue my work.
   
   What exactly do you do to suspend?
   
  
  This is due to _cpu_down() calling kthread_bind() in state TASK_RUNNING.
 
 The state should be TASK_INTERRUPTIBLE. That's the state of the thread
 'p' should be in when we do a kthread_bind(p) in _cpu_down().
 
 Are you sure about the TASK_RUNNING part ?

Well, the WARN_ON() in kernel/kthread.c, line 166, is triggering here, so it
may be TASK_INTERRUPTIBLE too (should the WARN_ON() trigger in that case?).

Rafael


Re: 2.6.21-rc7-mm2 hangs in boot

2007-04-30 Thread Randy Dunlap
On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:

 On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap [EMAIL PROTECTED] wrote:
 
  On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
  
   
   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
  
  I'm getting a hang near the end of booting on x86_64 UP.
  The last initcall_debug function varies.  E.g.:
  
  1/
  [0.140257] Calling initcall 0x806f2fa8: 
  init_misc_binfmt+0x0/0x3f()
  [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() 
  returned 0.
  [0.140275] initcall 0x806f2fa8 ran for 0 msecs: 
  init_misc_binfmt+0x0/0x3f()
  [0.140284] Calling initcall 0x806f2fe7: 
  init_script_binfmt+0x0/0x12()
  [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() 
  returned 0.
  [0.140302] initcall 0x806f2fe7 ran for 0 msecs: 
  init_script_binfmt+0x0/0x12()
  [0.140310] Calling initcall 0x806f2ff9: 
  init_elf_binfmt+0x0/0x12()
  [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() 
  returned 0.
  [0.140326] initcall 0x806f2ff9 ran for 0 msecs: 
  init_elf_binfmt+0x0/0x12()
  [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
  [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() 
  returned 0.
  [0.140351] initcall 0x806f3de9 ran for 0 msecs: 
  debugfs_init+0x0/0x4a()
  
  2/
  [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
  [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 
  0.
  [0.140222] initcall 0x806efeb1 ran for 0 msecs: 
  ksysfs_init+0x0/0x29()
  [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
  [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() 
  returned 0.
  [0.140249] initcall 0x806f25be ran for 0 msecs: 
  filelock_init+0x0/0x31()
  [0.140258] Calling initcall 0x806f2fa8: 
  init_misc_binfmt+0x0/0x3f()
  [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() 
  returned 0.
  [0.140276] initcall 0x806f2fa8 ran for 0 msecs: 
  init_misc_binfmt+0x0/0x3f()
  [0.140284] Calling initcall 0x806f2fe7: 
  init_script_binfmt+0x0/0x12()
  [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() 
  returned 0.
  
 
 So perhaps it locks during a timer interrupt.
 
  .config is attached.
  
  Any ideas/suggestions?
 
 Just the usual: nothing from sysrq or NMI watchdog?

Nothing from either of those.  I'll jiggle some config options.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-04-30 Thread Tilman Schmidt
 With kernel 2.6.21-rc7-mm2, my Dell Optiplex GX110 (P3/933) regularly
 crashes during the SuSE 10.1 startup sequence. When booting to RL5,
 it panicblinks shortly after the graphical login screen appears.
 Booting to RL3, it hangs after the startup message:

I have now bisected this down to the section in the series file between
#GREGKH-DRIVER-START and #GREGKH-DRIVER-END, and therefore added GregKH
to the CC list. I'll try bisecting further inside that section (unless
you tell me not to), but it may take some time.

The exact point during the startup sequence when the crash occurred and
the amount of BUG messages produced varied somewhat during these tests.
The common denominator, and my criterion for the good/bad decisions
during the bisect, was the crash (panic blink) just before completion
of the system startup.
Sometimes there weren't any BUG messages in the log (or perhaps they
just didn't make it to the disk). Sometimes I just had a couple of the
"sleeping function called from invalid context at mm/slab.c:3054"
ones but no "Eeek! page_mapcount(page) went negative!" one before them.
However, whenever the "Eeek!" did appear it announced "getcfg-interfac"
as the current process and was followed by a few of the "mm/slab.c:3054"
ones.

HTH
Tilman

-- 
In the long run, we'll all be dead.





Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-04-30 Thread Andrew Morton
On Mon, 30 Apr 2007 19:17:02 +0200
Tilman Schmidt [EMAIL PROTECTED] wrote:

  With kernel 2.6.21-rc7-mm2, my Dell Optiplex GX110 (P3/933) regularly
  crashes during the SuSE 10.1 startup sequence. When booting to RL5,
  it panicblinks shortly after the graphical login screen appears.
  Booting to RL3, it hangs after the startup message:
 
 I have now bisected this down to the section in the series file between
 #GREGKH-DRIVER-START and #GREGKH-DRIVER-END, and therefore added GregKH
 to the CC list.

This is rather good news.  I was staring at about 200-300 MM patches
wondering which one was buggy.  Thanks heaps for doing the bisect.  Now the
main worry is Randy's dead box.

A lot of Greg's driver tree has gone upstream, so please check current
mainline.  If that's OK then we need to pick through the difference between
2.6.21-rc7-mm2's driver tree and the patches which went into mainline.  And
that's a pretty small set.

 I'll try bisecting further inside that section (unless
 you tell me not to), but it may take some time.
 
 The exact point during the startup sequence when the crash occurred and
 the amount of BUG messages produced varied somewhat during these tests.
 The common denominator, and my criterion for the good/bad decisions
 during the bisect, was the crash (panic blink) just before completion
 of the system startup.
 Sometimes there weren't any BUG messages in the log (or perhaps they
 just didn't make it to the disk.) Sometimes I just had a couple of the
 sleeping function called from invalid context at mm/slab.c:3054
 ones but no Eeek! page_mapcount(page) went negative! one before them.
 However, whenever the Eeek! did appear it announced getcfg-interfac
 as the current process and was followed by a few of the mm/slab.c:3054
 ones.

hm, big mess.  Could be it was some glitch from Tejun's sysfs changes which
are all being extensively redone, so perhaps we'll never hear from it
again.  Or perhaps we just merged it into mainline.


Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-04-30 Thread Tilman Schmidt
On 30.04.2007 20:21, Andrew Morton wrote:
 A lot of Greg's driver tree has gone upstream, so please check current
 mainline.

2.6.21-final is fine.

  If that's OK then we need to pick through the difference between
 2.6.21-rc7-mm2's driver tree and the patches which went into mainline.  And
 that's a pretty small set.

I'm not quite sure how to determine that difference. Can you just provide
me with a list of patches you'd like me to test?

Thanks,
Tilman

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Wehrhausweg 66  Fax: +49 228 4299019
53227 Bonn
Germany





Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-04-30 Thread Andrew Morton
On Mon, 30 Apr 2007 21:28:06 +0200
Tilman Schmidt [EMAIL PROTECTED] wrote:

 On 30.04.2007 20:21, Andrew Morton wrote:
  A lot of Greg's driver tree has gone upstream, so please check current
  mainline.
 
 2.6.21-final is fine.

Sure, but what about 2.6.21-git3 (or, better, current -git)?

   If that's OK then we need to pick through the difference between
  2.6.21-rc7-mm2's driver tree and the patches which went into mainline.  And
  that's a pretty small set.
 
 I'm not quite sure how to determine that difference. Can you just provide
 me with a list of patches you'd like me to test?

Not really - everything's tangled up.  A bisection search on the
2.6.21-rc7-mm2 driver tree would be the best bet.
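
For readers unfamiliar with bisecting a patch series, the search can be
sketched as an ordinary binary search over "how many patches of the series
are applied". This is an illustrative sketch only: bisect_first_bad(), the
callback type and the example predicate bad_from() are hypothetical names,
and the sketch assumes that the tree with no patches applied is good, that
the tree with all n patches applied is bad, and that the kernel stays bad
once the culprit is applied.

```c
#include <stddef.h>

/* Callback: nonzero if a tree with the first `applied` patches is bad. */
typedef int (*is_bad_fn)(size_t applied, void *ctx);

/*
 * Return the 1-based position of the first patch that makes the tree bad.
 * Precondition (assumed, not checked): n >= 1, 0 patches applied is good,
 * n patches applied is bad. If `builds` is non-NULL it receives the number
 * of trees that had to be built and tested.
 */
static size_t bisect_first_bad(size_t n, is_bad_fn is_bad, void *ctx,
			       unsigned int *builds)
{
	size_t good = 0;	/* highest patch count known to be good */
	size_t bad = n;		/* lowest patch count known to be bad */
	unsigned int count = 0;

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		count++;
		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	if (builds)
		*builds = count;
	return bad;
}

/* Example predicate: the tree is bad from patch number *ctx onwards. */
static int bad_from(size_t applied, void *ctx)
{
	size_t culprit = *(size_t *)ctx;

	return applied >= culprit;
}
```

For the 64 gregkh-driver patches listed in this message, a search of this
shape needs six test builds. In practice the predicate is a human building,
booting and watching for the crash, and patches that do not apply or build
in isolation can force detours.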

See, 2.6.21-rc7-mm2 had:

gregkh-driver-driver-core-fix-device_add-error-path.patch
gregkh-driver-driver-core-fix-namespace-issue-with-devices-assigned-to-classes.patch
gregkh-driver-dev_printk-and-new-style-class-devices.patch
gregkh-driver-driver-core-udev-triggered-device-driver-binding.patch
gregkh-driver-driver-core-use-attribute-groups-in-struct-device_type.patch
gregkh-driver-named-device_type.patch
gregkh-driver-kobject-kobject_shadow_add-cleanup.patch
gregkh-driver-driver-core-per-subsystem-multithreaded-probing.patch
gregkh-driver-powerpc-make-it-compile-for-multithread-change.patch
gregkh-driver-driver-core-don-t-fail-attaching-the-device-if-it-cannot-be-bound.patch
gregkh-driver-driver-no-more-wait.patch
gregkh-driver-kref-fix-cpu-ordering-with-respect-to-krefs.patch
gregkh-driver-driver-core-notify-userspace-of-network-device-renames.patch
gregkh-driver-driver-core-suppress-uevents-via-filter.patch
gregkh-driver-driver-core-switch-firmware_class-to-uevent_suppress.patch
gregkh-driver-uevent-use-add_uevent_var-instead-of-open-coding-it.patch
gregkh-driver-driver-core-add-suspend-and-resume-to-struct-device_type.patch
gregkh-driver-kobject-kobject_ueventc-collapse-unnecessary-loop-nesting.patch
gregkh-driver-kobject-kobject_add-reference-leak.patch
gregkh-driver-devices_subsys-rwsem-removal.patch
gregkh-driver-scsi-hosts-rwsem-removal.patch
gregkh-driver-usb-bus-mutex.patch
gregkh-driver-pnp-remove-rwsem-usage.patch
gregkh-driver-input-serio-do-not-touch-bus-s-rwsem.patch
gregkh-driver-input-gameport-do-not-touch-bus-s-rwsem.patch
gregkh-driver-ide-proc-remove-rwsem.patch
gregkh-driver-ieee1394-rwsem-removal.patch
gregkh-driver-phy-rwsem-removal.patch
gregkh-driver-qeth-remove-usage-of-subsys_rwsem.patch
gregkh-driver-subsys-rwsem-removal.patch
gregkh-driver-sysfs-fix-i_ino-handling-in-sysfs.patch
gregkh-driver-sysfs-fix-error-handling-in-binattr-write.patch
gregkh-driver-sysfs-move-release_sysfs_dirent-to-dirc.patch
gregkh-driver-sysfs-flatten-cleanup-paths-in-sysfs_add_link-and-create_dir.patch
gregkh-driver-sysfs-consolidate-sysfs_dirent-creation-functions.patch
gregkh-driver-sysfs-add-sysfs_dirent-s_parent.patch
gregkh-driver-sysfs-add-sysfs_dirent-s_name.patch
gregkh-driver-sysfs-make-sysfs_dirent-s_element-a-union.patch
gregkh-driver-sysfs-implement-kobj_sysfs_assoc_lock.patch
gregkh-driver-sysfs-reimplement-symlink-using-sysfs_dirent-tree.patch
gregkh-driver-sysfs-implement-bin_buffer.patch
gregkh-driver-sysfs-implement-sysfs_dirent-active-reference-and-immediate-disconnect.patch
gregkh-driver-sysfs-kill-attribute-file-orphaning.patch
gregkh-driver-sysfs-kill-unnecessary-attribute-owner.patch
gregkh-driver-sysfs-make-lockdep-ignore-s_active.patch
gregkh-driver-sysfs-make-sysfs_put-ignore-null-sd.patch
gregkh-driver-sysfs-rename-object_depth-to-sysfs_path_depth-and-make-it-global.patch
gregkh-driver-sysfs-reimplement-sysfs_drop_dentry.patch
gregkh-driver-sysfs-kill-sysfs_dirent-s_dentry.patch
gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
gregkh-driver-driver-core-warn-for-odd-store-uevent-usage.patch
gregkh-driver-kobject-comment-and-warning-fixes-to-kobjectc.patch
gregkh-driver-the-overdue-removal-of-the-mount-umount-uevents.patch
gregkh-driver-debugfs-add-debugfs_create_u64.patch
gregkh-driver-bus_add_driver-return-error-for-no-bus.patch
gregkh-driver-uio.patch
gregkh-driver-uio-documentation.patch
gregkh-driver-uio-dummy.patch
gregkh-driver-uio-hilscher-cif-card-driver.patch
gregkh-driver-remove-struct-subsystem-as-it-is-no-longer-needed.patch
gregkh-driver-put_device-might_sleep.patch
gregkh-driver-kobject-warn.patch
gregkh-driver-warn-when-statically-allocated-kobjects-are-used.patch
gregkh-driver-nozomi.patch


and Greg's driver tree (as of yesterday, I think) had

gregkh-driver-uio.patch
gregkh-driver-uio-documentation.patch
gregkh-driver-uio-dummy.patch
gregkh-driver-uio-hilscher-cif-card-driver.patch
gregkh-driver-remove-struct-subsystem-as-it-is-no-longer-needed.patch
gregkh-driver-put_device-might_sleep.patch
gregkh-driver-kobject-warn.patch
gregkh-driver-warn-when-statically-allocated-kobjects-are-used.patch
gregkh-driver-nozomi.patch

So what has happened (approximately) is that

- the above nine patches have been held back, or are new

Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]

2007-04-30 Thread Dan Kruchinin

On 4/30/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:

[Please don't drop addresses from the CC list]

On Sunday, 29 April 2007 22:46, Dan Kruchinin wrote:
 On 4/30/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
  Hi,
 
  On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
   Hi all.
  
   There is a problem on my macbook core duo with suspend.
   after suspending when i'm trying to 'wake up' my notebook, it seems
   that it works, but i don't see anything at my monitor. So i have to
   reboot it to continue my work.
 
  What exactly do you do to suspend?
 
  Rafael
 
 
   ---
   Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at
   kernel/kthread.c:166 kthread_bind()
   Apr 29 23:31:16 midgard kernel: [140594.900870]  [c0142c9b]
   _cpu_down+0x16b/0x250
   Apr 29 23:31:16 midgard kernel: [140594.900893]  [c0142f80]
   disable_nonboot_cpus+0x60/0xf0
   Apr 29 23:31:16 midgard kernel: [140594.900903]  [c0147efa]
   enter_state+0x22a/0x240
   Apr 29 23:31:16 midgard kernel: [140594.900913]  [c0147fcd]
   state_store+0xbd/0xd0
   Apr 29 23:31:16 midgard kernel: [140594.900920]  [c0147f10]
   state_store+0x0/0xd0
   Apr 29 23:31:16 midgard kernel: [140594.900927]  [c01c1559]
   subsys_attr_store+0x29/0x40
   Apr 29 23:31:16 midgard kernel: [140594.900937]  [c01c1774]
   sysfs_write_file+0xd4/0x160
   Apr 29 23:31:16 midgard kernel: [140594.900948]  [c0180eb6]
   vfs_write+0xa6/0x160
   Apr 29 23:31:16 midgard kernel: [140594.900958]  [c01c16a0]
   sysfs_write_file+0x0/0x160
   Apr 29 23:31:16 midgard kernel: [140594.900966]  [c0181601]
   sys_write+0x41/0x70
   Apr 29 23:31:16 midgard kernel: [140594.900974]  [c018c70b]
   sys_dup2+0xeb/0x120
   Apr 29 23:31:16 midgard kernel: [140594.900984]  [c0104116]
   sysenter_past_esp+0x5f/0x85
   Apr 29 23:31:16 midgard kernel: [140594.900999]  ===
   ---
  
   dmesg output:
   
   
   Apr 29 23:31:16 midgard kernel: [140594.788697] Suspending device vtcon0
   Apr 29 23:31:16 midgard kernel: [140594.788700] Suspending device platform
   Apr 29 23:31:16 midgard kernel: [140594.788704] Disabling non-boot CPUs 
...
   Apr 29 23:31:16 midgard kernel: [140594.900464] CPU 1 is now offline
   Apr 29 23:31:16 midgard kernel: [140594.900469] SMP alternatives:
   switching to UP code
   Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at
   kernel/kthread.c:166 kthread_bind()
   Apr 29 23:31:16 midgard kernel: [140594.900870]  [c0142c9b]
   _cpu_down+0x16b/0x250
   Apr 29 23:31:16 midgard kernel: [140594.900893]  [c0142f80]
   disable_nonboot_cpus+0x60/0xf0
   Apr 29 23:31:16 midgard kernel: [140594.900903]  [c0147efa]
   enter_state+0x22a/0x240
   Apr 29 23:31:16 midgard kernel: [140594.900913]  [c0147fcd]
   state_store+0xbd/0xd0
   Apr 29 23:31:16 midgard kernel: [140594.900920]  [c0147f10]
   state_store+0x0/0xd0
   Apr 29 23:31:16 midgard kernel: [140594.900927]  [c01c1559]
   subsys_attr_store+0x29/0x40
   Apr 29 23:31:16 midgard kernel: [140594.900937]  [c01c1774]
   sysfs_write_file+0xd4/0x160
   Apr 29 23:31:16 midgard kernel: [140594.900948]  [c0180eb6]
   vfs_write+0xa6/0x160
   Apr 29 23:31:16 midgard kernel: [140594.900958]  [c01c16a0]
   sysfs_write_file+0x0/0x160
   Apr 29 23:31:16 midgard kernel: [140594.900966]  [c0181601]
   sys_write+0x41/0x70
   Apr 29 23:31:16 midgard kernel: [140594.900974]  [c018c70b]
   sys_dup2+0xeb/0x120
   Apr 29 23:31:16 midgard kernel: [140594.900984]  [c0104116]
   sysenter_past_esp+0x5f/0x85
   Apr 29 23:31:16 midgard kernel: [140594.900999]  ===
   Apr 29 23:31:16 midgard kernel: [140594.902843] CPU1 is down
   Apr 29 23:31:16 midgard kernel: [18014366.415769] Enabling non-boot CPUs 
...
   Apr 29 23:31:16 midgard kernel: [18014366.426999] SMP alternatives:
   switching to SMP code
   Apr 29 23:31:16 midgard kernel: [18014366.427165] Booting processor 1/1 
eip 3000
   Apr 29 23:31:16 midgard kernel: [18014366.436913] Initializing CPU#1
   Apr 29 23:31:16 midgard kernel: [18014366.509141] Calibrating delay
   using timer specific routine.. 3994.69 BogoMIPS (lpj=7989390)
   Apr 29 23:31:16 midgard kernel: [18014366.509152] monitor/mwait feature 
present.
   Apr 29 23:31:16 midgard kernel: [18014366.509156] CPU: L1 I cache:
   32K, L1 D cache: 32K
   Apr 29 23:31:16 midgard kernel: [18014366.509158] CPU: L2 cache: 2048K
   Apr 29 23:31:16 midgard kernel: [18014366.509160] CPU: Physical Processor 
ID: 0
   Apr 29 23:31:16 midgard kernel: [18014366.509161] CPU: Processor Core ID: 
1
   Apr 29 23:31:16 midgard kernel: [18014366.509637] CPU1: Intel Genuine
   Intel(R) CPU1500  @ 2.00GHz stepping 08
   Apr 29 23:31:16 midgard kernel: [18014366.509659] checking TSC
   synchronization [CPU#0 - CPU#1]:
   Apr 29 23:31:16 midgard kernel: [18014366.529627] Measured 68812018716
   cycles TSC warp between CPUs, turning off TSC clock.
   Apr 29 23:31:16 midgard kernel: [18014366.529630] Marking TSC unstable
   due to: check_tsc_sync_source failed.
   Apr 29 23:31:16 midgard 

Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]

2007-04-30 Thread Rafael J. Wysocki
On Monday, 30 April 2007 22:52, Dan Kruchinin wrote:
 On 4/30/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
  [Please don't drop addresses from the CC list]
 
  On Sunday, 29 April 2007 22:46, Dan Kruchinin wrote:
   On 4/30/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
Hi,
   
On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
 Hi all.

 There is a problem on my macbook core duo with suspend.
 after suspending when i'm trying to 'wake up' my notebook, it seems
 that it works, but i don't see anything at my monitor. So i have to
 reboot it to continue my work.
   
What exactly do you do to suspend?
   
Rafael
   
   
 ---
 Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at
 kernel/kthread.c:166 kthread_bind()
 Apr 29 23:31:16 midgard kernel: [140594.900870]  [c0142c9b]
 _cpu_down+0x16b/0x250
 Apr 29 23:31:16 midgard kernel: [140594.900893]  [c0142f80]
 disable_nonboot_cpus+0x60/0xf0
 Apr 29 23:31:16 midgard kernel: [140594.900903]  [c0147efa]
 enter_state+0x22a/0x240
 Apr 29 23:31:16 midgard kernel: [140594.900913]  [c0147fcd]
 state_store+0xbd/0xd0
 Apr 29 23:31:16 midgard kernel: [140594.900920]  [c0147f10]
 state_store+0x0/0xd0
 Apr 29 23:31:16 midgard kernel: [140594.900927]  [c01c1559]
 subsys_attr_store+0x29/0x40
 Apr 29 23:31:16 midgard kernel: [140594.900937]  [c01c1774]
 sysfs_write_file+0xd4/0x160
 Apr 29 23:31:16 midgard kernel: [140594.900948]  [c0180eb6]
 vfs_write+0xa6/0x160
 Apr 29 23:31:16 midgard kernel: [140594.900958]  [c01c16a0]
 sysfs_write_file+0x0/0x160
 Apr 29 23:31:16 midgard kernel: [140594.900966]  [c0181601]
 sys_write+0x41/0x70
 Apr 29 23:31:16 midgard kernel: [140594.900974]  [c018c70b]
 sys_dup2+0xeb/0x120
 Apr 29 23:31:16 midgard kernel: [140594.900984]  [c0104116]
 sysenter_past_esp+0x5f/0x85
 Apr 29 23:31:16 midgard kernel: [140594.900999]  
 ===
 ---

 dmesg output:
 
 
 Apr 29 23:31:16 midgard kernel: [140594.788697] Suspending device 
 vtcon0
 Apr 29 23:31:16 midgard kernel: [140594.788700] Suspending device 
 platform
 Apr 29 23:31:16 midgard kernel: [140594.788704] Disabling non-boot 
 CPUs ...
 Apr 29 23:31:16 midgard kernel: [140594.900464] CPU 1 is now offline
 Apr 29 23:31:16 midgard kernel: [140594.900469] SMP alternatives:
 switching to UP code
 Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at
 kernel/kthread.c:166 kthread_bind()
 Apr 29 23:31:16 midgard kernel: [140594.900870]  [c0142c9b]
 _cpu_down+0x16b/0x250
 Apr 29 23:31:16 midgard kernel: [140594.900893]  [c0142f80]
 disable_nonboot_cpus+0x60/0xf0
 Apr 29 23:31:16 midgard kernel: [140594.900903]  [c0147efa]
 enter_state+0x22a/0x240
 Apr 29 23:31:16 midgard kernel: [140594.900913]  [c0147fcd]
 state_store+0xbd/0xd0
 Apr 29 23:31:16 midgard kernel: [140594.900920]  [c0147f10]
 state_store+0x0/0xd0
 Apr 29 23:31:16 midgard kernel: [140594.900927]  [c01c1559]
 subsys_attr_store+0x29/0x40
 Apr 29 23:31:16 midgard kernel: [140594.900937]  [c01c1774]
 sysfs_write_file+0xd4/0x160
 Apr 29 23:31:16 midgard kernel: [140594.900948]  [c0180eb6]
 vfs_write+0xa6/0x160
 Apr 29 23:31:16 midgard kernel: [140594.900958]  [c01c16a0]
 sysfs_write_file+0x0/0x160
 Apr 29 23:31:16 midgard kernel: [140594.900966]  [c0181601]
 sys_write+0x41/0x70
 Apr 29 23:31:16 midgard kernel: [140594.900974]  [c018c70b]
 sys_dup2+0xeb/0x120
 Apr 29 23:31:16 midgard kernel: [140594.900984]  [c0104116]
 sysenter_past_esp+0x5f/0x85
 Apr 29 23:31:16 midgard kernel: [140594.900999]  
 ===
 Apr 29 23:31:16 midgard kernel: [140594.902843] CPU1 is down
 Apr 29 23:31:16 midgard kernel: [18014366.415769] Enabling non-boot 
 CPUs ...
 Apr 29 23:31:16 midgard kernel: [18014366.426999] SMP alternatives:
 switching to SMP code
 Apr 29 23:31:16 midgard kernel: [18014366.427165] Booting processor 
 1/1 eip 3000
 Apr 29 23:31:16 midgard kernel: [18014366.436913] Initializing CPU#1
 Apr 29 23:31:16 midgard kernel: [18014366.509141] Calibrating delay
 using timer specific routine.. 3994.69 BogoMIPS (lpj=7989390)
 Apr 29 23:31:16 midgard kernel: [18014366.509152] monitor/mwait 
 feature present.
 Apr 29 23:31:16 midgard kernel: [18014366.509156] CPU: L1 I cache:
 32K, L1 D cache: 32K
 Apr 29 23:31:16 midgard kernel: [18014366.509158] CPU: L2 cache: 2048K
 Apr 29 23:31:16 midgard kernel: [18014366.509160] CPU: Physical 
 Processor ID: 0
 Apr 29 23:31:16 midgard kernel: [18014366.509161] CPU: Processor Core 
 ID: 1
 Apr 29 23:31:16 midgard kernel: [18014366.509637] CPU1: Intel Genuine
 Intel(R) CPU1500  @ 2.00GHz stepping 08
 Apr 29 23:31:16 midgard kernel: [18014366.509659] checking TSC

Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)

2007-04-30 Thread Tilman Schmidt
On 30.04.2007 21:46, Andrew Morton wrote:

 2.6.21-final is fine.
 
 Sure, but what about 2.6.21-git3 (or, better, current -git)?

OIC. Sorry for being dense. Will check.

  If that's OK then we need to pick through the difference between
 2.6.21-rc7-mm2's driver tree and the patches which went into mainline.  And
 that's a pretty small set.
 I'm not quite sure how to determine that difference. Can you just provide
 me with a list of patches you'd like me to test?
 
 Not really - everything's tangled up.  A bisection search on the
 2.6.21-rc7-mm2 driver tree would be the best bet.

Ok. No prob. It'll just take a bit of time. (Compiling a kernel on
that machine takes about 4 hours.)

I'll be back. :-)

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -




