Re: [PATCH] mm/memory.c: remove warning from an uninitialized spinlock. was: Re: 2.6.21-rc7-mm2
On Wed, Nov 07, 2007 at 02:20:03PM -0500, Steven Rostedt wrote:
> >
> > Introduce a macro for suppressing gcc from generating a warning about a
> > probable uninitialized state of a variable.
> >
> > Example:
> >
> > -	spinlock_t *ptl;
> > +	spinlock_t *uninitialized_var(ptl);
> >
> > Not a happy solution, but those warnings are obnoxious.
> >
> > - Using the usual pointlessly-set-it-to-zero approach wastes several
> >   bytes of text.
> >
> > - Using a macro means we can (hopefully) do something else if gcc changes
> >   cause the `x = x' hack to stop working
> >
> > - Using a macro means that people who are worried about hiding true bugs
> >   can easily turn it off.
> >
> > Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
>
> I just stumbled across this being in the kernel. Well, I'm finally glad
> it made it in, even though it was suggested one year earlier ;-)
>
> http://lkml.org/lkml/2006/5/11/50

yeah, this was Andrew's idea. The version in the kernel, in contrast to
yours, doesn't have a config option so you still have to make really sure
you're not aiding any bugs with it.

-- 
Regards/Gruß,
    Boris.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm/memory.c: remove warning from an uninitialized spinlock. was: Re: 2.6.21-rc7-mm2
> >
> > Introduce a macro for suppressing gcc from generating a warning about a
> > probable uninitialized state of a variable.
> >
> > Example:
> >
> > -	spinlock_t *ptl;
> > +	spinlock_t *uninitialized_var(ptl);
> >
> > Not a happy solution, but those warnings are obnoxious.
> >
> > - Using the usual pointlessly-set-it-to-zero approach wastes several
> >   bytes of text.
> >
> > - Using a macro means we can (hopefully) do something else if gcc changes
> >   cause the `x = x' hack to stop working
> >
> > - Using a macro means that people who are worried about hiding true bugs
> >   can easily turn it off.
> >
> > Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>

I just stumbled across this being in the kernel. Well, I'm finally glad
it made it in, even though it was suggested one year earlier ;-)

http://lkml.org/lkml/2006/5/11/50

-- Steve
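For readers who have not met the `x = x' hack discussed in this thread, the following is a minimal userspace sketch of it. The macro body is the hack named in the changelog; struct lock, map_and_lock() and touch_entry() are hypothetical stand-ins invented for this illustration (they are not kernel code) for spinlock_t and for a helper that fills in the lock pointer only when it succeeds.

```c
#include <assert.h>
#include <stddef.h>

/* The `x = x' hack: the declaration initializes the variable with
 * itself, which gcc treats as "initialized" for warning purposes. */
#define uninitialized_var(x) x = x

struct lock { int held; };		/* hypothetical stand-in for spinlock_t */

static struct lock table_lock;

/* Hypothetical helper: writes *lockp only when it succeeds. */
static int map_and_lock(int key, struct lock **lockp)
{
	if (key < 0)
		return -1;
	table_lock.held = 1;
	*lockp = &table_lock;
	return 0;
}

int touch_entry(int key)
{
	/* Expands to: struct lock *ptl = ptl; */
	struct lock *uninitialized_var(ptl);

	if (map_and_lock(key, &ptl) != 0)
		return -1;		/* ptl is never read on this path */

	assert(ptl != NULL);		/* set by map_and_lock() on success */
	ptl->held = 0;			/* "unlock" */
	return 0;
}
```

As the thread itself points out, the macro silences the warning unconditionally, so it would hide a genuine uninitialized use just as readily as a false positive.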
Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
* Vivek Goyal <[EMAIL PROTECTED]> [2007-05-17 15:05]:
> On Mon, May 14, 2007 at 04:05:15PM +0200, Bernhard Walle wrote:
> > * Vivek Goyal <[EMAIL PROTECTED]> [2007-05-08 19:18]:
> > > On Thu, May 03, 2007 at 12:19:32AM +0200, Bernhard Walle wrote:
> > > > * Vivek Goyal <[EMAIL PROTECTED]> [2007-04-30 10:48]:
> > > > >
> > > > > handle_edge_irq() already makes sure that desc->action is not null,
> > > > > still note_interrupt() is receiving desc->action as null, that's
> > > > > strange. On my system this is happening for irq 4 and
> > > > > /proc/interrupts shows that it is coming from "serial".
> > > >
> > > > Unfortunately, I couldn't reproduce this here. Vivek, do you have time
> > > > to take a look at this at your site? For the meanwhile, should I
> > > > create a patch that checks for desc->action in note_interrupt(), too?
> > >
> > > I can reproduce this problem only on one machine. I think there is some
> > > race condition and your code somehow just exposes it.
> >
> > thanks for finding that out. Could you try/review out the patch below?
> > As the lock is only acquired when irqfixup == 2 it shouldn't impact
> > performance of a 'normal' system.
>
> It does fix up my problem. I have modified your patch a bit. I think
> new version is little more clear. What do you think?

Agreed. Thanks for spotting that problem!

	Bernhard
Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
On Mon, May 14, 2007 at 04:05:15PM +0200, Bernhard Walle wrote:
> * Vivek Goyal <[EMAIL PROTECTED]> [2007-05-08 19:18]:
> > On Thu, May 03, 2007 at 12:19:32AM +0200, Bernhard Walle wrote:
> > > * Vivek Goyal <[EMAIL PROTECTED]> [2007-04-30 10:48]:
> > > >
> > > > handle_edge_irq() already makes sure that desc->action is not null,
> > > > still note_interrupt() is receiving desc->action as null, that's
> > > > strange. On my system this is happening for irq 4 and
> > > > /proc/interrupts shows that it is coming from "serial".
> > >
> > > Unfortunately, I couldn't reproduce this here. Vivek, do you have time
> > > to take a look at this at your site? For the meanwhile, should I
> > > create a patch that checks for desc->action in note_interrupt(), too?
> >
> > I can reproduce this problem only on one machine. I think there is some
> > race condition and your code somehow just exposes it.
>
> thanks for finding that out. Could you try/review out the patch below?
> As the lock is only acquired when irqfixup == 2 it shouldn't impact
> performance of a 'normal' system.
>

Hi Bernhard,

It does fix up my problem. I have modified your patch a bit. I think
new version is little more clear. What do you think?

Thanks
Vivek


o System crashes if booted with irqpoll command line option.

o Problem happens because inside note_interrupt() we are accessing
  desc->action->flags without taking the desc->lock. While accessing it,
  somebody goes ahead and unregisters the irq handler, hence desc->action
  is NULL. By the time note_interrupt() checks it, it crashes.

o In my system it is irq 4, serving the serial driver.

o Take the desc->lock before accessing desc->action->flags.

Signed-off-by: Bernhard Walle <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 linux-2.6.21-git12-root/kernel/irq/spurious.c |   23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff -puN kernel/irq/spurious.c~fix-irqpoll-crash kernel/irq/spurious.c
--- linux-2.6.21-git12/kernel/irq/spurious.c~fix-irqpoll-crash	2007-05-17 17:36:50.0 +0530
+++ linux-2.6.21-git12-root/kernel/irq/spurious.c	2007-05-17 17:53:52.0 +0530
@@ -138,6 +138,8 @@ report_bad_irq(unsigned int irq, struct
 void note_interrupt(unsigned int irq, struct irq_desc *desc,
 		    irqreturn_t action_ret)
 {
+	int call_misrouted_irq = 0;
+
 	if (unlikely(action_ret != IRQ_HANDLED)) {
 		desc->irqs_unhandled++;
 		if (unlikely(action_ret != IRQ_NONE))
@@ -146,9 +148,24 @@ void note_interrupt(unsigned int irq, st
 
 	if (unlikely(irqfixup)) {
 		/* Don't punish working computers */
-		if ((irqfixup == 2 && ((irq == 0) ||
-				(desc->action->flags & IRQF_IRQPOLL))) ||
-				action_ret == IRQ_NONE) {
+		if (action_ret == IRQ_NONE)
+			/* Nobody handled irq. Possibly a misrouted one. */
+			call_misrouted_irq = 1;
+		else if (irqfixup == 2) {
+			/* irqpoll is enabled. Is this the irq driving
+			 * polling.
+			 */
+			if (irq == 0)
+				call_misrouted_irq = 1;
+			else {
+				spin_lock(&desc->lock);
+				if (desc->action &&
+				    (desc->action->flags & IRQF_IRQPOLL))
+					call_misrouted_irq = 1;
+				spin_unlock(&desc->lock);
+			}
+		}
+		if (call_misrouted_irq) {
 			int ok = misrouted_irq(irq);
 			if (action_ret == IRQ_NONE)
 				desc->irqs_unhandled -= ok;
_
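The decision logic of the patch above can be tried out in userspace. What follows is only a sketch under stated assumptions: the spinlock is replaced by a C11 atomic_flag, the IRQF_IRQPOLL value is made up, action_ret is reduced to a boolean "handled", and want_misrouted_irq() is a hypothetical name for the part of note_interrupt() that decides whether misrouted_irq() gets called.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define IRQF_IRQPOLL 0x1000UL	/* hypothetical flag value for this sketch */

struct irqaction { unsigned long flags; };

struct irq_desc {
	atomic_flag lock;		/* stand-in for the kernel spinlock */
	struct irqaction *action;	/* NULL once the handler is unregistered */
};

static void desc_lock(struct irq_desc *desc)
{
	while (atomic_flag_test_and_set_explicit(&desc->lock,
						 memory_order_acquire))
		;			/* spin until the flag is released */
}

static void desc_unlock(struct irq_desc *desc)
{
	atomic_flag_clear_explicit(&desc->lock, memory_order_release);
}

/* Returns 1 when misrouted_irq() should be called, 0 otherwise. */
int want_misrouted_irq(unsigned int irq, int irqfixup, int handled,
		       struct irq_desc *desc)
{
	int call_misrouted_irq = 0;

	assert(desc != NULL);
	if (!irqfixup)
		return 0;

	if (!handled)
		call_misrouted_irq = 1;	/* nobody handled it: maybe misrouted */
	else if (irqfixup == 2) {
		if (irq == 0)
			call_misrouted_irq = 1;
		else {
			/* Test the pointer and read the flags under the lock. */
			desc_lock(desc);
			if (desc->action &&
			    (desc->action->flags & IRQF_IRQPOLL))
				call_misrouted_irq = 1;
			desc_unlock(desc);
		}
	}
	return call_misrouted_irq;
}
```

The point of the shape is that the NULL test and the flags read happen inside one locked region, and that the lock is only touched on the irqfixup == 2 path, which matches the performance remark in the thread.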
Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
* Vivek Goyal <[EMAIL PROTECTED]> [2007-05-08 19:18]:
> On Thu, May 03, 2007 at 12:19:32AM +0200, Bernhard Walle wrote:
> > * Vivek Goyal <[EMAIL PROTECTED]> [2007-04-30 10:48]:
> > >
> > > handle_edge_irq() already makes sure that desc->action is not null, still
> > > note_interrupt() is receiving desc->action as null, that's strange. On my
> > > system this is happening for irq 4 and /proc/interrupts shows that it is
> > > coming from "serial".
> >
> > Unfortunately, I couldn't reproduce this here. Vivek, do you have time
> > to take a look at this at your site? For the meanwhile, should I
> > create a patch that checks for desc->action in note_interrupt(), too?
>
> I can reproduce this problem only on one machine. I think there is some
> race condition and your code somehow just exposes it.

thanks for finding that out. Could you try/review out the patch below?
As the lock is only acquired when irqfixup == 2 it shouldn't impact
performance of a 'normal' system.


Thanks,
   Bernhard

---
 kernel/irq/spurious.c |   18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -145,10 +145,20 @@ void note_interrupt(unsigned int irq, st
 	}
 
 	if (unlikely(irqfixup)) {
-		/* Don't punish working computers */
-		if ((irqfixup == 2 && ((irq == 0) ||
-				(desc->action->flags & IRQF_IRQPOLL))) ||
-				action_ret == IRQ_NONE) {
+		int call_misrouted_irq = action_ret == IRQ_NONE;
+
+		if (!call_misrouted_irq && irqfixup == 2) {
+			if (irq == 0)
+				call_misrouted_irq = 1;
+			else {
+				spin_lock(&desc->lock);
+				if (desc->action && (desc->action->flags & IRQF_IRQPOLL))
+					call_misrouted_irq = 1;
+				spin_unlock(&desc->lock);
+			}
+		}
+
+		if (call_misrouted_irq) {
 			int ok = misrouted_irq(irq);
 			if (action_ret == IRQ_NONE)
 				desc->irqs_unhandled -= ok;
Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
On Thu, May 03, 2007 at 12:19:32AM +0200, Bernhard Walle wrote:
> * Vivek Goyal <[EMAIL PROTECTED]> [2007-04-30 10:48]:
> >
> > handle_edge_irq() already makes sure that desc->action is not null, still
> > note_interrupt() is receiving desc->action as null, that's strange. On my
> > system this is happening for irq 4 and /proc/interrupts shows that it is
> > coming from "serial".
>
> Unfortunately, I couldn't reproduce this here. Vivek, do you have time
> to take a look at this at your site? For the meanwhile, should I
> create a patch that checks for desc->action in note_interrupt(), too?
>

Hi Bernhard,

I can reproduce this problem only on one machine. I think there is some
race condition and your code somehow just exposes it.

I put a few WARN_ON(!desc->action) checks in handle_edge_irq() and found
that after handle_IRQ_event() returns, desc->action has become null. That
means in the meantime somebody has gone ahead and modified the desc. This
must have happened because we release desc->lock while running
handle_IRQ_event().

This means there is a race somewhere. This is supported by the fact that
the problem does not occur if the same system is booted with only one cpu
(maxcpus=1).

Thanks
Vivek
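The window described in this message can be modelled deterministically, without threads. The sketch below is a single-threaded illustration only: the "locked" field is a plain int rather than a real lock, and other_cpu_fn, other_cpu_unregisters(), other_cpu_idle() and action_survives_handler() are hypothetical names standing in for whatever a second CPU does while the descriptor lock is dropped around the handler call.

```c
#include <assert.h>
#include <stddef.h>

struct irqaction { unsigned long flags; };
struct irq_desc { int locked; struct irqaction *action; };

/* Hypothetical hook: what another CPU does while the lock is dropped. */
typedef void (*other_cpu_fn)(struct irq_desc *desc);

void other_cpu_unregisters(struct irq_desc *desc)
{
	assert(!desc->locked);		/* only possible while unlocked */
	desc->action = NULL;		/* handler goes away */
}

void other_cpu_idle(struct irq_desc *desc)
{
	(void)desc;			/* nothing happens in the window */
}

/* Returns 1 if desc->action is still set after the handler window. */
int action_survives_handler(struct irq_desc *desc, other_cpu_fn other_cpu)
{
	int alive;

	desc->locked = 1;
	if (!desc->action) {		/* the check made under the lock */
		desc->locked = 0;
		return 0;
	}
	desc->locked = 0;		/* lock released to run the handler */
	other_cpu(desc);		/* the race window */
	desc->locked = 1;
	alive = desc->action != NULL;	/* re-check under the lock */
	desc->locked = 0;
	return alive;
}
```

With other_cpu_idle() the action survives, which corresponds to the maxcpus=1 observation; with other_cpu_unregisters() the earlier check under the lock no longer says anything about the pointer afterwards.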
Re: 2.6.21-rc7-mm2 breaks 'lvm vgscan'.
On Thu, 26 Apr 2007 22:31:15 EDT, [EMAIL PROTECTED] said:
> On Wed, 25 Apr 2007 22:57:16 PDT, Andrew Morton said:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
>
> This addition in -rc7-mm1 breaks my laptop (Dell Latitude D820, x86_64 kernel)
>
> gregkh-driver-sysfs-fix-i_ino-handling-in-sysfs.patch
>
> The initrd on my system does an 'lvm vgscan' to get the root filesystem
> accessible.

This is confirmed fixed in 2.6.21-mm1.
Re: [PATCH] Re: 2.6.21-rc7-mm2 -- hvsi console driver registration failure
On Fri, 4 May 2007, Andrew Morton wrote:

> Better, we should be emitting loud warnings which then disable themselves
> and then succeeding the allocation so that people can proceed with their
> kernel testing.
>
> When all the loud-warning sites have been fixed, we can take that code out
> again.
>
> The present situation is maximally tester-hostile.


SLUB: Allocate smallest object size if the user asks for 0 bytes.

Makes SLUB behave like SLAB in this area to avoid issues.

Throw a stack dump to alert people.

At some point the behavior should be switched back. NULL is no memory as
far as I can tell and if the user asked for 0 bytes then he needs to get no
memory.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slub_def.h |    8 ++++++--
 mm/slub.c                |    2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

Index: slub/mm/slub.c
===================================================================
--- slub.orig/mm/slub.c	2007-05-04 14:17:22.0 -0700
+++ slub/mm/slub.c	2007-05-04 14:19:36.0 -0700
@@ -2009,7 +2009,7 @@ static struct kmem_cache *get_slab(size_
 {
 	int index = kmalloc_index(size);
 
-	if (!size)
+	if (!index)
 		return NULL;
 
 	/* Allocation too large? */
Index: slub/include/linux/slub_def.h
===================================================================
--- slub.orig/include/linux/slub_def.h	2007-05-04 14:13:40.0 -0700
+++ slub/include/linux/slub_def.h	2007-05-04 14:18:25.0 -0700
@@ -81,8 +81,12 @@ extern struct kmem_cache kmalloc_caches[
  */
 static inline int kmalloc_index(int size)
 {
-	if (size == 0)
-		return 0;
+	/*
+	 * We should return 0 if size == 0 but we use the smallest object
+	 * here for SLAB legacy reasons.
+	 */
+	WARN_ON(size == 0);
+
 	if (size > 64 && size <= 96)
 		return 1;
 	if (size > 128 && size <= 192)
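The "warn loudly, then succeed with the smallest object" behaviour discussed here can be sketched in userspace as below. This is an illustration under assumptions, not the SLUB code: the size-class table is made up (the real kmalloc_index() table differs), the warning is a plain fprintf() rather than a stack dump, and size_class_index() and size_class_bytes() are hypothetical names. The sketch also makes the warning fire only once, which is Andrew's "disable themselves" suggestion rather than what the WARN_ON() in the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical size classes; the real kmalloc_index() table differs. */
static const size_t size_classes[] = { 8, 16, 32, 64, 128, 256 };
#define NR_CLASSES (sizeof(size_classes) / sizeof(size_classes[0]))

/* Returns the index of the smallest class that fits, or -1 if too large. */
int size_class_index(size_t size)
{
	static int warned;		/* the warning disables itself */
	size_t i;

	if (size == 0 && !warned) {
		fprintf(stderr, "warning: zero-size allocation requested\n");
		warned = 1;
	}
	for (i = 0; i < NR_CLASSES; i++)
		if (size <= size_classes[i])
			return (int)i;	/* size 0 lands in the smallest class */
	return -1;
}

/* Object size served for a valid class index. */
size_t size_class_bytes(int index)
{
	assert(index >= 0 && (size_t)index < NR_CLASSES);
	return size_classes[index];
}
```

A zero-byte request therefore maps to the 8-byte class in this sketch, so callers that treat a NULL return as failure keep working while the warning points at the call site that needs fixing.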
Re: [PATCH] Re: 2.6.21-rc7-mm2 -- hvsi console driver registration failure
On Fri, 04 May 2007 12:38:58 +0100 Andy Whitcroft <[EMAIL PROTECTED]> wrote:

>
> Trying to get 2.6.21-rc7-mm2 to boot on large PPC64 seems to be a
> bit of a challenge. We have been seeing panics on boot from the
> hvsi driver:
>
>     Couldn't register hvsi console driver
>
> Tracking this back, this seems to come from hvsi driver trying to
> register itself via tty_register_driver() with zero units.
>
> The failure is triggered by a change in semantics for kmalloc()
> between SLAB and SLUB; kmalloc(0) now returns NULL rather than an
> allocation at the smallest size. Looking at the code in question
> even when the allocation succeeds we will not actually use the
> memory when driver->num is zero.

OK, thanks for working that out.

Christoph, we should be emitting loud warnings so that this problem is easy
to debug.

Better, we should be emitting loud warnings which then disable themselves
and then succeeding the allocation so that people can proceed with their
kernel testing.

When all the loud-warning sites have been fixed, we can take that code out
again.

The present situation is maximally tester-hostile.

> It is not clear to me if this is a bug in the hvsi driver in that
> it should specify some units. It seems we will try and reserve zero
> devices in this case, which seems pointless.
>
> I have tested with the patch below which seems safe to me and stops
> the errors and even seems to make the console work. But perhaps
> someone with more driver fu, could verify if driver->num of zero
> has any meaning and kick this to the hvsi people if not.
>
> -apw
>
> === 8< ===
> tty_register_driver: only allocate tty instances when defined
>
> If driver->num is zero we attempt to kmalloc() zero bytes.
> When SLUB is enabled this returns a null pointer and we take that as
> an allocation failure and fail the device register. Check for no
> devices and avoid the allocation.
>
> Signed-off-by: Andy Whitcroft <[EMAIL PROTECTED]>
> ---
> diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
> index 959a616..71c4579 100644
> --- a/drivers/char/tty_io.c
> +++ b/drivers/char/tty_io.c
> @@ -3724,7 +3724,7 @@ int tty_register_driver(struct tty_driver *driver)
>  	if (driver->flags & TTY_DRIVER_INSTALLED)
>  		return 0;
>
> -	if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM)) {
> +	if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM) && driver->num) {
>  		p = kmalloc(driver->num * 3 * sizeof(void *), GFP_KERNEL);
>  		if (!p)
>  			return -ENOMEM;
Re: [PATCH] Re: 2.6.21-rc7-mm2 -- hvsi console driver registration failure
On Fri, May 04, 2007 at 12:38:58PM +0100, Andy Whitcroft wrote:
>
> Trying to get 2.6.21-rc7-mm2 to boot on large PPC64 seems to be a
> bit of a challenge. We have been seeing panics on boot from the
> hvsi driver:
>
>     Couldn't register hvsi console driver
>
> Tracking this back, this seems to come from hvsi driver trying to
> register itself via tty_register_driver() with zero units.
>
> The failure is triggered by a change in semantics for kmalloc()
> between SLAB and SLUB; kmalloc(0) now returns NULL rather than an
> allocation at the smallest size. Looking at the code in question
> even when the allocation succeeds we will not actually use the
> memory when driver->num is zero.
>
> It is not clear to me if this is a bug in the hvsi driver in that
> it should specify some units. It seems we will try and reserve zero
> devices in this case, which seems pointless.

Yes, it seems pointless to me ...

> I have tested with the patch below which seems safe to me and stops
> the errors and even seems to make the console work. But perhaps
> someone with more driver fu, could verify if driver->num of zero
> has any meaning and kick this to the hvsi people if not.

Hollis nominated me to be "hvsi people", although I'm near-totally
ignorant of the thing.

If hvsi_count is zero, then the device tree did not have
any "serial" nodes that speak "hvterm-protocol". The hvsi
should not have even tried to register anything.

The attached patch seems more to the point.

--linas


The hvsi driver is used whenever the device-tree contains nodes
for serial ports, and those serial ports speak the hvterm protocol.
However, if no such nodes are found, then the hvsi driver should
not even register. This patch avoids a kernel panic with
"Couldn't register hvsi console driver".

In addition, this patch makes tty_register_driver refuse to do
anything, if there are no actual tty ports to be registered.

Utterly & completely untested.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>

----
 drivers/char/hvsi.c   |    4 ++++
 drivers/char/tty_io.c |    3 +++
 2 files changed, 7 insertions(+)

Index: linux-2.6.21-rc7-mm2/drivers/char/hvsi.c
===================================================================
--- linux-2.6.21-rc7-mm2.orig/drivers/char/hvsi.c	2007-04-26 15:37:33.0 -0500
+++ linux-2.6.21-rc7-mm2/drivers/char/hvsi.c	2007-05-04 13:55:56.0 -0500
@@ -1148,6 +1148,10 @@ static int __init hvsi_init(void)
 {
 	int i;
 
+	/* No serial hvterm-protocol device-tree nodes found. */
+	if (hvsi_count == 0)
+		return 0;
+
 	hvsi_driver = alloc_tty_driver(hvsi_count);
 	if (!hvsi_driver)
 		return -ENOMEM;
Index: linux-2.6.21-rc7-mm2/drivers/char/tty_io.c
===================================================================
--- linux-2.6.21-rc7-mm2.orig/drivers/char/tty_io.c	2007-04-26 15:37:33.0 -0500
+++ linux-2.6.21-rc7-mm2/drivers/char/tty_io.c	2007-05-04 13:54:14.0 -0500
@@ -3724,6 +3724,9 @@ int tty_register_driver(struct tty_drive
 	if (driver->flags & TTY_DRIVER_INSTALLED)
 		return 0;
 
+	if (driver->num == 0)
+		return -ENODEV;
+
 	if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM)) {
 		p = kmalloc(driver->num * 3 * sizeof(void *), GFP_KERNEL);
 		if (!p)
[PATCH] Re: 2.6.21-rc7-mm2 -- hvsi console driver registration failure
Trying to get 2.6.21-rc7-mm2 to boot on large PPC64 seems to be a bit of a challenge. We have been seeing panics on boot from the hvsi driver: Couldn't register hvsi console driver Tracking this back, this seems to come from hvsi driver trying to register itself via tty_register_driver() with a zero units. The failure is triggered by a change in semantics for kmalloc() between SLAB and SLUB; kmalloc(0) now returns NULL rather than an allocation at the smallest size. Looking at the code in question even when the allocation succeeds we will not actually use the memory when device->num is zero. It is not clear to me if this is a bug in the hvsi driver in that it should specify some units. It seems we will try and reserve zero devices in this case, which seems pointless. I have tested with the patch below which seems safe to me and stops the errors and even seems to make the console work. But perhaps someone with more driver fu, could verify if driver->num of zero has any meaning and kick this to the hvsi people if not. -apw === 8< === tty_register_driver: only allocate tty instances when defined If device->num is zero we attempt to kmalloc() zero bytes. When SLUB is enabled this returns a null pointer and take that as an allocation failure and fail the device register. Check for no devices and avoid the allocation. 
Signed-off-by: Andy Whitcroft <[EMAIL PROTECTED]> --- diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c index 959a616..71c4579 100644 --- a/drivers/char/tty_io.c +++ b/drivers/char/tty_io.c @@ -3724,7 +3724,7 @@ int tty_register_driver(struct tty_driver *driver) if (driver->flags & TTY_DRIVER_INSTALLED) return 0; - if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM)) { + if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM) && driver->num) { p = kmalloc(driver->num * 3 * sizeof(void *), GFP_KERNEL); if (!p) return -ENOMEM; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
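The guard in the patch above can be tried outside the kernel. What follows is only a minimal userspace sketch, assuming a hypothetical `struct demo_driver` and `demo_register()` that stand in for `struct tty_driver` and `tty_register_driver()`, with `calloc()` in place of `kmalloc()`. It shows the idea of never issuing a zero-sized request, so that a NULL result can only mean a real allocation failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct tty_driver; only the fields used here. */
struct demo_driver {
    int num;        /* number of units the driver wants to register */
    void **table;   /* per-unit pointer table, stays NULL when num == 0 */
};

/*
 * Allocate the per-unit table only when there is at least one unit.
 * A zero-sized request is never made, so a NULL return from the
 * allocator is always treated as a genuine failure.
 */
int demo_register(struct demo_driver *drv)
{
    assert(drv != NULL && drv->num >= 0);

    drv->table = NULL;
    if (drv->num) {
        drv->table = calloc((size_t)drv->num * 3, sizeof(void *));
        if (!drv->table)
            return -ENOMEM;
    }
    return 0;
}

/* Release the table; free(NULL) is a no-op, so the zero-unit case is safe. */
void demo_unregister(struct demo_driver *drv)
{
    free(drv->table);
    drv->table = NULL;
}
```

Whether zero units should register successfully (as here) or be refused outright is exactly the question raised in this thread; the sketch takes the first option only for illustration.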
Re: [PATCH] Re: 2.6.21-rc7-mm2 -- hvsi console driver registration failure
On Fri, May 04, 2007 at 12:38:58PM +0100, Andy Whitcroft wrote: > Trying to get 2.6.21-rc7-mm2 to boot on large PPC64 seems to be a bit of a challenge. We have been seeing panics on boot from the hvsi driver: > > Couldn't register hvsi console driver > > Tracking this back, this seems to come from hvsi driver trying to register itself via tty_register_driver() with a zero units. The failure is triggered by a change in semantics for kmalloc() between SLAB and SLUB; kmalloc(0) now returns NULL rather than an allocation at the smallest size. Looking at the code in question even when the allocation succeeds we will not actually use the memory when device->num is zero. > > It is not clear to me if this is a bug in the hvsi driver in that it should specify some units. It seems we will try and reserve zero devices in this case, which seems pointless. Yes, it seems pointless to me ... > I have tested with the patch below which seems safe to me and stops the errors and even seems to make the console work. But perhaps someone with more driver fu, could verify if driver->num of zero has any meaning and kick this to the hvsi people if not. Hollis nominated me to be hvsi people, although I'm near-totally ignorant of the thing. If hvsi_count is zero, then the device tree did not have any serial nodes that speak hvterm-protocol. The hvsi should not have even tried to register anything. The attached patch seems more to the point. --linas The hvsi driver is used whenever the device-tree contains nodes for serial ports, and those serial ports speak the hvterm protocol. However, if no such nodes are found, then the hvsi driver should not even register. This patch avoids a kernel panic with "Couldn't register hvsi console driver". In addition, this patch makes tty_register_driver refuse to do anything, if there are no actual tty ports to be registered. Utterly completely untested. 
Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]> --- drivers/char/hvsi.c |4 drivers/char/tty_io.c |3 +++ 2 files changed, 7 insertions(+) Index: linux-2.6.21-rc7-mm2/drivers/char/hvsi.c =============== --- linux-2.6.21-rc7-mm2.orig/drivers/char/hvsi.c 2007-04-26 15:37:33.0 -0500 +++ linux-2.6.21-rc7-mm2/drivers/char/hvsi.c 2007-05-04 13:55:56.0 -0500 @@ -1148,6 +1148,10 @@ static int __init hvsi_init(void) { int i; + /* No serial hvterm-protocol device-tree nodes found. */ + if (hvsi_count == 0) + return 0; + hvsi_driver = alloc_tty_driver(hvsi_count); if (!hvsi_driver) return -ENOMEM; Index: linux-2.6.21-rc7-mm2/drivers/char/tty_io.c =============== --- linux-2.6.21-rc7-mm2.orig/drivers/char/tty_io.c 2007-04-26 15:37:33.0 -0500 +++ linux-2.6.21-rc7-mm2/drivers/char/tty_io.c 2007-05-04 13:54:14.0 -0500 @@ -3724,6 +3724,9 @@ int tty_register_driver(struct tty_drive if (driver->flags & TTY_DRIVER_INSTALLED) return 0; + if (driver->num == 0) + return -ENODEV; + if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM)) { p = kmalloc(driver->num * 3 * sizeof(void *), GFP_KERNEL); if (!p)
Re: [PATCH] Re: 2.6.21-rc7-mm2 -- hvsi console driver registration failure
On Fri, 04 May 2007 12:38:58 +0100 Andy Whitcroft <[EMAIL PROTECTED]> wrote: > Trying to get 2.6.21-rc7-mm2 to boot on large PPC64 seems to be a bit of a challenge. We have been seeing panics on boot from the hvsi driver: > > Couldn't register hvsi console driver > > Tracking this back, this seems to come from hvsi driver trying to register itself via tty_register_driver() with a zero units. The failure is triggered by a change in semantics for kmalloc() between SLAB and SLUB; kmalloc(0) now returns NULL rather than an allocation at the smallest size. Looking at the code in question even when the allocation succeeds we will not actually use the memory when device->num is zero. OK, thanks for working that out. Christoph, we should be emitting loud warnings so that this problem is easy to debug. Better, we should be emitting loud warnings which then disable themselves and then succeeding the allocation so that people can proceed with their kernel testing. When all the loud-warning sites have been fixed, we can take that code out again. The present situation is maximally tester-hostile. > It is not clear to me if this is a bug in the hvsi driver in that it should specify some units. It seems we will try and reserve zero devices in this case, which seems pointless. > > I have tested with the patch below which seems safe to me and stops the errors and even seems to make the console work. But perhaps someone with more driver fu, could verify if driver->num of zero has any meaning and kick this to the hvsi people if not. > > -apw > > === 8< === > tty_register_driver: only allocate tty instances when defined > > If device->num is zero we attempt to kmalloc() zero bytes. When SLUB is enabled this returns a null pointer and take that as an allocation failure and fail the device register. Check for no devices and avoid the allocation. 
> Signed-off-by: Andy Whitcroft <[EMAIL PROTECTED]> > --- > diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c > index 959a616..71c4579 100644 > --- a/drivers/char/tty_io.c > +++ b/drivers/char/tty_io.c > @@ -3724,7 +3724,7 @@ int tty_register_driver(struct tty_driver *driver) > if (driver->flags & TTY_DRIVER_INSTALLED) > return 0; > > - if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM)) { > + if (!(driver->flags & TTY_DRIVER_DEVPTS_MEM) && driver->num) { > p = kmalloc(driver->num * 3 * sizeof(void *), GFP_KERNEL); > if (!p) > return -ENOMEM;
Re: [PATCH] Re: 2.6.21-rc7-mm2 -- hvsi console driver registration failure
On Fri, 4 May 2007, Andrew Morton wrote: > Better, we should be emitting loud warnings which then disable themselves and then succeeding the allocation so that people can proceed with their kernel testing. When all the loud-warning sites have been fixed, we can take that code out again. > > The present situation is maximally tester-hostile. SLUB: Allocate smallest object size if the user asks for 0 bytes. Makes SLUB behave like SLAB in this area to avoid issues. Throw a stack dump to alert people. At some point the behavior should be switched back. NULL is no memory as far as I can tell and if the user asked for 0 bytes then he needs to get no memory. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/slub_def.h |8 ++-- mm/slub.c |2 +- 2 files changed, 7 insertions(+), 3 deletions(-) Index: slub/mm/slub.c =============== --- slub.orig/mm/slub.c 2007-05-04 14:17:22.0 -0700 +++ slub/mm/slub.c 2007-05-04 14:19:36.0 -0700 @@ -2009,7 +2009,7 @@ static struct kmem_cache *get_slab(size_ { int index = kmalloc_index(size); - if (!size) + if (!index) return NULL; /* Allocation too large? */ Index: slub/include/linux/slub_def.h =============== --- slub.orig/include/linux/slub_def.h 2007-05-04 14:13:40.0 -0700 +++ slub/include/linux/slub_def.h 2007-05-04 14:18:25.0 -0700 @@ -81,8 +81,12 @@ extern struct kmem_cache kmalloc_caches[ */ static inline int kmalloc_index(int size) { - if (size == 0) - return 0; + /* + * We should return 0 if size == 0 but we use the smallest object + * here for SLAB legacy reasons. + */ + WARN_ON(size == 0); + if (size > 64 && size <= 96) return 1; if (size > 128 && size <= 192)
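For readers who want to see the "warn, then succeed" behaviour in isolation, here is a small userspace sketch. It is not SLUB's `kmalloc_index()`: the function `demo_size_class()`, the `DEMO_*` limits, and the power-of-two-only classes are assumptions made for illustration. It follows the request quoted above in that the warning disables itself after the first hit, whereas the `WARN_ON()` in the patch fires every time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_MIN_SHIFT 3    /* smallest class: 8 bytes (assumed) */
#define DEMO_MAX_SHIFT 12   /* largest class: 4096 bytes (assumed) */

/* Set once the zero-size warning has been printed, so it is not repeated. */
static int demo_zero_warned;

/*
 * Return log2 of the size class that serves `size`, or -1 when the
 * request is too large. A zero-byte request is served from the smallest
 * class; the first such request prints a warning, later ones are silent.
 */
int demo_size_class(size_t size)
{
    int shift;

    /* Sanity check on the configured bounds. */
    assert(DEMO_MIN_SHIFT <= DEMO_MAX_SHIFT);

    if (size == 0 && !demo_zero_warned) {
        demo_zero_warned = 1;
        fprintf(stderr, "warning: zero-byte allocation requested\n");
    }

    for (shift = DEMO_MIN_SHIFT; shift <= DEMO_MAX_SHIFT; shift++)
        if (size <= ((size_t)1 << shift))
            return shift;

    return -1;
}
```

A caller that treats -1 as "too large" and any other value as a usable class never sees a failure for a zero-byte request, which is the tester-friendly property asked for above.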
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, May 02, 2007 at 11:41:22AM +0200, Tilman Schmidt wrote: > On Wed, 2 May 2007 00:43:05 -0700, "Greg KH" <[EMAIL PROTECTED]> said: > > > > > And the winner is: > > > > > > > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch > > > > > > > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel > > > > again. > > > > Wait, even though this isn't good, it shouldn't have been hit by anyone, > > that file used to not be readable, so I doubt userspace would have been > > trying to read it... > > > > Tilman, what version of HAL and udev do you have on your machine? > > The ones that came with SuSE 10.0: > > hal-0.5.4-6.4 > udev-068git20050831-9 Ah, ok, that explains it, the really old libsysfs walks and opens all files in sysfs for some odd, strange, and broken reason. This has been fixed in newer versions, and explains why you are seeing this happen. I'll send my fix for this to Linus in a few hours. thanks for testing and tracking this down, I really appreciate it. greg k-h - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
Am 02.05.2007 22:07 schrieb Andrew Morton: >> Started to git-bisect mainline now, but that will take some time. [...] > I don't think there's much point in you doing that. We know what the bug is. Good. Saves me some work. :-) If you'd like me to test anything, just let me know. Thanks, Tilman -- Tilman Schmidt E-Mail: [EMAIL PROTECTED] Bonn, Germany - Undetected errors are handled as if no error occurred. (IBM) - signature.asc Description: OpenPGP digital signature
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, 02 May 2007 19:36:03 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote: > Am 02.05.2007 09:52 schrieb Greg KH: > > Tilman, here's a patch, can you try this on top of your tree that dies? > > 2.6.21-git3 plus that patch comes up fine. > > (Except for a UDP problem I seem to remember I already saw reported > on lkml and which I'll ignore for now in order not to blur the > picture.) Thanks. > Started to git-bisect mainline now, but that will take some time. > It's more than 800 patches to check and I don't get more than 2-3 > iterations per day out of that machine. I don't think there's much point in you doing that. We know what the bug is. Switching to 8k stacks will probably fix things up too. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
Am 02.05.2007 09:52 schrieb Greg KH: > Tilman, here's a patch, can you try this on top of your tree that dies? 2.6.21-git3 plus that patch comes up fine. (Except for a UDP problem I seem to remember I already saw reported on lkml and which I'll ignore for now in order not to blur the picture.) Started to git-bisect mainline now, but that will take some time. It's more than 800 patches to check and I don't get more than 2-3 iterations per day out of that machine. HTH T. > --- > drivers/base/core.c |7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -252,7 +252,7 @@ static ssize_t show_uevent(struct device > struct kobject *top_kobj; > struct kset *kset; > char *envp[32]; > - char data[PAGE_SIZE]; > + char *data = NULL; > char *pos; > int i; > size_t count = 0; > @@ -276,6 +276,10 @@ static ssize_t show_uevent(struct device > if (!kset->uevent_ops->filter(kset, &dev->kobj)) > goto out; > > + data = (char *)get_zeroed_page(GFP_KERNEL); > + if (!data) > + return -ENOMEM; > + > /* let the kset specific function add its keys */ > pos = data; > retval = kset->uevent_ops->uevent(kset, &dev->kobj, > @@ -290,6 +294,7 @@ static ssize_t show_uevent(struct device > count += sprintf(pos, "%s\n", envp[i]); > } > out: > + free_page((unsigned long)data); > return count; > } > -- Tilman Schmidt E-Mail: [EMAIL PROTECTED] Bonn, Germany - Undetected errors are handled as if no error occurred. (IBM)
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On 5/2/07, Greg KH <[EMAIL PROTECTED]> wrote: On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote: > On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote: > > > Am 30.04.2007 21:46 schrieb Andrew Morton: > > > Not really - everything's tangled up. A bisection search on the > > > 2.6.21-rc7-mm2 driver tree would be the best bet. > > > > And the winner is: > > > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch > > > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel > > again. > > cripes. > > +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr, > + char *buf) > +{ > + struct kobject *top_kobj; > + struct kset *kset; > + char *envp[32]; > + char data[PAGE_SIZE]; > > That won't work too well with 4k stacks. Yeah, sorry. Wait, even though this isn't good, it shouldn't have been hit by anyone, that file used to not be readable, so I doubt userspace would have been trying to read it... Tilman, what version of HAL and udev do you have on your machine? Kay, did you get the 'read the uevent file' code already into udev and/or HAL? Only udevtest uses this at the moment, but that is only used for debugging. It's probably the brain-dead libsysfs, which opens and reads every file in /sys, even when nobody is interested in the data. Thanks, Kay - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, 2 May 2007 00:43:05 -0700, "Greg KH" <[EMAIL PROTECTED]> said: > > > And the winner is: > > > > > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch > > > > > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel > > > again. > > Wait, even though this isn't good, it shouldn't have been hit by anyone, > that file used to not be readable, so I doubt userspace would have been > trying to read it... > > Tilman, what version of HAL and udev do you have on your machine? The ones that came with SuSE 10.0: hal-0.5.4-6.4 udev-068git20050831-9 HTH Tilman PS: I'll test your patch and git-bisect when I'm back at the machine. -- Tilman Schmidt [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Build break on ppc64 for 2.6.21-rc7-mm2
Hi When compiling 2.6.21-rc7-mm2, I encountered this error. = CC [M] drivers/net/e1000/e1000_ethtool.o CC [M] drivers/net/e1000/e1000_main.o LD [M] drivers/net/e1000/e1000.o LD drivers/net/ehea/built-in.o CC [M] drivers/net/ehea/ehea_main.o drivers/net/ehea/ehea_main.c: In function ehea_hash_skb: drivers/net/ehea/ehea_main.c:1806: error: struct sk_buff has no member named nh drivers/net/ehea/ehea_main.c:1807: error: struct sk_buff has no member named nh drivers/net/ehea/ehea_main.c:1807: error: struct sk_buff has no member named nh drivers/net/ehea/ehea_main.c:1809: error: struct sk_buff has no member named nh make[3]: *** [drivers/net/ehea/ehea_main.o] Error 1 make[2]: *** [drivers/net/ehea] Error 2 make[1]: *** [drivers/net] Error 2 make: *** [drivers] Error 2 = Since code is not compatible with struct sk_buff change, we have this error. Below patch should fix this problem. Please let me know your comments on this. Signed-off-by: Srinivasa Ds <[EMAIL PROTECTED]> --- drivers/net/ehea/ehea_main.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) Index: linux-2.6.21-rc7/drivers/net/ehea/ehea_main.c === --- linux-2.6.21-rc7.orig/drivers/net/ehea/ehea_main.c +++ linux-2.6.21-rc7/drivers/net/ehea/ehea_main.c @@ -1803,10 +1803,10 @@ static inline int ehea_hash_skb(struct s u32 tmp; if ((skb->protocol == htons(ETH_P_IP)) && - (skb->nh.iph->protocol == IPPROTO_TCP)) { - tcp = (struct tcphdr*)(skb->nh.raw + (skb->nh.iph->ihl * 4)); + (ip_hdr(skb)->protocol == IPPROTO_TCP)) { + tcp = (struct tcphdr*)(skb_network_header(skb) + (ip_hdr(skb)->ihl * 4)); tmp = (tcp->source + (tcp->dest << 16)) % 31; - tmp += skb->nh.iph->daddr % 31; + tmp += ip_hdr(skb)->daddr % 31; return tmp % num_qps; } else - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
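The point of the fix above is that callers stop reaching into the buffer's header fields directly and go through accessor helpers instead. The following userspace sketch illustrates that pattern only. Everything in it is hypothetical: `struct demo_pkt`, the simplified header structs (which are not the on-wire layouts), and the `demo_*` helpers standing in for `ip_hdr()` and friends. Returning 0 for non-TCP packets is also an assumption, since the `else` branch is not shown in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PROTO_TCP 6

/* Hypothetical, simplified headers; these are not the on-wire layouts. */
struct demo_iphdr  { uint8_t protocol; uint32_t daddr; };
struct demo_tcphdr { uint16_t source; uint16_t dest; };

/* Hypothetical packet buffer; callers locate headers via accessors only. */
struct demo_pkt {
    struct demo_iphdr ip;
    struct demo_tcphdr tcp;
};

/* Accessors hide where the headers live, so the layout can change freely. */
static const struct demo_iphdr *demo_ip_hdr(const struct demo_pkt *p)
{
    return &p->ip;
}

static const struct demo_tcphdr *demo_tcp_hdr(const struct demo_pkt *p)
{
    return &p->tcp;
}

/* Pick a queue from the TCP ports and the IP destination address. */
unsigned int demo_hash_pkt(const struct demo_pkt *p, unsigned int num_qps)
{
    const struct demo_tcphdr *tcp;
    uint32_t tmp;

    assert(num_qps > 0);

    /* Assumption: non-TCP traffic goes to queue 0. */
    if (demo_ip_hdr(p)->protocol != DEMO_PROTO_TCP)
        return 0;

    tcp = demo_tcp_hdr(p);
    tmp = (tcp->source + ((uint32_t)tcp->dest << 16)) % 31;
    tmp += demo_ip_hdr(p)->daddr % 31;
    return tmp % num_qps;
}
```

With accessors in place, a later change to how the packet stores its headers needs to touch only the two helper functions, not every caller.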
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote: > On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote: > > > Am 30.04.2007 21:46 schrieb Andrew Morton: > > > Not really - everything's tangled up. A bisection search on the > > > 2.6.21-rc7-mm2 driver tree would be the best bet. > > > > And the winner is: > > > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch > > > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel > > again. > > cripes. > > +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr, > + char *buf) > +{ > + struct kobject *top_kobj; > + struct kset *kset; > + char *envp[32]; > + char data[PAGE_SIZE]; > > That won't work too well with 4k stacks. Tilman, here's a patch, can you try this on top of your tree that dies? thanks, greg k-h --- drivers/base/core.c |7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -252,7 +252,7 @@ static ssize_t show_uevent(struct device struct kobject *top_kobj; struct kset *kset; char *envp[32]; - char data[PAGE_SIZE]; + char *data = NULL; char *pos; int i; size_t count = 0; @@ -276,6 +276,10 @@ static ssize_t show_uevent(struct device if (!kset->uevent_ops->filter(kset, &dev->kobj)) goto out; + data = (char *)get_zeroed_page(GFP_KERNEL); + if (!data) + return -ENOMEM; + /* let the kset specific function add its keys */ pos = data; retval = kset->uevent_ops->uevent(kset, &dev->kobj, @@ -290,6 +294,7 @@ static ssize_t show_uevent(struct device count += sprintf(pos, "%s\n", envp[i]); } out: + free_page((unsigned long)data); return count; }
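The shape of that fix, a page-sized scratch buffer moved off the stack with one shared exit path that frees it, can be sketched in userspace as follows. This is an illustration under assumptions, not the driver-core code: `demo_show_env()` and `DEMO_PAGE_SIZE` are hypothetical, and `calloc()`/`free()` stand in for `get_zeroed_page()`/`free_page()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096   /* assumed page size for the scratch buffer */

/*
 * Copy each key, followed by a newline, into `out` and return the number
 * of bytes written (excluding the terminating NUL). The scratch buffer
 * lives on the heap, so the stack frame stays small regardless of
 * DEMO_PAGE_SIZE.
 */
long demo_show_env(const char *const keys[], size_t nkeys,
                   char *out, size_t out_len)
{
    char *data = NULL;   /* NULL so the shared exit path can always free it */
    long count = 0;
    size_t i;

    assert(out != NULL || out_len == 0);

    if (nkeys == 0)
        goto out;        /* nothing allocated yet; free(NULL) is harmless */

    data = calloc(1, DEMO_PAGE_SIZE);
    if (!data)
        return -ENOMEM;

    for (i = 0; i < nkeys; i++) {
        int n = snprintf(data, DEMO_PAGE_SIZE, "%s\n", keys[i]);

        /* Stop on truncation or when the output buffer would overflow. */
        if (n < 0 || (size_t)n >= DEMO_PAGE_SIZE ||
            (size_t)count + (size_t)n >= out_len)
            break;
        memcpy(out + count, data, (size_t)n + 1);
        count += n;
    }
out:
    free(data);
    return count;
}
```

Initialising the pointer to NULL is what makes the early `goto out` safe, which mirrors the `char *data = NULL;` line in the patch.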
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote: > On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote: > > > Am 30.04.2007 21:46 schrieb Andrew Morton: > > > Not really - everything's tangled up. A bisection search on the > > > 2.6.21-rc7-mm2 driver tree would be the best bet. > > > > And the winner is: > > > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch > > > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel > > again. > > cripes. > > +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr, > + char *buf) > +{ > + struct kobject *top_kobj; > + struct kset *kset; > + char *envp[32]; > + char data[PAGE_SIZE]; > > That won't work too well with 4k stacks. Wait, even though this isn't good, it shouldn't have been hit by anyone, that file used to not be readable, so I doubt userspace would have been trying to read it... Tilman, what version of HAL and udev do you have on your machine? Kay, did you get the 'read the uevent file' code already into udev and/or HAL? thanks, greg k-h - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote: > On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote: > > > Am 30.04.2007 21:46 schrieb Andrew Morton: > > > Not really - everything's tangled up. A bisection search on the > > > 2.6.21-rc7-mm2 driver tree would be the best bet. > > > > And the winner is: > > > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch > > > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel > > again. > > cripes. > > +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr, > + char *buf) > +{ > + struct kobject *top_kobj; > + struct kset *kset; > + char *envp[32]; > + char data[PAGE_SIZE]; > > That won't work too well with 4k stacks. Oh crap. Yeah, that's not nice. > Who's reviewing this stuff? The patch headers indicate that no mailing list > was > cc'ed? Kay and I did this, sorry, it should have been cc:ed to lkml. I'll go fix it up now... thanks, greg k-h - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote: > Am 30.04.2007 21:46 schrieb Andrew Morton: > > Not really - everything's tangled up. A bisection search on the > > 2.6.21-rc7-mm2 driver tree would be the best bet. > > And the winner is: > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel > again. cripes. +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct kobject *top_kobj; + struct kset *kset; + char *envp[32]; + char data[PAGE_SIZE]; That won't work too well with 4k stacks. Who's reviewing this stuff? The patch headers indicate that no mailing list was cc'ed? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
Tilman Schmidt wrote: Am 30.04.2007 21:46 schrieb Andrew Morton: Not really - everything's tangled up. A bisection search on the 2.6.21-rc7-mm2 driver tree would be the best bet. And the winner is: gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch + struct kobject *top_kobj; + struct kset *kset; + char *envp[32]; + char data[PAGE_SIZE]; + char *pos; + int i; + size_t count = 0; + int retval; ... that seems like a lot of stack to be using. -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, May 02, 2007 at 09:01:22AM +0200, Tilman Schmidt wrote: > Am 30.04.2007 21:46 schrieb Andrew Morton: > > Not really - everything's tangled up. A bisection search on the > > 2.6.21-rc7-mm2 driver tree would be the best bet. > > And the winner is: > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel > again. > > I'll try building 2.6.21-git3 minus that one next, but I'll have > to revert it manually, because my naive attempt to "patch -R" it > failed 1 out of 2 hunks. Ok, that's just weird, it only adds a new feature, it doesn't touch any existing code to cause things to go wrong. Can you try using 'git bisect' on Linus's tree instead? That should show the real problem much more easily. thanks, greg k-h
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
Am 30.04.2007 21:46 schrieb Andrew Morton: > Not really - everything's tangled up. A bisection search on the > 2.6.21-rc7-mm2 driver tree would be the best bet. And the winner is: gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel again. I'll try building 2.6.21-git3 minus that one next, but I'll have to revert it manually, because my naive attempt to "patch -R" it failed 1 out of 2 hunks. HTH T. -- Tilman Schmidt E-Mail: [EMAIL PROTECTED] Bonn, Germany - Undetected errors are handled as if no error occurred. (IBM) - signature.asc Description: OpenPGP digital signature
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
Am 30.04.2007 21:46 schrieb Andrew Morton: Not really - everything's tangled up. A bisection search on the 2.6.21-rc7-mm2 driver tree would be the best bet. And the winner is: gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel again. I'll try building 2.6.21-git3 minus that one next, but I'll have to revert it manually, because my naive attempt to patch -R it failed 1 out of 2 hunks. HTH T. -- Tilman Schmidt E-Mail: [EMAIL PROTECTED] Bonn, Germany - Undetected errors are handled as if no error occurred. (IBM) - signature.asc Description: OpenPGP digital signature
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, May 02, 2007 at 09:01:22AM +0200, Tilman Schmidt wrote: Am 30.04.2007 21:46 schrieb Andrew Morton: Not really - everything's tangled up. A bisection search on the 2.6.21-rc7-mm2 driver tree would be the best bet. And the winner is: gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel again. I'll try building 2.6.21-git3 minus that one next, but I'll have to revert it manually, because my naive attempt to patch -R it failed 1 out of 2 hunks. Ok, that's just wierd, it only adds a new feature, it doesn't touch any existing code to cause things to go wrong. Can you try using 'git bisect' on Linus's tree instead? That should show the real problem much easier. thanks, greg k-h - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt [EMAIL PROTECTED] wrote: Am 30.04.2007 21:46 schrieb Andrew Morton: Not really - everything's tangled up. A bisection search on the 2.6.21-rc7-mm2 driver tree would be the best bet. And the winner is: gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel again. cripes. +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct kobject *top_kobj; + struct kset *kset; + char *envp[32]; + char data[PAGE_SIZE]; That won't work too well with 4k stacks. Who's reviewing this stuff? The patch headers indicate that no mailing list was cc'ed? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
Tilman Schmidt wrote: Am 30.04.2007 21:46 schrieb Andrew Morton: Not really - everything's tangled up. A bisection search on the 2.6.21-rc7-mm2 driver tree would be the best bet. And the winner is: gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch + struct kobject *top_kobj; + struct kset *kset; + char *envp[32]; + char data[PAGE_SIZE]; + char *pos; + int i; + size_t count = 0; + int retval; ... that seems like a lot of stack to be using. -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote: On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt [EMAIL PROTECTED] wrote: Am 30.04.2007 21:46 schrieb Andrew Morton: Not really - everything's tangled up. A bisection search on the 2.6.21-rc7-mm2 driver tree would be the best bet. And the winner is: gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel again. cripes. +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct kobject *top_kobj; + struct kset *kset; + char *envp[32]; + char data[PAGE_SIZE]; That won't work too well with 4k stacks. Oh crap. Yeah, that's not nice. Who's reviewing this stuff? The patch headers indicate that no mailing list was cc'ed? Kay and I did this, sorry, it should have been cc:ed to lkml. I'll go fix it up now... thanks, greg k-h - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote:
> On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:
> > On 30.04.2007 at 21:46, Andrew Morton wrote:
> > > Not really - everything's tangled up. A bisection search on the
> > > 2.6.21-rc7-mm2 driver tree would be the best bet.
> >
> > And the winner is:
> >
> > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> >
> > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel again.
>
> cripes.
>
> +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
> +			   char *buf)
> +{
> +	struct kobject *top_kobj;
> +	struct kset *kset;
> +	char *envp[32];
> +	char data[PAGE_SIZE];
>
> That won't work too well with 4k stacks.

Wait, even though this isn't good, it shouldn't have been hit by anyone.
That file used to not be readable, so I doubt userspace would have been
trying to read it...

Tilman, what version of HAL and udev do you have on your machine?

Kay, did you get the 'read the uevent file' code already into udev
and/or HAL?

thanks,

greg k-h
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote:
> On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:
> > On 30.04.2007 at 21:46, Andrew Morton wrote:
> > > Not really - everything's tangled up. A bisection search on the
> > > 2.6.21-rc7-mm2 driver tree would be the best bet.
> >
> > And the winner is:
> >
> > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> >
> > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel again.
>
> cripes.
>
> +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
> +			   char *buf)
> +{
> +	struct kobject *top_kobj;
> +	struct kset *kset;
> +	char *envp[32];
> +	char data[PAGE_SIZE];
>
> That won't work too well with 4k stacks.

Tilman, here's a patch, can you try this on top of your tree that dies?

thanks,

greg k-h

---
 drivers/base/core.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -252,7 +252,7 @@ static ssize_t show_uevent(struct device
 	struct kobject *top_kobj;
 	struct kset *kset;
 	char *envp[32];
-	char data[PAGE_SIZE];
+	char *data = NULL;
 	char *pos;
 	int i;
 	size_t count = 0;
@@ -276,6 +276,10 @@ static ssize_t show_uevent(struct device
 		if (!kset->uevent_ops->filter(kset, &dev->kobj))
 			goto out;
 
+	data = (char *)get_zeroed_page(GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
 	/* let the kset specific function add its keys */
 	pos = data;
 	retval = kset->uevent_ops->uevent(kset, &dev->kobj,
@@ -290,6 +294,7 @@ static ssize_t show_uevent(struct device
 		count += sprintf(pos, "%s\n", envp[i]);
 	}
 out:
+	free_page((unsigned long)data);
 	return count;
 }
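For readers following along outside the kernel, the shape of that fix can be sketched in user space. This is only an illustration under stated assumptions: show_keys() and SCRATCH_SIZE are hypothetical names, and calloc()/free() stand in for get_zeroed_page()/free_page(). It is not the driver-core code. The point it shows is the one made in the thread: the page-sized scratch buffer moves off the stack, and every exit taken after the allocation goes through one label that frees it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE 4096	/* hypothetical stand-in for PAGE_SIZE */

/*
 * Format two keys into buf, one per line.
 * Returns the number of bytes written (excluding the NUL), or -1 on
 * allocation failure or if the result does not fit in buf.
 */
static long show_keys(char *buf, size_t buf_size, const char *name, int minor)
{
	char *data;		/* was: char data[SCRATCH_SIZE]; on the stack */
	long count = -1;
	int n;

	data = calloc(1, SCRATCH_SIZE);	/* zeroed, like get_zeroed_page() */
	if (!data)
		return -1;		/* nothing allocated yet, nothing to free */

	n = snprintf(data, SCRATCH_SIZE, "DEVNAME=%s\nMINOR=%d\n", name, minor);
	if (n < 0 || (size_t)n >= SCRATCH_SIZE || (size_t)n >= buf_size)
		goto out;		/* error paths share the cleanup below */

	memcpy(buf, data, (size_t)n + 1);
	count = n;
out:
	free(data);			/* single exit frees the scratch buffer */
	return count;
}
```

A caller would pass its own output buffer, for example show_keys(buf, sizeof(buf), "ttyS0", 4), and treat a negative return as failure.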
[PATCH] Build break on ppc64 for 2.6.21-rc7-mm2
Hi

When compiling 2.6.21-rc7-mm2, I encountered this error.

=====
  CC [M]  drivers/net/e1000/e1000_ethtool.o
  CC [M]  drivers/net/e1000/e1000_main.o
  LD [M]  drivers/net/e1000/e1000.o
  LD      drivers/net/ehea/built-in.o
  CC [M]  drivers/net/ehea/ehea_main.o
drivers/net/ehea/ehea_main.c: In function ehea_hash_skb:
drivers/net/ehea/ehea_main.c:1806: error: struct sk_buff has no member named nh
drivers/net/ehea/ehea_main.c:1807: error: struct sk_buff has no member named nh
drivers/net/ehea/ehea_main.c:1807: error: struct sk_buff has no member named nh
drivers/net/ehea/ehea_main.c:1809: error: struct sk_buff has no member named nh
make[3]: *** [drivers/net/ehea/ehea_main.o] Error 1
make[2]: *** [drivers/net/ehea] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2
=====

The driver code is not compatible with the struct sk_buff change, hence
this error. The patch below should fix the problem. Please let me know
your comments on this.

Signed-off-by: Srinivasa Ds <[EMAIL PROTECTED]>

---
 drivers/net/ehea/ehea_main.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6.21-rc7/drivers/net/ehea/ehea_main.c
===
--- linux-2.6.21-rc7.orig/drivers/net/ehea/ehea_main.c
+++ linux-2.6.21-rc7/drivers/net/ehea/ehea_main.c
@@ -1803,10 +1803,10 @@ static inline int ehea_hash_skb(struct s
 	u32 tmp;
 
 	if ((skb->protocol == htons(ETH_P_IP)) &&
-	    (skb->nh.iph->protocol == IPPROTO_TCP)) {
-		tcp = (struct tcphdr*)(skb->nh.raw + (skb->nh.iph->ihl * 4));
+	    (ip_hdr(skb)->protocol == IPPROTO_TCP)) {
+		tcp = (struct tcphdr*)(skb_network_header(skb) + (ip_hdr(skb)->ihl * 4));
 		tmp = (tcp->source + (tcp->dest << 16)) % 31;
-		tmp += skb->nh.iph->daddr % 31;
+		tmp += ip_hdr(skb)->daddr % 31;
 		return tmp % num_qps;
 	}
 	else
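The idea behind the patch (reach the headers through accessor helpers rather than through struct members, then find the TCP header at ihl * 4 bytes past the IPv4 header) can be tried out in user space. The sketch below is an assumption-laden illustration, not the ehea driver: struct pkt, pkt_network_header(), be16(), be32() and hash_tcp() are hypothetical names, the fields are decoded as big-endian values (the driver uses the raw network-order fields), and the caller is assumed to have already checked that the packet is TCP over IPv4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet wrapper; the kernel's struct sk_buff is far richer. */
struct pkt {
	const uint8_t *data;	/* start of the frame */
	size_t net_off;		/* offset of the IPv4 header inside data */
};

/* Accessor in the spirit of skb_network_header(): callers do not depend
 * on the layout of struct pkt, so the layout is free to change. */
static const uint8_t *pkt_network_header(const struct pkt *p)
{
	return p->data + p->net_off;
}

static uint32_t be16(const uint8_t *b)
{
	return ((uint32_t)b[0] << 8) | b[1];
}

static uint32_t be32(const uint8_t *b)
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | b[3];
}

/* Same arithmetic as the hash in the patch, on decoded header fields. */
static unsigned int hash_tcp(const struct pkt *p, unsigned int num_qps)
{
	const uint8_t *ip = pkt_network_header(p);
	unsigned int ihl = ip[0] & 0x0f;	/* IPv4 header length in 32-bit words */
	const uint8_t *tcp = ip + ihl * 4;	/* TCP header follows the IP header */
	uint32_t tmp;

	tmp = (be16(tcp) + (be16(tcp + 2) << 16)) % 31;	/* source, dest ports */
	tmp += be32(ip + 16) % 31;			/* destination address */
	return tmp % num_qps;
}
```

Keeping the layout knowledge inside one helper is what lets a structure change like the sk_buff one be absorbed without touching every caller.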
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, 2 May 2007 00:43:05 -0700, Greg KH <[EMAIL PROTECTED]> said:
> > And the winner is:
> >
> > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> >
> > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel again.
>
> Wait, even though this isn't good, it shouldn't have been hit by anyone,
> that file used to not be readable, so I doubt userspace would have been
> trying to read it...
>
> Tilman, what version of HAL and udev do you have on your machine?

The ones that came with SuSE 10.0:
hal-0.5.4-6.4
udev-068git20050831-9

HTH
Tilman

PS: I'll test your patch and git-bisect when I'm back at the machine.

--
Tilman Schmidt [EMAIL PROTECTED]
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On 5/2/07, Greg KH <[EMAIL PROTECTED]> wrote:
> On Wed, May 02, 2007 at 12:10:00AM -0700, Andrew Morton wrote:
> > On Wed, 02 May 2007 09:01:22 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:
> > > On 30.04.2007 at 21:46, Andrew Morton wrote:
> > > > Not really - everything's tangled up. A bisection search on the
> > > > 2.6.21-rc7-mm2 driver tree would be the best bet.
> > >
> > > And the winner is:
> > >
> > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> > >
> > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel again.
> >
> > cripes.
> >
> > +static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
> > +			   char *buf)
> > +{
> > +	struct kobject *top_kobj;
> > +	struct kset *kset;
> > +	char *envp[32];
> > +	char data[PAGE_SIZE];
> >
> > That won't work too well with 4k stacks.

Yeah, sorry.

> Wait, even though this isn't good, it shouldn't have been hit by anyone,
> that file used to not be readable, so I doubt userspace would have been
> trying to read it...
>
> Tilman, what version of HAL and udev do you have on your machine?
>
> Kay, did you get the 'read the uevent file' code already into udev
> and/or HAL?

Only udevtest uses this at the moment, but that is only used for debugging.
It's probably the brain-dead libsysfs, which opens and reads every file in
/sys, even when nobody is interested in the data.

Thanks,
Kay
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On 02.05.2007 at 09:52, Greg KH wrote:
> Tilman, here's a patch, can you try this on top of your tree that dies?

2.6.21-git3 plus that patch comes up fine. (Except for a UDP problem
I seem to remember I already saw reported on lkml and which I'll
ignore for now in order not to blur the picture.)

Started to git-bisect mainline now, but that will take some time.
It's more than 800 patches to check and I don't get more than 2-3
iterations per day out of that machine.

HTH
T.

> ---
>  drivers/base/core.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -252,7 +252,7 @@ static ssize_t show_uevent(struct device
>  	struct kobject *top_kobj;
>  	struct kset *kset;
>  	char *envp[32];
> -	char data[PAGE_SIZE];
> +	char *data = NULL;
>  	char *pos;
>  	int i;
>  	size_t count = 0;
> @@ -276,6 +276,10 @@ static ssize_t show_uevent(struct device
>  		if (!kset->uevent_ops->filter(kset, &dev->kobj))
>  			goto out;
>
> +	data = (char *)get_zeroed_page(GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
>  	/* let the kset specific function add its keys */
>  	pos = data;
>  	retval = kset->uevent_ops->uevent(kset, &dev->kobj,
> @@ -290,6 +294,7 @@ static ssize_t show_uevent(struct device
>  		count += sprintf(pos, "%s\n", envp[i]);
>  	}
>  out:
> +	free_page((unsigned long)data);
>  	return count;
>  }

--
Tilman Schmidt                    E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -
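As a rough feel for "some time": a bisection halves the candidate range on every iteration, so the iteration count grows with the base-2 logarithm of the number of patches. The helper below is a hypothetical back-of-the-envelope sketch (bisect_steps() is not part of git). It assumes a simple linear list of patches, whereas git's real count depends on the shape of the history.

```c
#include <assert.h>

/*
 * Worst-case number of halvings needed to narrow n candidate patches
 * down to a single one, assuming each test rules out half the range.
 */
static unsigned int bisect_steps(unsigned int n)
{
	unsigned int steps = 0;

	while (n > 1) {
		n = (n + 1) / 2;	/* worst case: the larger half remains */
		steps++;
	}
	return steps;
}
```

Under that assumption, 800 patches need about 10 iterations, which at 2-3 iterations per day works out to roughly four to five days on that machine.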
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, 02 May 2007 19:36:03 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:

> On 02.05.2007 at 09:52, Greg KH wrote:
> > Tilman, here's a patch, can you try this on top of your tree that dies?
>
> 2.6.21-git3 plus that patch comes up fine. (Except for a UDP problem
> I seem to remember I already saw reported on lkml and which I'll
> ignore for now in order not to blur the picture.)

Thanks.

> Started to git-bisect mainline now, but that will take some time.
> It's more than 800 patches to check and I don't get more than 2-3
> iterations per day out of that machine.

I don't think there's much point in you doing that. We know what the bug is.

Switching to 8k stacks will probably fix things up too.
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On 02.05.2007 at 22:07, Andrew Morton wrote:
> > Started to git-bisect mainline now, but that will take some time. [...]
>
> I don't think there's much point in you doing that. We know what the bug is.

Good. Saves me some work. :-)
If you'd like me to test anything, just let me know.

Thanks,
Tilman

--
Tilman Schmidt                    E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Wed, May 02, 2007 at 11:41:22AM +0200, Tilman Schmidt wrote:
> On Wed, 2 May 2007 00:43:05 -0700, Greg KH <[EMAIL PROTECTED]> said:
> > > And the winner is:
> > >
> > > gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> > >
> > > Reverting only that from 2.6.21-rc7-mm2 gives me a working kernel again.
> >
> > Wait, even though this isn't good, it shouldn't have been hit by anyone,
> > that file used to not be readable, so I doubt userspace would have been
> > trying to read it...
> >
> > Tilman, what version of HAL and udev do you have on your machine?
>
> The ones that came with SuSE 10.0:
> hal-0.5.4-6.4
> udev-068git20050831-9

Ah, ok, that explains it, the really old libsysfs walks and opens all
files in sysfs for some odd, strange, and broken reason. This has been
fixed in newer versions, and explains why you are seeing this happen.

I'll send my fix for this to Linus in a few hours.

thanks for testing and tracking this down, I really appreciate it.

greg k-h
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Tue, May 01, 2007 at 01:26:44PM +0200, Tilman Schmidt wrote:
> On 30.04.2007 at 21:46, Andrew Morton wrote:
> > Sure, but what about 2.6.21-git3 (or, better, current -git)?
>
> 2.6.21-git3 crashed with panic blink at "scanning usb: .."
> (Nothing in the log this time.)

Eeek, that's not good. Can you keep bisecting Linus's tree? 'git
bisect' makes this very easy to do. We need to track this down as soon
as possible if we can.

> Will continue bisecting -rc7-mm2.

Can you focus on Linus's tree now, as we know that it is the part
causing problems?

thanks,

greg k-h
Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
Hello Vivek,

* Vivek Goyal <[EMAIL PROTECTED]> [2007-04-30 10:48]:
>
> handle_edge_irq() already makes sure that desc->action is not null, still
> note_interrupt() is receiving desc->action as null, that's strange. On my
> system this is happening for irq 4 and /proc/interrupts shows that it is
> coming from "serial".

From reading the code I also cannot explain this. However, I'm trying to
reproduce the problem here. I hope I find a machine where this also
happens.

Thanks,
   Bernhard

--
SUSE LINUX Products GmbH            Tel. +49 (911) 74053-0
Maxfeldstr. 5                       GF: Markus Rex
90409 Nürnberg, Germany             HRB 16746 (AG Nürnberg)
OpenPGP DDAF6454: F61F 34CC 09CA FB82 C9F6 BA4B 8865 3696 DDAF 6454
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Tue, 1 May 2007 09:22:33 -0700 Randy Dunlap wrote:

> On Tue, 1 May 2007 08:22:58 +0200 Andi Kleen wrote:
>
> > On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> > > On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
> > >
> > > > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > > > randomish times (presumably in the timer irq handler) when netconsole
> > > > > and printk-time are enabled.
> > > >
> > > > A backtrace would be good. Does nmi_watchdog=2 show anything
> > > > interesting or if not sysrq-t?
> > >
> > > I can't get anything from sysrq or nmi_watchdog.
> >
> > Hmm, ok when the console locks up those likely don't work.
> >
> > > > > I was hitting the same thing on i386 uniprocessor, but I thought it
> > > > > got fixed.
> > > >
> > > > Yes.
> > >
> > > Fixed where? Merged into mainline or in your firstfloor patches?
> >
> > None of the sched-clock changes are in mainline yet.
> >
> > Can you perhaps test latest firstfloor alone (without rest of -mm)?
>
> OK. so your 2.6.21-rc7-git5 patch, applied to 2.6.21-git4 or
> applied to 2.6.21-rc7-git5 ?

Applied cleanly to 2.6.21-rc7-git5, but it has build errors:

arch/x86_64/mm/built-in.o: In function `mark_rodata_ro':
(.text+0x180): undefined reference to `_stext'
arch/x86_64/mm/built-in.o: In function `mem_init':
(.init.text+0x2cf): undefined reference to `_stext'
arch/x86_64/mm/built-in.o: In function `do_page_fault':
(.kprobes.text+0x59c): undefined reference to `_stext'
arch/x86_64/vdso/built-in.o: In function `arch_setup_additional_pages':
(.text+0x40): undefined reference to `vdso_end'
arch/x86_64/vdso/built-in.o: In function `arch_setup_additional_pages':
(.text+0x58): undefined reference to `vdso_start'
arch/x86_64/vdso/built-in.o: In function `init_vdso_vars':
vma.c:(.init.text+0x1b): undefined reference to `vdso_end'
vma.c:(.init.text+0x26): undefined reference to `vdso_start'
vma.c:(.init.text+0x3c): undefined reference to `vdso_start'
kernel/built-in.o: In function `profile_hits':
(.text+0x9609): undefined reference to `_stext'
kernel/built-in.o: In function `core_kernel_text':
(.text+0x197c4): undefined reference to `_stext'
kernel/built-in.o: In function `is_ksym_addr':
kallsyms.c:(.text+0x27042): undefined reference to `_stext'
kernel/built-in.o: In function `profile_init':
(.init.text+0xc57): undefined reference to `_stext'
make: *** [.tmp_vmlinux1] Error 1

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Tue, 1 May 2007 08:22:58 +0200 Andi Kleen wrote:

> On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> > On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
> >
> > > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > > randomish times (presumably in the timer irq handler) when netconsole
> > > > and printk-time are enabled.
> > >
> > > A backtrace would be good. Does nmi_watchdog=2 show anything
> > > interesting or if not sysrq-t?
> >
> > I can't get anything from sysrq or nmi_watchdog.
>
> Hmm, ok when the console locks up those likely don't work.
>
> > > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > > fixed.
> > >
> > > Yes.
> >
> > Fixed where? Merged into mainline or in your firstfloor patches?
>
> None of the sched-clock changes are in mainline yet.
>
> Can you perhaps test latest firstfloor alone (without rest of -mm)?

OK. so your 2.6.21-rc7-git5 patch, applied to 2.6.21-git4 or
applied to 2.6.21-rc7-git5 ?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, 30 Apr 2007 22:38:59 -0700 Andrew Morton wrote:

> On Tue, 1 May 2007 08:24:56 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > > The bug is in firstfloor only, and the fix (if present) will be there too.
> > >
> > > <checks>
> > >
> > > Nope,
> > >
> > > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> > >
> > > is identical to
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch
> >
> > Or perhaps the deadlock is in the cpufrequency handler. Does it happen
> > without CONFIG_CPU_FREQ too?
> >
> > [cpufreq handler calls ktime_get which might take xtime lock for reading]
>
> Sounds right. That's what was happening to me for a while.
>
> Randy, it'd be interesting to try:
>
> --- a/arch/x86_64/kernel/tsc.c~a
> +++ a/arch/x86_64/kernel/tsc.c
> @@ -84,8 +84,8 @@ static int time_cpufreq_notifier(struct
>  		cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
>
>  		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
> -		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> -			mark_tsc_unstable("cpufreq changes");
> +//		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> +//			mark_tsc_unstable("cpufreq changes");
>  	}
>
>  	return 0;
> _

I don't have CPU_FREQ enabled, so that didn't change anything.

> and if that "fixes" it, disable netconsole and do
>
> --- a/arch/x86_64/kernel/tsc.c~a
> +++ a/arch/x86_64/kernel/tsc.c
> @@ -85,7 +85,7 @@ static int time_cpufreq_notifier(struct
>
>  		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
>  		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> -			mark_tsc_unstable("cpufreq changes");
> +			dump_stack();
>  	}
>
>  	return 0;

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On 30.04.2007 at 21:46, Andrew Morton wrote:
> Sure, but what about 2.6.21-git3 (or, better, current -git)?

2.6.21-git3 crashed with panic blink at "scanning usb: .."
(Nothing in the log this time.)

Will continue bisecting -rc7-mm2.

HTH
T.

--
Tilman Schmidt                    E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Tue, 1 May 2007 08:24:56 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote:

> > The bug is in firstfloor only, and the fix (if present) will be there too.
> >
> > <checks>
> >
> > Nope,
> >
> > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
> >
> > is identical to
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch
>
> Or perhaps the deadlock is in the cpufrequency handler. Does it happen
> without CONFIG_CPU_FREQ too?
>
> [cpufreq handler calls ktime_get which might take xtime lock for reading]

Sounds right. That's what was happening to me for a while.

Randy, it'd be interesting to try:

--- a/arch/x86_64/kernel/tsc.c~a
+++ a/arch/x86_64/kernel/tsc.c
@@ -84,8 +84,8 @@ static int time_cpufreq_notifier(struct
 		cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
 
 		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
-		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-			mark_tsc_unstable("cpufreq changes");
+//		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
+//			mark_tsc_unstable("cpufreq changes");
 	}
 
 	return 0;
_

and if that "fixes" it, disable netconsole and do

--- a/arch/x86_64/kernel/tsc.c~a
+++ a/arch/x86_64/kernel/tsc.c
@@ -85,7 +85,7 @@ static int time_cpufreq_notifier(struct
 
 		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
 		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-			mark_tsc_unstable("cpufreq changes");
+			dump_stack();
 	}
 
 	return 0;
_
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
> The bug is in firstfloor only, and the fix (if present) will be there too.
>
> <checks>
>
> Nope,
>
> ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share
>
> is identical to
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch

Or perhaps the deadlock is in the cpufrequency handler. Does it happen
without CONFIG_CPU_FREQ too?

[cpufreq handler calls ktime_get which might take xtime lock for reading]

-Andi
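Why would taking a lock "for reading" hang a uniprocessor box? The sketch below shows one possible mechanism, and rests on an assumption the thread does not confirm: that the xtime lock behaves as a sequence lock, where readers retry for as long as a writer is active. struct seq_demo and the demo_* helpers are hypothetical, single-threaded illustrations, not the kernel's seqlock API. The bounded retry count exists only so that the example terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal single-threaded sequence counter; NOT the kernel's seqlock_t. */
struct seq_demo {
	unsigned int seq;	/* odd while a write is in progress */
	long value;
};

static void demo_write_begin(struct seq_demo *s) { s->seq++; }
static void demo_write_end(struct seq_demo *s)   { s->seq++; }

/*
 * Try to read value, giving up after max_tries instead of spinning
 * forever as an unbounded reader would. Returns true on success.
 */
static bool demo_read(const struct seq_demo *s, long *out, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		unsigned int start = s->seq;

		if (start & 1)
			continue;	/* writer active: retry */

		long v = s->value;

		if (s->seq == start) {	/* no write happened meanwhile */
			*out = v;
			return true;
		}
	}
	return false;
}
```

If the code holding the write side (for instance a timer interrupt handler) ends up calling such a reader on the same CPU, the sequence stays odd and the reader can never succeed. An unbounded version of the loop would then spin forever, which would look like the hang described in this thread.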
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, 30 Apr 2007 22:16:24 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > fixed.
> >
> > Yes.
>
> Fixed where? Merged into mainline or in your firstfloor patches?

The bug is in firstfloor only, and the fix (if present) will be there too.

<checks>

Nope,

ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/sched-clock-share

is identical to

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/x86_64-mm-sched-clock-share.patch
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, Apr 30, 2007 at 10:16:24PM -0700, Randy Dunlap wrote:
> On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:
>
> > > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > > randomish times (presumably in the timer irq handler) when netconsole and
> > > printk-time are enabled.
> >
> > A backtrace would be good. Does nmi_watchdog=2 show anything
> > interesting or if not sysrq-t?
>
> I can't get anything from sysrq or nmi_watchdog.

Hmm, ok. When the console locks up, those likely don't work.

> > > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > > fixed.
> >
> > Yes.
>
> Fixed where? Merged into mainline or in your firstfloor patches?

None of the sched-clock changes are in mainline yet.

Can you perhaps test latest firstfloor alone (without rest of -mm)?

-Andi
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Tue, 1 May 2007 05:43:30 +0200 Andi Kleen wrote:

> > Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> > randomish times (presumably in the timer irq handler) when netconsole and
> > printk-time are enabled.
>
> A backtrace would be good. Does nmi_watchdog=2 show anything
> interesting or if not sysrq-t?

I can't get anything from sysrq or nmi_watchdog.

> > I was hitting the same thing on i386 uniprocessor, but I thought it got
> > fixed.
>
> Yes.

Fixed where? Merged into mainline or in your firstfloor patches?

> My current sched_clock does not take any locks anymore and it was removed
> from the cpufreq handler too.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
> Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
> randomish times (presumably in the timer irq handler) when netconsole and
> printk-time are enabled.

A backtrace would be good. Does nmi_watchdog=2 show anything
interesting or if not sysrq-t?

> I was hitting the same thing on i386 uniprocessor, but I thought it got
> fixed.

Yes.

My current sched_clock does not take any locks anymore and it was removed
from the cpufreq handler too.

-Andi
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, 30 Apr 2007 17:45:55 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Mon, 30 Apr 2007 16:51:01 -0700
> > Randy Dunlap <[EMAIL PROTECTED]> wrote:
> >
> >> On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote:
> >>
> >>> On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:
> >>>
> >>>> On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]>
> >>>> wrote:
> >>>>
> >>>>> On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> >>>>>
> >>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> >>>>> I'm getting a hang near the end of booting on x86_64 UP.
> >>>>> The last initcall_debug function varies. E.g.:
> >>>>>
> >>>>> 1/
> >>>>> [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> >>>>> [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> >>>>> [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> >>>>> [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> >>>>> [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> >>>>> [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
> >>>>> [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
> >>>>> [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
> >>>>> [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
> >>>>> [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> >>>>> [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
> >>>>> [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
> >>>>>
> >>>>> 2/
> >>>>> [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> >>>>> [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
> >>>>> [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
> >>>>> [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> >>>>> [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
> >>>>> [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
> >>>>> [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> >>>>> [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> >>>>> [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> >>>>> [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> >>>>> [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> >>>>>
> >>>> So perhaps it locks during a timer interrupt.
> >>>>
> >>>>> .config is attached.
> >>>>>
> >>>>> Any ideas/suggestions?
> >>>> Just the usual: nothing from sysrq or NMI watchdog?
> >>> Nothing from either of those. I'll jiggle some config options.
> >> config option changes didn't help, but removing
> >>     netconsole=
> >> from the kernel command line makes it all happy. :(
> >
> > argh.
> >
> >> Do we know of netconsole hang problems? (anyone?)
> >
> > You have "time" as well? I found on i386 uniproc that time+netconsole
> > caused hangs because the printk timestamping code was taking
> > xtime_lock for reading inside a write_seqlock. But I thought that Andi
> > fixed that. Perhaps i386 got fixed but x86_64 did not.
>
> Yes, I have CONFIG_PRINTK_TIME=y and disabling it allows it to boot. Thanks.
>
> Maybe the patch isn't merged yet?

Could be. I don't recall whether Andi's statement was before or after
2.6.21-rc7-mm2 actually.

> Now if I can just remember this until the next time that I hit it...

Andi: uniprocessor x86_64 running rc7-mm2 is hanging early in boot at
randomish times (presumably in the timer irq handler) when netconsole and
printk-time are enabled.

I was hitting the same thing on i386 uniprocessor, but I thought it got
fixed.

The problem was that the printable string which is newly passed to
mark_tsc_unstable() is printed out inside write_seqlock(xtime_lock) but
printk timestamping (and perhaps netconsole tx?) wants to take xtime_lock for
reading, which will hang.
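To make the hang described above concrete, here is a toy single-threaded model of the nesting. It is not kernel code: all toy_* names and the retry bound are made up for illustration. The only behaviour carried over from the kernel is the assumption that a seqlock reader keeps retrying while the sequence count is odd, that is, while a writer is inside its critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy seqlock: the counter is odd while a writer is inside. */
struct toy_seqlock {
	unsigned int sequence;
};

void toy_write_seqlock(struct toy_seqlock *sl)
{
	sl->sequence++;
	assert(sl->sequence & 1);	/* writer is now inside */
}

void toy_write_sequnlock(struct toy_seqlock *sl)
{
	sl->sequence++;
	assert(!(sl->sequence & 1));	/* writer has left */
}

unsigned int toy_read_seqbegin(const struct toy_seqlock *sl)
{
	return sl->sequence;
}

/* Reader must retry if a writer was active or the count moved on. */
bool toy_read_seqretry(const struct toy_seqlock *sl, unsigned int start)
{
	return (start & 1) || sl->sequence != start;
}

/*
 * Models the timestamp read done for each printk line. A real reader
 * loops without a bound; max_tries exists only so this demo terminates.
 * Returns true if a consistent read was obtained.
 */
bool toy_timestamp_read(const struct toy_seqlock *xtime_lock,
			unsigned int max_tries)
{
	for (unsigned int i = 0; i < max_tries; i++) {
		unsigned int seq = toy_read_seqbegin(xtime_lock);

		/* ... a real reader would copy the time values here ... */
		if (!toy_read_seqretry(xtime_lock, seq))
			return true;
	}
	return false;	/* would have spun forever */
}

/* Models printing a message while holding the write side on the same CPU. */
bool toy_printk_inside_writer(struct toy_seqlock *xtime_lock)
{
	bool printed;

	toy_write_seqlock(xtime_lock);
	printed = toy_timestamp_read(xtime_lock, 1000);
	toy_write_sequnlock(xtime_lock);
	return printed;
}
```

Outside the write section toy_timestamp_read() succeeds on its first pass. Called from toy_printk_inside_writer() it uses up all its tries, which on a real uniprocessor box is an endless spin, since the writer cannot finish until the reader returns.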
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
Andrew Morton wrote: On Mon, 30 Apr 2007 16:51:01 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote: On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote: On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote: On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote: On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/ I'm getting a hang near the end of booting on x86_64 UP. The last initcall_debug function varies. E.g.: 1/ [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0. [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f() [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0. [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12() [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0. [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12() [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a() [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0. [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a() 2/ [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29() [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0. [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29() [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31() [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0. [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31() [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0. 
[0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f() [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0. So perhaps it locks during a timer interrupt. .config is attached. Any ideas/suggestions? Just the usual: nothing from sysrq or NMI watchdog? Nothing from either of those. I'll jiggle some config options. config option changes didn't help, but removing netconsole= from the kernel command line makes it all happy. :( argh. Do we know of netconsole hang problems? (anyone?) You have "time" as well? I found on i386 uniproc that time+netconsole caused hangs because the printk timestamping code was taking xtime_lock for reading inside a write_seqlock. But I though that Andi fixed that. Perhaps i386 got fixed but x86_64 did not. Yes, I have CONFIG_PRINTK_TIME=y and disabling it allows it to boot. Thanks. Maybe the patch isn't merged yet? Now if I can just remember this until the next time that I hit it... -- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, 30 Apr 2007 16:51:01 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote: > On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote: > > > On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote: > > > > > On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote: > > > > > > > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote: > > > > > > > > > > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/ > > > > > > > > I'm getting a hang near the end of booting on x86_64 UP. > > > > The last initcall_debug function varies. E.g.: > > > > > > > > 1/ > > > > [0.140257] Calling initcall 0x806f2fa8: > > > > init_misc_binfmt+0x0/0x3f() > > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() > > > > returned 0. > > > > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: > > > > init_misc_binfmt+0x0/0x3f() > > > > [0.140284] Calling initcall 0x806f2fe7: > > > > init_script_binfmt+0x0/0x12() > > > > [0.140293] initcall 0x806f2fe7: > > > > init_script_binfmt+0x0/0x12() returned 0. > > > > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: > > > > init_script_binfmt+0x0/0x12() > > > > [0.140310] Calling initcall 0x806f2ff9: > > > > init_elf_binfmt+0x0/0x12() > > > > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() > > > > returned 0. > > > > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: > > > > init_elf_binfmt+0x0/0x12() > > > > [0.140335] Calling initcall 0x806f3de9: > > > > debugfs_init+0x0/0x4a() > > > > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() > > > > returned 0. > > > > [0.140351] initcall 0x806f3de9 ran for 0 msecs: > > > > debugfs_init+0x0/0x4a() > > > > > > > > 2/ > > > > [0.140206] Calling initcall 0x806efeb1: > > > > ksysfs_init+0x0/0x29() > > > > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() > > > > returned 0. 
> > > > [0.140222] initcall 0x806efeb1 ran for 0 msecs: > > > > ksysfs_init+0x0/0x29() > > > > [0.140230] Calling initcall 0x806f25be: > > > > filelock_init+0x0/0x31() > > > > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() > > > > returned 0. > > > > [0.140249] initcall 0x806f25be ran for 0 msecs: > > > > filelock_init+0x0/0x31() > > > > [0.140258] Calling initcall 0x806f2fa8: > > > > init_misc_binfmt+0x0/0x3f() > > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() > > > > returned 0. > > > > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: > > > > init_misc_binfmt+0x0/0x3f() > > > > [0.140284] Calling initcall 0x806f2fe7: > > > > init_script_binfmt+0x0/0x12() > > > > [0.140293] initcall 0x806f2fe7: > > > > init_script_binfmt+0x0/0x12() returned 0. > > > > > > > > > > So perhaps it locks during a timer interrupt. > > > > > > > .config is attached. > > > > > > > > Any ideas/suggestions? > > > > > > Just the usual: nothing from sysrq or NMI watchdog? > > > > Nothing from either of those. I'll jiggle some config options. > > config option changes didn't help, but removing > netconsole= > from the kernel command line makes it all happy. :( argh. > Do we know of netconsole hang problems? (anyone?) You have "time" as well? I found on i386 uniproc that time+netconsole caused hangs because the printk timestamping code was taking xtime_lock for reading inside a write_seqlock. But I though that Andi fixed that. Perhaps i386 got fixed but x86_64 did not. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 hangs in boot (netconsole)
On Mon, 30 Apr 2007 08:16:53 -0700 Randy Dunlap wrote: > On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote: > > > On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote: > > > > > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote: > > > > > > > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/ > > > > > > I'm getting a hang near the end of booting on x86_64 UP. > > > The last initcall_debug function varies. E.g.: > > > > > > 1/ > > > [0.140257] Calling initcall 0x806f2fa8: > > > init_misc_binfmt+0x0/0x3f() > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() > > > returned 0. > > > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: > > > init_misc_binfmt+0x0/0x3f() > > > [0.140284] Calling initcall 0x806f2fe7: > > > init_script_binfmt+0x0/0x12() > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() > > > returned 0. > > > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: > > > init_script_binfmt+0x0/0x12() > > > [0.140310] Calling initcall 0x806f2ff9: > > > init_elf_binfmt+0x0/0x12() > > > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() > > > returned 0. > > > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: > > > init_elf_binfmt+0x0/0x12() > > > [0.140335] Calling initcall 0x806f3de9: > > > debugfs_init+0x0/0x4a() > > > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() > > > returned 0. > > > [0.140351] initcall 0x806f3de9 ran for 0 msecs: > > > debugfs_init+0x0/0x4a() > > > > > > 2/ > > > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29() > > > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() > > > returned 0. > > > [0.140222] initcall 0x806efeb1 ran for 0 msecs: > > > ksysfs_init+0x0/0x29() > > > [0.140230] Calling initcall 0x806f25be: > > > filelock_init+0x0/0x31() > > > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() > > > returned 0. 
> > > [0.140249] initcall 0x806f25be ran for 0 msecs: > > > filelock_init+0x0/0x31() > > > [0.140258] Calling initcall 0x806f2fa8: > > > init_misc_binfmt+0x0/0x3f() > > > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() > > > returned 0. > > > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: > > > init_misc_binfmt+0x0/0x3f() > > > [0.140284] Calling initcall 0x806f2fe7: > > > init_script_binfmt+0x0/0x12() > > > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() > > > returned 0. > > > > > > > So perhaps it locks during a timer interrupt. > > > > > .config is attached. > > > > > > Any ideas/suggestions? > > > > Just the usual: nothing from sysrq or NMI watchdog? > > Nothing from either of those. I'll jiggle some config options. config option changes didn't help, but removing netconsole= from the kernel command line makes it all happy. :( Do we know of netconsole hang problems? (anyone?) --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]
On Monday, 30 April 2007 22:52, Dan Kruchinin wrote: > On 4/30/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote: > > [Please don't drop addresses from the CC list] > > > > On Sunday, 29 April 2007 22:46, Dan Kruchinin wrote: > > > On 4/30/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote: > > > > Hi, > > > > > > > > On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote: > > > > > Hi all. > > > > > > > > > > There is a problem on my macbook core duo with suspend. > > > > > after suspending when i'm trying to 'wake up' my notebook, it seems > > > > > that it works, but i don't see anything at my monitor. So i have to > > > > > reboot it to continue my work. > > > > > > > > What exactly do you do to suspend? > > > > > > > > Rafael > > > > > > > > > > > > > --- > > > > > Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at > > > > > kernel/kthread.c:166 kthread_bind() > > > > > Apr 29 23:31:16 midgard kernel: [140594.900870] [] > > > > > _cpu_down+0x16b/0x250 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900893] [] > > > > > disable_nonboot_cpus+0x60/0xf0 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900903] [] > > > > > enter_state+0x22a/0x240 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900913] [] > > > > > state_store+0xbd/0xd0 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900920] [] > > > > > state_store+0x0/0xd0 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900927] [] > > > > > subsys_attr_store+0x29/0x40 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900937] [] > > > > > sysfs_write_file+0xd4/0x160 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900948] [] > > > > > vfs_write+0xa6/0x160 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900958] [] > > > > > sysfs_write_file+0x0/0x160 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900966] [] > > > > > sys_write+0x41/0x70 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900974] [] > > > > > sys_dup2+0xeb/0x120 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900984] [] > > > > > 
sysenter_past_esp+0x5f/0x85 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900999] > > > > > === > > > > > --- > > > > > > > > > > dmesg output: > > > > > > > > > > > > > > > Apr 29 23:31:16 midgard kernel: [140594.788697] Suspending device > > > > > vtcon0 > > > > > Apr 29 23:31:16 midgard kernel: [140594.788700] Suspending device > > > > > platform > > > > > Apr 29 23:31:16 midgard kernel: [140594.788704] Disabling non-boot > > > > > CPUs ... > > > > > Apr 29 23:31:16 midgard kernel: [140594.900464] CPU 1 is now offline > > > > > Apr 29 23:31:16 midgard kernel: [140594.900469] SMP alternatives: > > > > > switching to UP code > > > > > Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at > > > > > kernel/kthread.c:166 kthread_bind() > > > > > Apr 29 23:31:16 midgard kernel: [140594.900870] [] > > > > > _cpu_down+0x16b/0x250 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900893] [] > > > > > disable_nonboot_cpus+0x60/0xf0 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900903] [] > > > > > enter_state+0x22a/0x240 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900913] [] > > > > > state_store+0xbd/0xd0 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900920] [] > > > > > state_store+0x0/0xd0 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900927] [] > > > > > subsys_attr_store+0x29/0x40 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900937] [] > > > > > sysfs_write_file+0xd4/0x160 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900948] [] > > > > > vfs_write+0xa6/0x160 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900958] [] > > > > > sysfs_write_file+0x0/0x160 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900966] [] > > > > > sys_write+0x41/0x70 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900974] [] > > > > > sys_dup2+0xeb/0x120 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900984] [] > > > > > sysenter_past_esp+0x5f/0x85 > > > > > Apr 29 23:31:16 midgard kernel: [140594.900999] > > > > > === > > > > > Apr 29 23:31:16 midgard 
kernel: [140594.902843] CPU1 is down > > > > > Apr 29 23:31:16 midgard kernel: [18014366.415769] Enabling non-boot > > > > > CPUs ... > > > > > Apr 29 23:31:16 midgard kernel: [18014366.426999] SMP alternatives: > > > > > switching to SMP code > > > > > Apr 29 23:31:16 midgard kernel: [18014366.427165] Booting processor > > > > > 1/1 eip 3000 > > > > > Apr 29 23:31:16 midgard kernel: [18014366.436913] Initializing CPU#1 > > > > > Apr 29 23:31:16 midgard kernel: [18014366.509141] Calibrating delay > > > > > using timer specific routine.. 3994.69 BogoMIPS (lpj=7989390) > > > > > Apr 29 23:31:16 midgard kernel: [18014366.509152] monitor/mwait > > > > > feature present. > > > > > Apr 29 23:31:16 midgard kernel: [18014366.509156] CPU: L1 I cache: > > > > > 32K, L1 D cache: 32K > > > > > Apr 29 23:31:16 midgard kernel: [18014366.509158] CPU: L2 cache: 2048K > > > > > Apr 29 23:31:16 midgard kernel: [18014366.509160]
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On 30.04.2007 21:46, Andrew Morton wrote:
>> 2.6.21-final is fine.
>
> Sure, but what about 2.6.21-git3 (or, better, current -git)?

OIC. Sorry for being dense. Will check.

>>> If that's OK then we need to pick through the difference between
>>> 2.6.21-rc7-mm2's driver tree and the patches which went into mainline. And
>>> that's a pretty small set.
>> I'm not quite sure how to determine that difference. Can you just provide
>> me with a list of patches you'd like me to test?
>
> Not really - everything's tangled up. A bisection search on the
> 2.6.21-rc7-mm2 driver tree would be the best bet.

Ok. No prob. It'll just take a bit of time. (Compiling a kernel on that
machine takes about 4 hours.) I'll be back. :-)

--
Tilman Schmidt                          E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -
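The bisection search suggested above can be sketched as follows. This is a minimal illustration, not a tool anyone in the thread used: bisect_series, is_bad_fn and demo_is_bad are hypothetical names, and it assumes the series is good with no patches applied, bad with all of them applied, and stays bad once the culprit is in. The point is the cost: a series of N patches needs about log2(N) builds, which matters at roughly 4 hours per build.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical test callback: returns non-zero if a kernel built with the
 * first n_applied patches of the series shows the bug.
 */
typedef int (*is_bad_fn)(size_t n_applied, void *ctx);

/*
 * Returns the 1-based position of the first patch that makes the kernel
 * bad, and stores the number of builds that were needed in *builds.
 * Precondition: 0 patches applied is good, n_patches applied is bad.
 */
size_t bisect_series(size_t n_patches, is_bad_fn is_bad, void *ctx,
		     size_t *builds)
{
	size_t good = 0;		/* highest count known to be good */
	size_t bad = n_patches;		/* lowest count known to be bad */

	assert(n_patches > 0);
	*builds = 0;
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		(*builds)++;
		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Demo predicate: the bug appears once patch number *ctx is applied. */
int demo_is_bad(size_t n_applied, void *ctx)
{
	size_t culprit = *(size_t *)ctx;

	return n_applied >= culprit;
}
```

In practice the predicate is "apply the first n patches, build, boot, watch for the crash", so each call is one build-and-boot cycle.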
Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]
On 4/30/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote: [Please don't drop addresses from the CC list] On Sunday, 29 April 2007 22:46, Dan Kruchinin wrote: > On 4/30/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote: > > Hi, > > > > On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote: > > > Hi all. > > > > > > There is a problem on my macbook core duo with suspend. > > > after suspending when i'm trying to 'wake up' my notebook, it seems > > > that it works, but i don't see anything at my monitor. So i have to > > > reboot it to continue my work. > > > > What exactly do you do to suspend? > > > > Rafael > > > > > > > --- > > > Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at > > > kernel/kthread.c:166 kthread_bind() > > > Apr 29 23:31:16 midgard kernel: [140594.900870] [] > > > _cpu_down+0x16b/0x250 > > > Apr 29 23:31:16 midgard kernel: [140594.900893] [] > > > disable_nonboot_cpus+0x60/0xf0 > > > Apr 29 23:31:16 midgard kernel: [140594.900903] [] > > > enter_state+0x22a/0x240 > > > Apr 29 23:31:16 midgard kernel: [140594.900913] [] > > > state_store+0xbd/0xd0 > > > Apr 29 23:31:16 midgard kernel: [140594.900920] [] > > > state_store+0x0/0xd0 > > > Apr 29 23:31:16 midgard kernel: [140594.900927] [] > > > subsys_attr_store+0x29/0x40 > > > Apr 29 23:31:16 midgard kernel: [140594.900937] [] > > > sysfs_write_file+0xd4/0x160 > > > Apr 29 23:31:16 midgard kernel: [140594.900948] [] > > > vfs_write+0xa6/0x160 > > > Apr 29 23:31:16 midgard kernel: [140594.900958] [] > > > sysfs_write_file+0x0/0x160 > > > Apr 29 23:31:16 midgard kernel: [140594.900966] [] > > > sys_write+0x41/0x70 > > > Apr 29 23:31:16 midgard kernel: [140594.900974] [] > > > sys_dup2+0xeb/0x120 > > > Apr 29 23:31:16 midgard kernel: [140594.900984] [] > > > sysenter_past_esp+0x5f/0x85 > > > Apr 29 23:31:16 midgard kernel: [140594.900999] === > > > --- > > > > > > dmesg output: > > > > > > > > > Apr 29 23:31:16 midgard kernel: [140594.788697] Suspending device vtcon0 > > > Apr 29 23:31:16 midgard 
kernel: [140594.788700] Suspending device platform > > > Apr 29 23:31:16 midgard kernel: [140594.788704] Disabling non-boot CPUs ... > > > Apr 29 23:31:16 midgard kernel: [140594.900464] CPU 1 is now offline > > > Apr 29 23:31:16 midgard kernel: [140594.900469] SMP alternatives: > > > switching to UP code > > > Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at > > > kernel/kthread.c:166 kthread_bind() > > > Apr 29 23:31:16 midgard kernel: [140594.900870] [] > > > _cpu_down+0x16b/0x250 > > > Apr 29 23:31:16 midgard kernel: [140594.900893] [] > > > disable_nonboot_cpus+0x60/0xf0 > > > Apr 29 23:31:16 midgard kernel: [140594.900903] [] > > > enter_state+0x22a/0x240 > > > Apr 29 23:31:16 midgard kernel: [140594.900913] [] > > > state_store+0xbd/0xd0 > > > Apr 29 23:31:16 midgard kernel: [140594.900920] [] > > > state_store+0x0/0xd0 > > > Apr 29 23:31:16 midgard kernel: [140594.900927] [] > > > subsys_attr_store+0x29/0x40 > > > Apr 29 23:31:16 midgard kernel: [140594.900937] [] > > > sysfs_write_file+0xd4/0x160 > > > Apr 29 23:31:16 midgard kernel: [140594.900948] [] > > > vfs_write+0xa6/0x160 > > > Apr 29 23:31:16 midgard kernel: [140594.900958] [] > > > sysfs_write_file+0x0/0x160 > > > Apr 29 23:31:16 midgard kernel: [140594.900966] [] > > > sys_write+0x41/0x70 > > > Apr 29 23:31:16 midgard kernel: [140594.900974] [] > > > sys_dup2+0xeb/0x120 > > > Apr 29 23:31:16 midgard kernel: [140594.900984] [] > > > sysenter_past_esp+0x5f/0x85 > > > Apr 29 23:31:16 midgard kernel: [140594.900999] === > > > Apr 29 23:31:16 midgard kernel: [140594.902843] CPU1 is down > > > Apr 29 23:31:16 midgard kernel: [18014366.415769] Enabling non-boot CPUs ... 
> > > Apr 29 23:31:16 midgard kernel: [18014366.426999] SMP alternatives: > > > switching to SMP code > > > Apr 29 23:31:16 midgard kernel: [18014366.427165] Booting processor 1/1 eip 3000 > > > Apr 29 23:31:16 midgard kernel: [18014366.436913] Initializing CPU#1 > > > Apr 29 23:31:16 midgard kernel: [18014366.509141] Calibrating delay > > > using timer specific routine.. 3994.69 BogoMIPS (lpj=7989390) > > > Apr 29 23:31:16 midgard kernel: [18014366.509152] monitor/mwait feature present. > > > Apr 29 23:31:16 midgard kernel: [18014366.509156] CPU: L1 I cache: > > > 32K, L1 D cache: 32K > > > Apr 29 23:31:16 midgard kernel: [18014366.509158] CPU: L2 cache: 2048K > > > Apr 29 23:31:16 midgard kernel: [18014366.509160] CPU: Physical Processor ID: 0 > > > Apr 29 23:31:16 midgard kernel: [18014366.509161] CPU: Processor Core ID: 1 > > > Apr 29 23:31:16 midgard kernel: [18014366.509637] CPU1: Intel Genuine > > > Intel(R) CPU1500 @ 2.00GHz stepping 08 > > > Apr 29 23:31:16 midgard kernel: [18014366.509659] checking TSC > > > synchronization [CPU#0 -> CPU#1]: > > > Apr 29 23:31:16 midgard kernel: [18014366.529627] Measured 68812018716 > > > cycles TSC warp between CPUs, turning off TSC clock. > > > Apr 29 23:31:16 midgard
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Mon, 30 Apr 2007 21:28:06 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:

> On 30.04.2007 20:21, Andrew Morton wrote:
> > A lot of Greg's driver tree has gone upstream, so please check current
> > mainline.
>
> 2.6.21-final is fine.

Sure, but what about 2.6.21-git3 (or, better, current -git)?

> > If that's OK then we need to pick through the difference between
> > 2.6.21-rc7-mm2's driver tree and the patches which went into mainline. And
> > that's a pretty small set.
>
> I'm not quite sure how to determine that difference. Can you just provide
> me with a list of patches you'd like me to test?

Not really - everything's tangled up. A bisection search on the
2.6.21-rc7-mm2 driver tree would be the best bet.

See, 2.6.21-rc7-mm2 had:

gregkh-driver-driver-core-fix-device_add-error-path.patch
gregkh-driver-driver-core-fix-namespace-issue-with-devices-assigned-to-classes.patch
gregkh-driver-dev_printk-and-new-style-class-devices.patch
gregkh-driver-driver-core-udev-triggered-device-driver-binding.patch
gregkh-driver-driver-core-use-attribute-groups-in-struct-device_type.patch
gregkh-driver-named-device_type.patch
gregkh-driver-kobject-kobject_shadow_add-cleanup.patch
gregkh-driver-driver-core-per-subsystem-multithreaded-probing.patch
gregkh-driver-powerpc-make-it-compile-for-multithread-change.patch
gregkh-driver-driver-core-don-t-fail-attaching-the-device-if-it-cannot-be-bound.patch
gregkh-driver-driver-no-more-wait.patch
gregkh-driver-kref-fix-cpu-ordering-with-respect-to-krefs.patch
gregkh-driver-driver-core-notify-userspace-of-network-device-renames.patch
gregkh-driver-driver-core-suppress-uevents-via-filter.patch
gregkh-driver-driver-core-switch-firmware_class-to-uevent_suppress.patch
gregkh-driver-uevent-use-add_uevent_var-instead-of-open-coding-it.patch
gregkh-driver-driver-core-add-suspend-and-resume-to-struct-device_type.patch
gregkh-driver-kobject-kobject_ueventc-collapse-unnecessary-loop-nesting.patch
gregkh-driver-kobject-kobject_add-reference-leak.patch
gregkh-driver-devices_subsys-rwsem-removal.patch
gregkh-driver-scsi-hosts-rwsem-removal.patch
gregkh-driver-usb-bus-mutex.patch
gregkh-driver-pnp-remove-rwsem-usage.patch
gregkh-driver-input-serio-do-not-touch-bus-s-rwsem.patch
gregkh-driver-input-gameport-do-not-touch-bus-s-rwsem.patch
gregkh-driver-ide-proc-remove-rwsem.patch
gregkh-driver-ieee1394-rwsem-removal.patch
gregkh-driver-phy-rwsem-removal.patch
gregkh-driver-qeth-remove-usage-of-subsys_rwsem.patch
gregkh-driver-subsys-rwsem-removal.patch
gregkh-driver-sysfs-fix-i_ino-handling-in-sysfs.patch
gregkh-driver-sysfs-fix-error-handling-in-binattr-write.patch
gregkh-driver-sysfs-move-release_sysfs_dirent-to-dirc.patch
gregkh-driver-sysfs-flatten-cleanup-paths-in-sysfs_add_link-and-create_dir.patch
gregkh-driver-sysfs-consolidate-sysfs_dirent-creation-functions.patch
gregkh-driver-sysfs-add-sysfs_dirent-s_parent.patch
gregkh-driver-sysfs-add-sysfs_dirent-s_name.patch
gregkh-driver-sysfs-make-sysfs_dirent-s_element-a-union.patch
gregkh-driver-sysfs-implement-kobj_sysfs_assoc_lock.patch
gregkh-driver-sysfs-reimplement-symlink-using-sysfs_dirent-tree.patch
gregkh-driver-sysfs-implement-bin_buffer.patch
gregkh-driver-sysfs-implement-sysfs_dirent-active-reference-and-immediate-disconnect.patch
gregkh-driver-sysfs-kill-attribute-file-orphaning.patch
gregkh-driver-sysfs-kill-unnecessary-attribute-owner.patch
gregkh-driver-sysfs-make-lockdep-ignore-s_active.patch
gregkh-driver-sysfs-make-sysfs_put-ignore-null-sd.patch
gregkh-driver-sysfs-rename-object_depth-to-sysfs_path_depth-and-make-it-global.patch
gregkh-driver-sysfs-reimplement-sysfs_drop_dentry.patch
gregkh-driver-sysfs-kill-sysfs_dirent-s_dentry.patch
gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
gregkh-driver-driver-core-warn-for-odd-store-uevent-usage.patch
gregkh-driver-kobject-comment-and-warning-fixes-to-kobjectc.patch
gregkh-driver-the-overdue-removal-of-the-mount-umount-uevents.patch
gregkh-driver-debugfs-add-debugfs_create_u64.patch
gregkh-driver-bus_add_driver-return-error-for-no-bus.patch
gregkh-driver-uio.patch
gregkh-driver-uio-documentation.patch
gregkh-driver-uio-dummy.patch
gregkh-driver-uio-hilscher-cif-card-driver.patch
gregkh-driver-remove-struct-subsystem-as-it-is-no-longer-needed.patch
gregkh-driver-put_device-might_sleep.patch
gregkh-driver-kobject-warn.patch
gregkh-driver-warn-when-statically-allocated-kobjects-are-used.patch
gregkh-driver-nozomi.patch

and Greg's driver tree (as of yesterday, I think) had

gregkh-driver-uio.patch
gregkh-driver-uio-documentation.patch
gregkh-driver-uio-dummy.patch
gregkh-driver-uio-hilscher-cif-card-driver.patch
gregkh-driver-remove-struct-subsystem-as-it-is-no-longer-needed.patch
gregkh-driver-put_device-might_sleep.patch
gregkh-driver-kobject-warn.patch
gregkh-driver-warn-when-statically-allocated-kobjects-are-used.patch
gregkh-driver-nozomi.patch

So what has happened (approximately)
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On 30.04.2007 20:21, Andrew Morton wrote:
> A lot of Greg's driver tree has gone upstream, so please check current
> mainline.

2.6.21-final is fine.

> If that's OK then we need to pick through the difference between
> 2.6.21-rc7-mm2's driver tree and the patches which went into mainline.  And
> that's a pretty small set.

I'm not quite sure how to determine that difference. Can you just provide
me with a list of patches you'd like me to test?

Thanks,
Tilman

-- 
Tilman Schmidt                    E-Mail: [EMAIL PROTECTED]
Wehrhausweg 66                    Fax: +49 228 4299019
53227 Bonn
Germany
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Mon, 30 Apr 2007 19:17:02 +0200 Tilman Schmidt <[EMAIL PROTECTED]> wrote:

> >> With kernel 2.6.21-rc7-mm2, my Dell Optiplex GX110 (P3/933) regularly
> >> crashes during the SuSE 10.1 startup sequence. When booting to RL5,
> >> it panicblinks shortly after the graphical login screen appears.
> >> Booting to RL3, it hangs after the startup message:
>
> I have now bisected this down to the section in the series file between
> #GREGKH-DRIVER-START and #GREGKH-DRIVER-END, and therefore added GregKH
> to the CC list.

This is rather good news.  I was staring at about 200-300 MM patches
wondering which one was buggy.  Thanks heaps for doing the bisect.

Now the main worry is Randy's dead box.

A lot of Greg's driver tree has gone upstream, so please check current
mainline.

If that's OK then we need to pick through the difference between
2.6.21-rc7-mm2's driver tree and the patches which went into mainline.  And
that's a pretty small set.

> I'll try bisecting further inside that section (unless
> you tell me not to), but it may take some time.
>
> The exact point during the startup sequence when the crash occurred and
> the amount of BUG messages produced varied somewhat during these tests.
> The common denominator, and my criterion for the good/bad decisions
> during the bisect, was the crash (panic blink) just before completion
> of the system startup.
> Sometimes there weren't any BUG messages in the log (or perhaps they
> just didn't make it to the disk.) Sometimes I just had a couple of the
> "sleeping function called from invalid context at mm/slab.c:3054"
> ones but no "Eeek! page_mapcount(page) went negative!" one before them.
> However, whenever the "Eeek!" did appear it announced "getcfg-interfac"
> as the current process and was followed by a few of the "mm/slab.c:3054"
> ones.

hm, big mess.  Could be it was some glitch from Tejun's sysfs changes which
are all being extensively redone, so perhaps we'll never hear from it
again.

Or perhaps we just merged it into mainline.
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
>> With kernel 2.6.21-rc7-mm2, my Dell Optiplex GX110 (P3/933) regularly
>> crashes during the SuSE 10.1 startup sequence. When booting to RL5,
>> it panicblinks shortly after the graphical login screen appears.
>> Booting to RL3, it hangs after the startup message:

I have now bisected this down to the section in the series file between
#GREGKH-DRIVER-START and #GREGKH-DRIVER-END, and therefore added GregKH
to the CC list. I'll try bisecting further inside that section (unless
you tell me not to), but it may take some time.

The exact point during the startup sequence when the crash occurred and
the amount of BUG messages produced varied somewhat during these tests.
The common denominator, and my criterion for the good/bad decisions
during the bisect, was the crash (panic blink) just before completion
of the system startup.
Sometimes there weren't any BUG messages in the log (or perhaps they
just didn't make it to the disk.) Sometimes I just had a couple of the
"sleeping function called from invalid context at mm/slab.c:3054"
ones but no "Eeek! page_mapcount(page) went negative!" one before them.
However, whenever the "Eeek!" did appear it announced "getcfg-interfac"
as the current process and was followed by a few of the "mm/slab.c:3054"
ones.

HTH
Tilman

-- 
In the long run, we'll all be dead.
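The bisection described in this message can be sketched in C. This is only an illustration, not the tooling anyone in the thread used: first_bad_patch(), is_bad_fn and culprit_is_bad() are hypothetical names. The sketch assumes that the patches apply in series order, that a tree with none of them is good, and that the tree stays bad once the offending patch has been applied. In real use the predicate would stand for "apply the first count patches, build, boot, and watch for the panic blink".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: nonzero if a tree with the first `count` patches
 * of the series applied shows the failure. */
typedef int (*is_bad_fn)(size_t count, void *ctx);

/*
 * Return the 0-based index of the first patch that turns the tree bad.
 * Assumes: 0 patches applied is good, all n applied is bad, and the
 * result never flips back to good as more patches are added.
 */
size_t first_bad_patch(size_t n, is_bad_fn is_bad, void *ctx)
{
	size_t good = 0;	/* largest prefix length known to be good */
	size_t bad = n;		/* smallest prefix length known to be bad */

	assert(n > 0 && is_bad != NULL);

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad - 1;
}

/* Stand-in predicate for dry runs: ctx points at the culprit's index. */
int culprit_is_bad(size_t count, void *ctx)
{
	const size_t *culprit = ctx;

	return count > *culprit;
}
```

Each halving step costs one build-and-boot cycle, so a section of N patches needs roughly log2(N) cycles rather than N.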
Re: 2.6.21-rc7-mm2 hangs in boot
On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:

> On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
>
> > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> >
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> >
> > I'm getting a hang near the end of booting on x86_64 UP.
> > The last initcall_debug function varies.  E.g.:
> >
> > 1/
> > [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
> > [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
> > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
> > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
> > [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
> > [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
> >
> > 2/
> > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
> > [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
> > [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
> > [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
> > [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> >
> > So perhaps it locks during a timer interrupt.
> >
> > .config is attached.
> >
> > Any ideas/suggestions?
>
> Just the usual: nothing from sysrq or NMI watchdog?

Nothing from either of those.  I'll jiggle some config options.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: [linux-pm] Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]
On Monday, 30 April 2007 12:05, Gautham R Shenoy wrote:
> On Mon, Apr 30, 2007 at 12:39:46AM -0700, Andrew Morton wrote:
> > On Sun, 29 Apr 2007 22:27:44 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> >
> > > On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
> > > > Hi all.
> > > >
> > > > There is a problem on my macbook core duo with suspend.
> > > > after suspending when i'm trying to 'wake up' my notebook, it seems
> > > > that it works, but i don't see anything at my monitor. So i have to
> > > > reboot it to continue my work.
> > >
> > > What exactly do you do to suspend?
> > >
> >
> > This is due to _cpu_down() calling kthread_bind() in state TASK_RUNNING.
>
> The state should be TASK_INTERRUPTIBLE. That's the state of the thread
> 'p' should be in when we do a kthread_bind(p) in _cpu_down().
>
> Are you sure about the TASK_RUNNING part ?

Well, the WARN_ON() in kernel/kthread.c, line 166, is triggering here, so it
may be TASK_INTERRUPTIBLE too (should the WARN_ON() trigger in that case)?

Rafael
Re: [linux-pm] Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]
On Mon, Apr 30, 2007 at 12:39:46AM -0700, Andrew Morton wrote:
> On Sun, 29 Apr 2007 22:27:44 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
>
> > On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
> > > Hi all.
> > >
> > > There is a problem on my macbook core duo with suspend.
> > > after suspending when i'm trying to 'wake up' my notebook, it seems
> > > that it works, but i don't see anything at my monitor. So i have to
> > > reboot it to continue my work.
> >
> > What exactly do you do to suspend?
> >
>
> This is due to _cpu_down() calling kthread_bind() in state TASK_RUNNING.

The state should be TASK_INTERRUPTIBLE. That's the state the thread
'p' should be in when we do a kthread_bind(p) in _cpu_down().

Are you sure about the TASK_RUNNING part?

>
> So I was sent the below, including worrisome changelog.
>

Ok, it should not be that worrisome!
By the time we would be doing kthread_stop(p) in _cpu_down(), 'p' would
have been moved over to some other online cpu, due to the
migrate_dead_tasks() called in CPU_DEAD handling of migration_call
(kernel/sched.c).

So we are safe. Anyway, I apologise for causing any worry :-)

Thanks and Regards
gautham.

>
>
> From: Gautham R Shenoy <[EMAIL PROTECTED]>
>
> We are anyway kthread_stop()ping other per-cpu kernel threads after
> move_task_off_dead_cpu(), so we can do it with the stop_machine_run thread
> as well.
>
> I just checked with Vatsa if there was any subtle reason why they
> had put in the kthread_bind() in cpu.c. Vatsa cannot seem to recollect
> any and I can't see any. So let us just remove the kthread_bind.
>
> Signed-off-by: Gautham R Shenoy <[EMAIL PROTECTED]>
> Cc: Oleg Nesterov <[EMAIL PROTECTED]>
> Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
> Cc: "Rafael J. Wysocki" <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>
>  kernel/cpu.c |    4 ----
>  1 files changed, 4 deletions(-)
>
> diff -puN kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down kernel/cpu.c
> --- a/kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down
> +++ a/kernel/cpu.c
> @@ -175,10 +175,6 @@ static int _cpu_down(unsigned int cpu)
>  	/* This actually kills the CPU. */
>  	__cpu_die(cpu);
>  
> -	/* Move it here so it can run. */
> -	kthread_bind(p, get_cpu());
> -	put_cpu();
> -
>  	/* CPU is completely dead: tell everyone.  Too late to complain. */
>  	if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD, hcpu) == NOTIFY_BAD)
>  		BUG();
> _

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
Re: 2.6.21-rc7-mm2 "irqpoll" seems to be broken
On Thu, Apr 26, 2007 at 08:24:05AM -0700, Andrew Morton wrote:
> On Thu, 26 Apr 2007 15:06:20 +0530 Vivek Goyal <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I am booting 2.6.21-rc7-mm2 on x86_64 box with "irqpoll" command line option
> > and it panics. I can reproduce this problem easily on this box. Please
> > let me know if serial console output is required.
> >
> > 2.6.21-rc7 works just fine. So problem seems to be in some -mm patch.
> >
> > Unable to handle kernel NULL pointer dereference at 0009 RIP:
> >  [8025bc5e] note_interrupt+0x5d/0x21b
> > PGD 1032c5067 PUD 1032c4067 PMD 0
> > Oops: 0000 [1] SMP
> > CPU 1
> > Modules linked in:
> > Pid: 0, comm: swapper Not tainted 2.6.21-rc7-mm2 #1
> > RIP: 0010:[8025bc5e]  [8025bc5e] note_interrupt+0x5d/0x21b
> > RSP: 0018:810100cbff08  EFLAGS: 00010002
> > RAX: RBX: 807e2d40 RCX:
> > RDX: RSI: 807e2d40 RDI: 0004
> > RBP: 807e2d40 R08: R09:
> > R10: 0010 R11: 00a0 R12: 810104192f40
> > R13: 807e2d84 R14: R15:
> > FS:  () GS:810100854140() knlGS:
> > CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> > CR2: 0009 CR3: 0001032c6000 CR4: 06e0
> > Process swapper (pid: 0, threadinfo 810100cb8000, task 810100cb7500)
> > Stack:  0004 0004 807e2d40
> >  0004 810104192f40 807e2d84
> >  8025c7c5 810100cb9e98 810100cb9e98
> > Call Trace:
> >  [8025c7c5] handle_edge_irq+0xf9/0x127
> >  [8020c2f9] do_IRQ+0xf1/0x160
> >  [8020a141] ret_from_intr+0x0/0xa
> >  [80208fd9] mwait_idle+0x42/0x45
> >  [80208f2f] cpu_idle+0xbd/0xe0
> >
> >
> > Code: f6 40 09 10 75 09 45 85 ff 0f 85 3d 01 00 00 49 c7 c4 c0 2b
> > RIP  [8025bc5e] note_interrupt+0x5d/0x21b
> >  RSP 810100cbff08
> > CR2: 0009
> > Kernel panic - not syncing: Aiee, killing interrupt handler!
> >
>
> hm.  I'd be suspecting
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-common-code.patch
> and
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-on-x86_64.patch
>
> But because x86_64 doesn't implement IRQ_PER_CPU it's hard to see how we
> got into note_interrupt as a result of that patch.
>
> Adding the `noirqdebug' boot option would be interesting, perhaps.

"noirqdebug" gets rid of the problem. But that also effectively nullifies
the "irqpoll" parameter.

Interestingly, on another x86_64 machine this problem does not occur. So
something is dependent on hardware.

I put some debug statements in note_interrupt() and found that
desc->action is a NULL pointer, and that is what causes the problem. The
above patch accesses desc->action->flags, hence it ends up dereferencing a
NULL pointer.

handle_edge_irq() already makes sure that desc->action is not null, yet
note_interrupt() is still receiving desc->action as null, which is strange.

On my system this is happening for irq 4, and /proc/interrupts shows that
it is coming from "serial".

Thanks
Vivek
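The failure mode Vivek describes, reading desc->action->flags while desc->action is NULL, can be illustrated with a small stand-alone C sketch. It is not the fix discussed in the thread and not kernel code: the sketch_* structs are simplified stand-ins for the kernel's struct irq_desc and struct irqaction, and the SKETCH_IRQF_IRQPOLL value is an assumption. As a side note, the faulting address 0009 and the leading oops code bytes f6 40 09 10 (which decode on x86-64 as testb $0x10,0x9(%rax)) appear consistent with testing one flag bit in a word at offset 8 behind a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed flag value, for illustration only. */
#define SKETCH_IRQF_IRQPOLL 0x00001000UL

/* Simplified stand-ins for struct irqaction and struct irq_desc. */
struct sketch_irqaction {
	void *handler;		/* placeholder for the handler pointer */
	unsigned long flags;	/* IRQF_*-style bits */
};

struct sketch_irq_desc {
	struct sketch_irqaction *action;	/* NULL when no handler is installed */
};

/*
 * Return 1 if the descriptor has a handler flagged for irqpoll, else 0.
 * The NULL test comes first, so a descriptor without an action is
 * reported as "not flagged" instead of being dereferenced.
 */
int sketch_action_is_irqpoll(const struct sketch_irq_desc *desc)
{
	assert(desc != NULL);

	if (desc->action == NULL)
		return 0;
	return (desc->action->flags & SKETCH_IRQF_IRQPOLL) != 0;
}
```

A guard of this shape only hides the symptom; it does not answer the open question in the message of why note_interrupt() sees a NULL action after handle_edge_irq() has already checked it.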
Re: [2.6.21-rc7-mm2] BUG while suspend to ram
Maciej Rutecki wrote:
> BUG: at kernel/kthread.c:166 kthread_bind()
>  [c01465ac] _cpu_down+0x16c/0x250
>  [c0146890] disable_nonboot_cpus+0x60/0xf0
>  [c014cd67] pm_suspend_disk+0x177/0x2c0
>  [c014b645] enter_state+0xb5/0x200
>  [c014b84d] state_store+0xbd/0xd0
>  [c014b790] state_store+0x0/0xd0
>  [c01be189] subsys_attr_store+0x29/0x40
>  [c01be3a4] sysfs_write_file+0xd4/0x160
>  [c017b701] vfs_write+0xc1/0x160
>  [c01be2d0] sysfs_write_file+0x0/0x160
>  [c017be11] sys_write+0x41/0x70
>  [c0187355] sys_dup2+0xd5/0x100
>  [c01040f6] sysenter_past_esp+0x5f/0x85
>  [c033] xfrm_policy_insert+0x210/0x400
>  ===
>
> dmesg:
> http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/dmesg.txt.gz
> lsmod:
> http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/lsmod.txt.gz
> ver_linux:
> http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/ver_linux.txt.gz
> lspci:
> http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/lspci.txt.gz
> config:
> http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/config-2.6.21-rc7-mm2.gz
>

I use this script:
http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/suspend_to_disk.sh

-- 
Maciej Rutecki <[EMAIL PROTECTED]>
http://www.maciek.unixy.pl
Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]
On Sun, 29 Apr 2007 22:27:44 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
> > Hi all.
> >
> > There is a problem on my macbook core duo with suspend.
> > after suspending when i'm trying to 'wake up' my notebook, it seems
> > that it works, but i don't see anything at my monitor. So i have to
> > reboot it to continue my work.
>
> What exactly do you do to suspend?
>

This is due to _cpu_down() calling kthread_bind() in state TASK_RUNNING.

So I was sent the below, including worrisome changelog.


From: Gautham R Shenoy <[EMAIL PROTECTED]>

We are anyway kthread_stop()ping other per-cpu kernel threads after
move_task_off_dead_cpu(), so we can do it with the stop_machine_run thread
as well.

I just checked with Vatsa if there was any subtle reason why they
had put in the kthread_bind() in cpu.c. Vatsa cannot seem to recollect
any and I can't see any. So let us just remove the kthread_bind.

Signed-off-by: Gautham R Shenoy <[EMAIL PROTECTED]>
Cc: Oleg Nesterov <[EMAIL PROTECTED]>
Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
Cc: "Rafael J. Wysocki" <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 kernel/cpu.c |    4 ----
 1 files changed, 4 deletions(-)

diff -puN kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down kernel/cpu.c
--- a/kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down
+++ a/kernel/cpu.c
@@ -175,10 +175,6 @@ static int _cpu_down(unsigned int cpu)
 	/* This actually kills the CPU. */
 	__cpu_die(cpu);
 
-	/* Move it here so it can run. */
-	kthread_bind(p, get_cpu());
-	put_cpu();
-
 	/* CPU is completely dead: tell everyone.  Too late to complain. */
 	if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD, hcpu) == NOTIFY_BAD)
 		BUG();
_
Re: [2.6.21-rc7-mm2] BUG while suspend to ram
On Sun, 29 Apr 2007 12:42:43 +0200 Maciej Rutecki <[EMAIL PROTECTED]> wrote:

> BUG: at kernel/kthread.c:166 kthread_bind()
>  [c01465ac] _cpu_down+0x16c/0x250
>  [c0146890] disable_nonboot_cpus+0x60/0xf0
>  [c014cd67] pm_suspend_disk+0x177/0x2c0
>  [c014b645] enter_state+0xb5/0x200
>  [c014b84d] state_store+0xbd/0xd0
>  [c014b790] state_store+0x0/0xd0
>  [c01be189] subsys_attr_store+0x29/0x40
>  [c01be3a4] sysfs_write_file+0xd4/0x160
>  [c017b701] vfs_write+0xc1/0x160
>  [c01be2d0] sysfs_write_file+0x0/0x160
>  [c017be11] sys_write+0x41/0x70
>  [c0187355] sys_dup2+0xd5/0x100
>  [c01040f6] sysenter_past_esp+0x5f/0x85
>  [c033] xfrm_policy_insert+0x210/0x400
>  ===

yup, thanks - the present plan is to remove the kthread_bind() call from
_cpu_down().  Although we don't appear to fully understand why we're
removing it, nor why it was added in the first place, which has me worried.
Re: [2.6.21-rc7-mm2] BUG while suspend to ram
On Sun, 29 Apr 2007 12:42:43 +0200 Maciej Rutecki [EMAIL PROTECTED] wrote: BUG: at kernel/kthread.c:166 kthread_bind() [c01465ac] _cpu_down+0x16c/0x250 [c0146890] disable_nonboot_cpus+0x60/0xf0 [c014cd67] pm_suspend_disk+0x177/0x2c0 [c014b645] enter_state+0xb5/0x200 [c014b84d] state_store+0xbd/0xd0 [c014b790] state_store+0x0/0xd0 [c01be189] subsys_attr_store+0x29/0x40 [c01be3a4] sysfs_write_file+0xd4/0x160 [c017b701] vfs_write+0xc1/0x160 [c01be2d0] sysfs_write_file+0x0/0x160 [c017be11] sys_write+0x41/0x70 [c0187355] sys_dup2+0xd5/0x100 [c01040f6] sysenter_past_esp+0x5f/0x85 [c033] xfrm_policy_insert+0x210/0x400 === yup, thanks - the present plan is to remove the kthread_bind() call from _cpu_down(). Although we don't appear to fully undersand why we're removing it, nor why it was added in the first place, which has me worried. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]
On Sun, 29 Apr 2007 22:27:44 +0200 Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote: Hi all. There is a problem on my macbook core duo with suspend. after suspending when i'm trying to 'wake up' my notebook, it seems that it works, but i don't see anything at my monitor. So i have to reboot it to continue my work. What exactly do you do to suspend? This is due to _cpu_down() calling kthread_bind() in state TASK_RUNNING. So I was sent the below, including worrisome changelog. From: Gautham R Shenoy [EMAIL PROTECTED] We are anyway kthread_stop()ping other per-cpu kernel threads after move_task_off_dead_cpu(), so we can do it with the stop_machine_run thread as well. I just checked with Vatsa if there was any subtle reason why they had put in the kthread_bind() in cpu.c. Vatsa cannot seem to recollect any and I can't see any. So let us just remove the kthread_bind. Signed-off-by: Gautham R Shenoy [EMAIL PROTECTED] Cc: Oleg Nesterov [EMAIL PROTECTED] Cc: Eric W. Biederman [EMAIL PROTECTED] Cc: Rafael J. Wysocki [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- kernel/cpu.c |4 1 files changed, 4 deletions(-) diff -puN kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down kernel/cpu.c --- a/kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down +++ a/kernel/cpu.c @@ -175,10 +175,6 @@ static int _cpu_down(unsigned int cpu) /* This actually kills the CPU. */ __cpu_die(cpu); - /* Move it here so it can run. */ - kthread_bind(p, get_cpu()); - put_cpu(); - /* CPU is completely dead: tell everyone. Too late to complain. */ if (raw_notifier_call_chain(cpu_chain, CPU_DEAD, hcpu) == NOTIFY_BAD) BUG(); _ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21-rc7-mm2] BUG while suspend to ram
Maciej Rutecki pisze: BUG: at kernel/kthread.c:166 kthread_bind() [c01465ac] _cpu_down+0x16c/0x250 [c0146890] disable_nonboot_cpus+0x60/0xf0 [c014cd67] pm_suspend_disk+0x177/0x2c0 [c014b645] enter_state+0xb5/0x200 [c014b84d] state_store+0xbd/0xd0 [c014b790] state_store+0x0/0xd0 [c01be189] subsys_attr_store+0x29/0x40 [c01be3a4] sysfs_write_file+0xd4/0x160 [c017b701] vfs_write+0xc1/0x160 [c01be2d0] sysfs_write_file+0x0/0x160 [c017be11] sys_write+0x41/0x70 [c0187355] sys_dup2+0xd5/0x100 [c01040f6] sysenter_past_esp+0x5f/0x85 [c033] xfrm_policy_insert+0x210/0x400 === dmesg: http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/dmesg.txt.gz lsmod: http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/lsmod.txt.gz ver_linux: http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/ver_linux.txt.gz lspci: http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/lspci.txt.gz config: http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/config-2.6.21-rc7-mm2.gz I use this script: http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/suspend_to_disk.sh -- Maciej Rutecki [EMAIL PROTECTED] http://www.maciek.unixy.pl smime.p7s Description: S/MIME Cryptographic Signature
Re: 2.6.21-rc7-mm2 irqpoll seems to be broken
On Thu, Apr 26, 2007 at 08:24:05AM -0700, Andrew Morton wrote: On Thu, 26 Apr 2007 15:06:20 +0530 Vivek Goyal [EMAIL PROTECTED] wrote: Hi, I am booting 2.6.21-rc7-mm2 on x86_64 box with irqpoll command line option and it panics. I can reproduce this problem easily on this box. Please let me know if serial console output is required. 2.6.21-rc7 works just fine. So problem seems to be in some -mm patch. Unable to handle kernel NULL pointer dereference at 0009 RIP: [8025bc5e] note_interrupt+0x5d/0x21b PGD 1032c5067 PUD 1032c4067 PMD 0 Oops: [1] SMP CPU 1 Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.21-rc7-mm2 #1 RIP: 0010:[8025bc5e] [8025bc5e] note_interrupt+0x5d/0x21b RSP: 0018:810100cbff08 EFLAGS: 00010002 RAX: RBX: 807e2d40 RCX: RDX: RSI: 807e2d40 RDI: 0004 RBP: 807e2d40 R08: R09: R10: 0010 R11: 00a0 R12: 810104192f40 R13: 807e2d84 R14: R15: FS: () GS:810100854140() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: 0009 CR3: 0001032c6000 CR4: 06e0 Process swapper (pid: 0, threadinfo 810100cb8000, task 810100cb7500) Stack: 0004 0004 807e2d40 0004 810104192f40 807e2d84 8025c7c5 810100cb9e98 810100cb9e98 Call Trace: [8025c7c5] handle_edge_irq+0xf9/0x127 [8020c2f9] do_IRQ+0xf1/0x160 [8020a141] ret_from_intr+0x0/0xa [80208fd9] mwait_idle+0x42/0x45 [80208f2f] cpu_idle+0xbd/0xe0 Code: f6 40 09 10 75 09 45 85 ff 0f 85 3d 01 00 00 49 c7 c4 c0 2b RIP [8025bc5e] note_interrupt+0x5d/0x21b RSP 810100cbff08 CR2: 0009 Kernel panic - not syncing: Aiee, killing interrupt handler! hm. I'd be suspecting ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-common-code.patch and ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/add-irqf_irqpoll-flag-on-x86_64.patch But because x86_64 doesn't implement IRQ_PER_CPU it's hard to see how we got into note_interrupt as a result of that patch. Adding the `noirqdebug' boot option would be interesting, perhaps. 
noirqdebug gets rid of the problem. But that also effectively nullifies irqpoll parameter. Interestingly on another x86_64 machine this problem does not occur. So something is dependent on hardware. I put some debug statements on note_interrupt() and found that desc-action is a NULL pointer and that's why the problem. Above patch acesses desc-action-flags, hence it ends up accessing a NULL pointer. handle_edge_irq() already makes sure that desc-action is not null, still note_interrupt() is receiving desc-action as null, that's strange. On my system this is happening for irq 4 and /proc/interrupt shows that it is coming from serial. Thanks Vivek - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]
On Mon, Apr 30, 2007 at 12:39:46AM -0700, Andrew Morton wrote: On Sun, 29 Apr 2007 22:27:44 +0200 Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote: Hi all. There is a problem on my macbook core duo with suspend. after suspending when i'm trying to 'wake up' my notebook, it seems that it works, but i don't see anything at my monitor. So i have to reboot it to continue my work. What exactly do you do to suspend? This is due to _cpu_down() calling kthread_bind() in state TASK_RUNNING. The state should be TASK_INTERRUPTIBLE. That's the state of the thread 'p' should be in when we do a kthread_bind(p) in _cpu_down(). Are you sure about the TASK_RUNNING part ? So I was sent the below, including worrisome changelog. Ok, it should not be that worrisome! By the time we would be doing kthread_stop(p) in _cpu_down(), 'p' would have been moved over to some other online cpu, due to the migrate_dead_tasks() called in CPU_DEAD handling of migration_call (kernel/sched.c). So we are safe. Anyway, I apologise for causing any worry :-) Thanks and Regards gautham. From: Gautham R Shenoy [EMAIL PROTECTED] We are anyway kthread_stop()ping other per-cpu kernel threads after move_task_off_dead_cpu(), so we can do it with the stop_machine_run thread as well. I just checked with Vatsa if there was any subtle reason why they had put in the kthread_bind() in cpu.c. Vatsa cannot seem to recollect any and I can't see any. So let us just remove the kthread_bind. Signed-off-by: Gautham R Shenoy [EMAIL PROTECTED] Cc: Oleg Nesterov [EMAIL PROTECTED] Cc: Eric W. Biederman [EMAIL PROTECTED] Cc: Rafael J. 
Wysocki [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- kernel/cpu.c |4 1 files changed, 4 deletions(-) diff -puN kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down kernel/cpu.c --- a/kernel/cpu.c~remvoe-kthread_bind-call-from-_cpu_down +++ a/kernel/cpu.c @@ -175,10 +175,6 @@ static int _cpu_down(unsigned int cpu) /* This actually kills the CPU. */ __cpu_die(cpu); - /* Move it here so it can run. */ - kthread_bind(p, get_cpu()); - put_cpu(); - /* CPU is completely dead: tell everyone. Too late to complain. */ if (raw_notifier_call_chain(cpu_chain, CPU_DEAD, hcpu) == NOTIFY_BAD) BUG(); _ ___ linux-pm mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/linux-pm -- Gautham R Shenoy Linux Technology Center IBM India. Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless! - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]
On Monday, 30 April 2007 12:05, Gautham R Shenoy wrote: On Mon, Apr 30, 2007 at 12:39:46AM -0700, Andrew Morton wrote: On Sun, 29 Apr 2007 22:27:44 +0200 Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote: Hi all. There is a problem on my macbook core duo with suspend. after suspending when i'm trying to 'wake up' my notebook, it seems that it works, but i don't see anything at my monitor. So i have to reboot it to continue my work. What exactly do you do to suspend? This is due to _cpu_down() calling kthread_bind() in state TASK_RUNNING. The state should be TASK_INTERRUPTIBLE. That's the state of the thread 'p' should be in when we do a kthread_bind(p) in _cpu_down(). Are you sure about the TASK_RUNNING part ? Well, the WARN_ON() in kernel/kthread.c, line166, is triggering here, so it may be TASK_INTERRUPTIBLE too (should the WARN_ON() trigger in that case)? Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc7-mm2 hangs in boot
On Sun, 29 Apr 2007 22:23:54 -0700 Andrew Morton wrote:

> On Sun, 29 Apr 2007 22:01:32 -0700 Randy Dunlap [EMAIL PROTECTED] wrote:
>
> > On Wed, 25 Apr 2007 22:57:16 -0700 Andrew Morton wrote:
> >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> >
> > I'm getting a hang near the end of booting on x86_64 UP.
> > The last initcall_debug function varies. E.g.:
> >
> > 1/
> > [0.140257] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > [0.140275] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> > [0.140302] initcall 0x806f2fe7 ran for 0 msecs: init_script_binfmt+0x0/0x12()
> > [0.140310] Calling initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12()
> > [0.140317] initcall 0x806f2ff9: init_elf_binfmt+0x0/0x12() returned 0.
> > [0.140326] initcall 0x806f2ff9 ran for 0 msecs: init_elf_binfmt+0x0/0x12()
> > [0.140335] Calling initcall 0x806f3de9: debugfs_init+0x0/0x4a()
> > [0.140344] initcall 0x806f3de9: debugfs_init+0x0/0x4a() returned 0.
> > [0.140351] initcall 0x806f3de9 ran for 0 msecs: debugfs_init+0x0/0x4a()
> >
> > 2/
> > [0.140206] Calling initcall 0x806efeb1: ksysfs_init+0x0/0x29()
> > [0.140215] initcall 0x806efeb1: ksysfs_init+0x0/0x29() returned 0.
> > [0.140222] initcall 0x806efeb1 ran for 0 msecs: ksysfs_init+0x0/0x29()
> > [0.140230] Calling initcall 0x806f25be: filelock_init+0x0/0x31()
> > [0.140242] initcall 0x806f25be: filelock_init+0x0/0x31() returned 0.
> > [0.140249] initcall 0x806f25be ran for 0 msecs: filelock_init+0x0/0x31()
> > [0.140258] Calling initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f()
> > [0.140266] initcall 0x806f2fa8: init_misc_binfmt+0x0/0x3f() returned 0.
> > [0.140276] initcall 0x806f2fa8 ran for 0 msecs: init_misc_binfmt+0x0/0x3f()
> > [0.140284] Calling initcall 0x806f2fe7: init_script_binfmt+0x0/0x12()
> > [0.140293] initcall 0x806f2fe7: init_script_binfmt+0x0/0x12() returned 0.
> >
> > So perhaps it locks during a timer interrupt.
> >
> > .config is attached.
> >
> > Any ideas/suggestions?
>
> Just the usual: nothing from sysrq or NMI watchdog?

Nothing from either of those. I'll jiggle some config options.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
With kernel 2.6.21-rc7-mm2, my Dell Optiplex GX110 (P3/933) regularly crashes during the SuSE 10.1 startup sequence. When booting to RL5, it panic-blinks shortly after the graphical login screen appears. Booting to RL3, it hangs after the startup message:

I have now bisected this down to the section in the series file between #GREGKH-DRIVER-START and #GREGKH-DRIVER-END, and therefore added GregKH to the CC list.

I'll try bisecting further inside that section (unless you tell me not to), but it may take some time.

The exact point during the startup sequence when the crash occurred and the number of BUG messages produced varied somewhat during these tests. The common denominator, and my criterion for the good/bad decisions during the bisect, was the crash (panic blink) just before completion of the system startup.

Sometimes there weren't any BUG messages in the log (or perhaps they just didn't make it to the disk). Sometimes I just had a couple of the "sleeping function called from invalid context at mm/slab.c:3054" ones but no "Eeek! page_mapcount(page) went negative!" one before them. However, whenever the "Eeek!" did appear, it announced getcfg-interfac as the current process and was followed by a few of the mm/slab.c:3054 ones.

HTH
Tilman

--
In the long run, we'll all be dead.
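The bisect described above can be sketched as a binary search over a patch series. This is only an illustration under stated assumptions: the kernel with no patches applied is good, the kernel with all patches applied is bad, and once the culprit is applied every longer prefix is bad as well. The names first_bad_patch and crashes_from are hypothetical, and crashes_from stands in for the manual "build, boot, watch for the panic blink" test.

```c
#include <assert.h>
#include <stddef.h>

/* Returns nonzero if a kernel with the first `applied` patches crashes. */
typedef int (*is_bad_fn)(size_t applied, void *ctx);

/*
 * Return the 1-based position of the first patch whose application
 * turns a good kernel into a bad one. Assumes is_bad(0) is false,
 * is_bad(n) is true, and the outcome is monotonic in between.
 */
static size_t first_bad_patch(size_t n, is_bad_fn is_bad, void *ctx)
{
    size_t good = 0; /* longest prefix known to be good */
    size_t bad = n;  /* shortest prefix known to be bad */

    assert(is_bad != NULL && n > 0);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Hypothetical test oracle: crashes once patch number *ctx is applied. */
static int crashes_from(size_t applied, void *ctx)
{
    size_t culprit = *(const size_t *)ctx;

    return applied >= culprit;
}
```

Because the crash symptoms varied between boots, a real run would probably need to boot each candidate kernel more than once before calling it good; the sketch ignores that and treats every test as deterministic.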
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Mon, 30 Apr 2007 19:17:02 +0200 Tilman Schmidt [EMAIL PROTECTED] wrote:

> With kernel 2.6.21-rc7-mm2, my Dell Optiplex GX110 (P3/933) regularly
> crashes during the SuSE 10.1 startup sequence. When booting to RL5,
> it panic-blinks shortly after the graphical login screen appears.
> Booting to RL3, it hangs after the startup message:
>
> I have now bisected this down to the section in the series file between
> #GREGKH-DRIVER-START and #GREGKH-DRIVER-END, and therefore added GregKH
> to the CC list.

This is rather good news. I was staring at about 200-300 MM patches wondering which one was buggy. Thanks heaps for doing the bisect.

Now the main worry is Randy's dead box.

A lot of Greg's driver tree has gone upstream, so please check current mainline.

If that's OK then we need to pick through the difference between 2.6.21-rc7-mm2's driver tree and the patches which went into mainline. And that's a pretty small set.

> I'll try bisecting further inside that section (unless you tell me not
> to), but it may take some time.
>
> The exact point during the startup sequence when the crash occurred
> and the number of BUG messages produced varied somewhat during these
> tests. The common denominator, and my criterion for the good/bad
> decisions during the bisect, was the crash (panic blink) just before
> completion of the system startup.
>
> Sometimes there weren't any BUG messages in the log (or perhaps they
> just didn't make it to the disk). Sometimes I just had a couple of the
> "sleeping function called from invalid context at mm/slab.c:3054" ones
> but no "Eeek! page_mapcount(page) went negative!" one before them.
> However, whenever the "Eeek!" did appear, it announced getcfg-interfac
> as the current process and was followed by a few of the mm/slab.c:3054
> ones.

hm, big mess. Could be it was some glitch from Tejun's sysfs changes which are all being extensively redone, so perhaps we'll never hear from it again. Or perhaps we just merged it into mainline.
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On 30.04.2007 20:21, Andrew Morton wrote:

> A lot of Greg's driver tree has gone upstream, so please check current
> mainline.

2.6.21-final is fine.

> If that's OK then we need to pick through the difference between
> 2.6.21-rc7-mm2's driver tree and the patches which went into mainline.
> And that's a pretty small set.

I'm not quite sure how to determine that difference. Can you just provide me with a list of patches you'd like me to test?

Thanks,
Tilman

--
Tilman Schmidt                    E-Mail: [EMAIL PROTECTED]
Wehrhausweg 66                    Fax: +49 228 4299019
53227 Bonn
Germany
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On Mon, 30 Apr 2007 21:28:06 +0200 Tilman Schmidt [EMAIL PROTECTED] wrote:

> On 30.04.2007 20:21, Andrew Morton wrote:
> > A lot of Greg's driver tree has gone upstream, so please check current
> > mainline.
>
> 2.6.21-final is fine.

Sure, but what about 2.6.21-git3 (or, better, current -git)?

> > If that's OK then we need to pick through the difference between
> > 2.6.21-rc7-mm2's driver tree and the patches which went into mainline.
> > And that's a pretty small set.
>
> I'm not quite sure how to determine that difference. Can you just
> provide me with a list of patches you'd like me to test?

Not really - everything's tangled up. A bisection search on the 2.6.21-rc7-mm2 driver tree would be the best bet.

See, 2.6.21-rc7-mm2 had:

gregkh-driver-driver-core-fix-device_add-error-path.patch
gregkh-driver-driver-core-fix-namespace-issue-with-devices-assigned-to-classes.patch
gregkh-driver-dev_printk-and-new-style-class-devices.patch
gregkh-driver-driver-core-udev-triggered-device-driver-binding.patch
gregkh-driver-driver-core-use-attribute-groups-in-struct-device_type.patch
gregkh-driver-named-device_type.patch
gregkh-driver-kobject-kobject_shadow_add-cleanup.patch
gregkh-driver-driver-core-per-subsystem-multithreaded-probing.patch
gregkh-driver-powerpc-make-it-compile-for-multithread-change.patch
gregkh-driver-driver-core-don-t-fail-attaching-the-device-if-it-cannot-be-bound.patch
gregkh-driver-driver-no-more-wait.patch
gregkh-driver-kref-fix-cpu-ordering-with-respect-to-krefs.patch
gregkh-driver-driver-core-notify-userspace-of-network-device-renames.patch
gregkh-driver-driver-core-suppress-uevents-via-filter.patch
gregkh-driver-driver-core-switch-firmware_class-to-uevent_suppress.patch
gregkh-driver-uevent-use-add_uevent_var-instead-of-open-coding-it.patch
gregkh-driver-driver-core-add-suspend-and-resume-to-struct-device_type.patch
gregkh-driver-kobject-kobject_ueventc-collapse-unnecessary-loop-nesting.patch
gregkh-driver-kobject-kobject_add-reference-leak.patch
gregkh-driver-devices_subsys-rwsem-removal.patch
gregkh-driver-scsi-hosts-rwsem-removal.patch
gregkh-driver-usb-bus-mutex.patch
gregkh-driver-pnp-remove-rwsem-usage.patch
gregkh-driver-input-serio-do-not-touch-bus-s-rwsem.patch
gregkh-driver-input-gameport-do-not-touch-bus-s-rwsem.patch
gregkh-driver-ide-proc-remove-rwsem.patch
gregkh-driver-ieee1394-rwsem-removal.patch
gregkh-driver-phy-rwsem-removal.patch
gregkh-driver-qeth-remove-usage-of-subsys_rwsem.patch
gregkh-driver-subsys-rwsem-removal.patch
gregkh-driver-sysfs-fix-i_ino-handling-in-sysfs.patch
gregkh-driver-sysfs-fix-error-handling-in-binattr-write.patch
gregkh-driver-sysfs-move-release_sysfs_dirent-to-dirc.patch
gregkh-driver-sysfs-flatten-cleanup-paths-in-sysfs_add_link-and-create_dir.patch
gregkh-driver-sysfs-consolidate-sysfs_dirent-creation-functions.patch
gregkh-driver-sysfs-add-sysfs_dirent-s_parent.patch
gregkh-driver-sysfs-add-sysfs_dirent-s_name.patch
gregkh-driver-sysfs-make-sysfs_dirent-s_element-a-union.patch
gregkh-driver-sysfs-implement-kobj_sysfs_assoc_lock.patch
gregkh-driver-sysfs-reimplement-symlink-using-sysfs_dirent-tree.patch
gregkh-driver-sysfs-implement-bin_buffer.patch
gregkh-driver-sysfs-implement-sysfs_dirent-active-reference-and-immediate-disconnect.patch
gregkh-driver-sysfs-kill-attribute-file-orphaning.patch
gregkh-driver-sysfs-kill-unnecessary-attribute-owner.patch
gregkh-driver-sysfs-make-lockdep-ignore-s_active.patch
gregkh-driver-sysfs-make-sysfs_put-ignore-null-sd.patch
gregkh-driver-sysfs-rename-object_depth-to-sysfs_path_depth-and-make-it-global.patch
gregkh-driver-sysfs-reimplement-sysfs_drop_dentry.patch
gregkh-driver-sysfs-kill-sysfs_dirent-s_dentry.patch
gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
gregkh-driver-driver-core-warn-for-odd-store-uevent-usage.patch
gregkh-driver-kobject-comment-and-warning-fixes-to-kobjectc.patch
gregkh-driver-the-overdue-removal-of-the-mount-umount-uevents.patch
gregkh-driver-debugfs-add-debugfs_create_u64.patch
gregkh-driver-bus_add_driver-return-error-for-no-bus.patch
gregkh-driver-uio.patch
gregkh-driver-uio-documentation.patch
gregkh-driver-uio-dummy.patch
gregkh-driver-uio-hilscher-cif-card-driver.patch
gregkh-driver-remove-struct-subsystem-as-it-is-no-longer-needed.patch
gregkh-driver-put_device-might_sleep.patch
gregkh-driver-kobject-warn.patch
gregkh-driver-warn-when-statically-allocated-kobjects-are-used.patch
gregkh-driver-nozomi.patch

and Greg's driver tree (as of yesterday, I think) had

gregkh-driver-uio.patch
gregkh-driver-uio-documentation.patch
gregkh-driver-uio-dummy.patch
gregkh-driver-uio-hilscher-cif-card-driver.patch
gregkh-driver-remove-struct-subsystem-as-it-is-no-longer-needed.patch
gregkh-driver-put_device-might_sleep.patch
gregkh-driver-kobject-warn.patch
gregkh-driver-warn-when-statically-allocated-kobjects-are-used.patch
gregkh-driver-nozomi.patch

So what has happened (approximately) is that

- the above nine patches have been held back, or are new
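The comparison Andrew makes between the two lists amounts to a set difference: entries in the 2.6.21-rc7-mm2 series that no longer appear in Greg's tree. A minimal sketch follows, assuming patch names compare as exact strings. The function names and the two short arrays are hypothetical; the arrays are only an excerpt of the lists quoted above, not the full series. What the missing entries mean (merged upstream, dropped, or renamed) is not something the sketch can tell.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Excerpt only: four names taken from the 2.6.21-rc7-mm2 list above. */
static const char *const mm2_excerpt[] = {
    "gregkh-driver-driver-core-fix-device_add-error-path.patch",
    "gregkh-driver-usb-bus-mutex.patch",
    "gregkh-driver-uio.patch",
    "gregkh-driver-nozomi.patch",
};

/* Excerpt only: two names taken from the later driver-tree list above. */
static const char *const greg_excerpt[] = {
    "gregkh-driver-uio.patch",
    "gregkh-driver-nozomi.patch",
};

/* Return 1 if `name` occurs in `list`, 0 otherwise. */
static int in_list(const char *name, const char *const *list, size_t n)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0)
            return 1;
    return 0;
}

/*
 * Store in out[] every entry of old_list that is absent from new_list,
 * preserving series order, and return how many were stored.
 * out[] must have room for n_old pointers.
 */
static size_t dropped_patches(const char *const *old_list, size_t n_old,
                              const char *const *new_list, size_t n_new,
                              const char **out)
{
    size_t k = 0;

    for (size_t i = 0; i < n_old; i++)
        if (!in_list(old_list[i], new_list, n_new))
            out[k++] = old_list[i];
    return k;
}
```

Keeping the result in series order matters here, because the patches are meant to be applied in sequence and a later bisect would walk them in that order.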
Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]
On 4/30/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
[Please don't drop addresses from the CC list]

On Sunday, 29 April 2007 22:46, Dan Kruchinin wrote:
On 4/30/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
Hi,

On Sunday, 29 April 2007 21:51, Dan Kruchinin wrote:
Hi all. There is a problem on my MacBook Core Duo with suspend. After suspending, when I try to 'wake up' my notebook, it seems that it works, but I don't see anything on my monitor. So I have to reboot it to continue my work.

What exactly do you do to suspend?

Rafael

---
Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at kernel/kthread.c:166 kthread_bind()
Apr 29 23:31:16 midgard kernel: [140594.900870] [c0142c9b] _cpu_down+0x16b/0x250
Apr 29 23:31:16 midgard kernel: [140594.900893] [c0142f80] disable_nonboot_cpus+0x60/0xf0
Apr 29 23:31:16 midgard kernel: [140594.900903] [c0147efa] enter_state+0x22a/0x240
Apr 29 23:31:16 midgard kernel: [140594.900913] [c0147fcd] state_store+0xbd/0xd0
Apr 29 23:31:16 midgard kernel: [140594.900920] [c0147f10] state_store+0x0/0xd0
Apr 29 23:31:16 midgard kernel: [140594.900927] [c01c1559] subsys_attr_store+0x29/0x40
Apr 29 23:31:16 midgard kernel: [140594.900937] [c01c1774] sysfs_write_file+0xd4/0x160
Apr 29 23:31:16 midgard kernel: [140594.900948] [c0180eb6] vfs_write+0xa6/0x160
Apr 29 23:31:16 midgard kernel: [140594.900958] [c01c16a0] sysfs_write_file+0x0/0x160
Apr 29 23:31:16 midgard kernel: [140594.900966] [c0181601] sys_write+0x41/0x70
Apr 29 23:31:16 midgard kernel: [140594.900974] [c018c70b] sys_dup2+0xeb/0x120
Apr 29 23:31:16 midgard kernel: [140594.900984] [c0104116] sysenter_past_esp+0x5f/0x85
Apr 29 23:31:16 midgard kernel: [140594.900999] ===
---

dmesg output:

Apr 29 23:31:16 midgard kernel: [140594.788697] Suspending device vtcon0
Apr 29 23:31:16 midgard kernel: [140594.788700] Suspending device platform
Apr 29 23:31:16 midgard kernel: [140594.788704] Disabling non-boot CPUs ...
Apr 29 23:31:16 midgard kernel: [140594.900464] CPU 1 is now offline
Apr 29 23:31:16 midgard kernel: [140594.900469] SMP alternatives: switching to UP code
Apr 29 23:31:16 midgard kernel: [140594.900856] BUG: at kernel/kthread.c:166 kthread_bind()
Apr 29 23:31:16 midgard kernel: [140594.900870] [c0142c9b] _cpu_down+0x16b/0x250
Apr 29 23:31:16 midgard kernel: [140594.900893] [c0142f80] disable_nonboot_cpus+0x60/0xf0
Apr 29 23:31:16 midgard kernel: [140594.900903] [c0147efa] enter_state+0x22a/0x240
Apr 29 23:31:16 midgard kernel: [140594.900913] [c0147fcd] state_store+0xbd/0xd0
Apr 29 23:31:16 midgard kernel: [140594.900920] [c0147f10] state_store+0x0/0xd0
Apr 29 23:31:16 midgard kernel: [140594.900927] [c01c1559] subsys_attr_store+0x29/0x40
Apr 29 23:31:16 midgard kernel: [140594.900937] [c01c1774] sysfs_write_file+0xd4/0x160
Apr 29 23:31:16 midgard kernel: [140594.900948] [c0180eb6] vfs_write+0xa6/0x160
Apr 29 23:31:16 midgard kernel: [140594.900958] [c01c16a0] sysfs_write_file+0x0/0x160
Apr 29 23:31:16 midgard kernel: [140594.900966] [c0181601] sys_write+0x41/0x70
Apr 29 23:31:16 midgard kernel: [140594.900974] [c018c70b] sys_dup2+0xeb/0x120
Apr 29 23:31:16 midgard kernel: [140594.900984] [c0104116] sysenter_past_esp+0x5f/0x85
Apr 29 23:31:16 midgard kernel: [140594.900999] ===
Apr 29 23:31:16 midgard kernel: [140594.902843] CPU1 is down
Apr 29 23:31:16 midgard kernel: [18014366.415769] Enabling non-boot CPUs ...
Apr 29 23:31:16 midgard kernel: [18014366.426999] SMP alternatives: switching to SMP code
Apr 29 23:31:16 midgard kernel: [18014366.427165] Booting processor 1/1 eip 3000
Apr 29 23:31:16 midgard kernel: [18014366.436913] Initializing CPU#1
Apr 29 23:31:16 midgard kernel: [18014366.509141] Calibrating delay using timer specific routine.. 3994.69 BogoMIPS (lpj=7989390)
Apr 29 23:31:16 midgard kernel: [18014366.509152] monitor/mwait feature present.
Apr 29 23:31:16 midgard kernel: [18014366.509156] CPU: L1 I cache: 32K, L1 D cache: 32K
Apr 29 23:31:16 midgard kernel: [18014366.509158] CPU: L2 cache: 2048K
Apr 29 23:31:16 midgard kernel: [18014366.509160] CPU: Physical Processor ID: 0
Apr 29 23:31:16 midgard kernel: [18014366.509161] CPU: Processor Core ID: 1
Apr 29 23:31:16 midgard kernel: [18014366.509637] CPU1: Intel Genuine Intel(R) CPU1500 @ 2.00GHz stepping 08
Apr 29 23:31:16 midgard kernel: [18014366.509659] checking TSC synchronization [CPU#0 - CPU#1]:
Apr 29 23:31:16 midgard kernel: [18014366.529627] Measured 68812018716 cycles TSC warp between CPUs, turning off TSC clock.
Apr 29 23:31:16 midgard kernel: [18014366.529630] Marking TSC unstable due to: check_tsc_sync_source failed.
Apr 29 23:31:16 midgard
Re: 2.6.21-rc7-mm2 suspend bug. [kernel/kthread.c]
On Monday, 30 April 2007 22:52, Dan Kruchinin wrote:
> [snip: verbatim quote of the previous message, including the kthread_bind() trace and dmesg output]
Re: 2.6.21-rc7-mm2 crash: Eeek! page_mapcount(page) went negative! (-1)
On 30.04.2007 21:46, Andrew Morton wrote:

> > 2.6.21-final is fine.
>
> Sure, but what about 2.6.21-git3 (or, better, current -git)?

OIC. Sorry for being dense. Will check.

> > > If that's OK then we need to pick through the difference between
> > > 2.6.21-rc7-mm2's driver tree and the patches which went into mainline.
> > > And that's a pretty small set.
> >
> > I'm not quite sure how to determine that difference. Can you just
> > provide me with a list of patches you'd like me to test?
>
> Not really - everything's tangled up. A bisection search on the
> 2.6.21-rc7-mm2 driver tree would be the best bet.

Ok. No prob. It'll just take a bit of time. (Compiling a kernel on that machine takes about 4 hours.)

I'll be back. :-)

--
Tilman Schmidt                    E-Mail: [EMAIL PROTECTED]
Bonn, Germany
- Undetected errors are handled as if no error occurred. (IBM) -
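For a rough sense of "a bit of time": a bisect over n candidate patches needs about ceil(log2(n)) test kernels. The sketch below works that out, assuming one full build per step at the roughly 4 hours Tilman mentions and ignoring boot time, repeated boots, and any savings from incremental builds. The function names are hypothetical.

```c
#include <assert.h>

/*
 * Number of test kernels a bisect needs to single out one culprit
 * among `candidates` patches, i.e. ceil(log2(candidates)).
 * Computed as the bit length of (candidates - 1).
 */
static unsigned bisect_steps(unsigned long candidates)
{
    unsigned steps = 0;

    assert(candidates > 0);
    for (unsigned long n = candidates - 1; n > 0; n >>= 1)
        steps++;
    return steps;
}

/* Total build time, assuming every step costs one full build. */
static double bisect_hours(unsigned long candidates, double hours_per_build)
{
    return bisect_steps(candidates) * hours_per_build;
}
```

If the driver section holds on the order of 64 patches, as the list quoted earlier in the thread suggests, that is 6 builds, or about a day of compile time at 4 hours each.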