Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
Hello,

On Fri, Dec 29, 2006 at 04:01:32PM +0100, Ard -kwaak- van Breemen wrote:
> On Fri, Dec 29, 2006 at 03:10:58PM +0100, Ard -kwaak- van Breemen wrote:
> > Preliminary patches:
> > - pci fix of Andrews patches
> The printk might be too verbose. I think removing them is ok

I will stick with the verbose printk, because otherwise we will never know that something is faulty.

> since the only thing that has happened is that it prevents
> entering the loop and the semaphores. The only thing that bugs me
> is if list_empty can be used like that. (in other words: don't we
> need semaphores around that).

I was wondering about the validity of pci_devices at that time. But on the other hand: if that was not wrong, people would have complained much earlier.

Anyway, I think that's it: those 3 patches will fix and guard the problems we've seen.

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, Dec 29, 2006 at 04:01:32PM +0100, Ard -kwaak- van Breemen wrote:
> > - parse-one detection of Yanmin
> It doesn't flag it. I am working on that.

As said: it was doing a callback to obsolete_...
This replaces the patch with one that is not bloated and still gives enough info.
It won't check for callbacks or whatever, just which parameter b0rked it.

Output of dmesg without the pci-patch applied:
[EMAIL PROTECTED]:~$ dmesg|grep -B5 -A1 'interrupts were enabled'
Kernel command line: console=tty0 console=ttyS0,115200 hdb=noprobe hdc=noprobe hdd=noprobe root=/dev/md0 ro panic=30 earlyprintk=serial,ttyS0,115200
ide_setup: hdb=noprobe
parse_args(): option 'hdb=noprobe' enabled irq's!
ide_setup: hdc=noprobe
ide_setup: hdd=noprobe
start_kernel(): bug: interrupts were enabled *very* early, fixing it
Initializing CPU#0

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .

--- linux-2.6.19.vanilla/kernel/params.c	2006-11-29 21:57:37.0 +
+++ linux-2.6.19/kernel/params.c	2006-12-29 15:14:26.0 +
@@ -143,9 +143,14 @@
 
 	while (*args) {
 		int ret;
+		int irq_was_disabled;
 
 		args = next_arg(args, &param, &val);
+		irq_was_disabled=irqs_disabled();
 		ret = parse_one(param, val, params, num, unknown);
+		if(irq_was_disabled && !irqs_disabled()) {
+			printk(KERN_WARNING "parse_args(): option '%s' enabled irq's!\n",param);
+		}
 		switch (ret) {
 		case -ENOENT:
 			printk(KERN_ERR "%s: Unknown parameter `%s'\n",
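The detection idea in the patch above can be tried outside the kernel. What follows is only a userspace sketch under stated assumptions, not kernel code: `irqs_on`, the two handler functions, the `options` table and `find_irq_enabler()` are hypothetical stand-ins for the CPU interrupt flag, the per-option callbacks, the parameter table and the loop in parse_args(). It records the state before each callback and flags the option after which the state changed from disabled to enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU interrupt-enable flag. */
static bool irqs_on = false;

typedef void (*option_handler)(void);

struct option {
    const char *name;
    option_handler set;
};

/* A handler that leaves the interrupt state alone. */
static void handle_harmless(void) { }

/* A handler that mimics the side effect seen on the ide_setup() path. */
static void handle_hdb(void) { irqs_on = true; }

static const struct option options[] = {
    { "console=ttyS0,115200", handle_harmless },
    { "hdb=noprobe",          handle_hdb },
    { "hdc=noprobe",          handle_hdb },
};

/*
 * Run every handler in order. Return the index of the first option that
 * turned interrupts on, or -1 if none did. Later options are not flagged,
 * because interrupts are already on when their handlers run.
 */
static int find_irq_enabler(void)
{
    int culprit = -1;

    irqs_on = false;
    for (size_t i = 0; i < sizeof(options) / sizeof(options[0]); i++) {
        bool was_disabled = !irqs_on;

        options[i].set();
        if (was_disabled && irqs_on && culprit < 0)
            culprit = (int)i;
    }
    return culprit;
}
```

In this sketch only the first offending option is reported, which is consistent with the dmesg output above, where hdb=noprobe is flagged and hdc=noprobe and hdd=noprobe are not.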
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, Dec 22, 2006 at 03:35:20PM +0100, Ard -kwaak- van Breemen wrote:
> On Fri, Dec 22, 2006 at 12:30:29AM -0800, Andrew Morton wrote:
> > I expect that you'll find that the ide code ends up doing
> > down_write(pci_bus_sem), which will enable interrupts.
> will:
> down_read(&pci_bus_sem);
> also enable interrupts?
> Since that is called:
> init/main.c             start_kernel
> kernel/params.c         parse_args("Booting kernel"
> kernel/params.c         parse_one
---
init/main.c             unknown_bootoption
init/main.c             obsolete_checksetup
---
> drivers/ide/ide.c       ide_setup
> drivers/ide/ide.c       init_ide_data
> drivers/ide/ide.c       init_hwif_default
> include/asm-i386/ide.h  ide_default_io_base(index)
> drivers/pci/search.c    pci_find_device
> drivers/pci/search.c    pci_find_subsys

Fixes in the calltree

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, Dec 29, 2006 at 04:01:32PM +0100, Ard -kwaak- van Breemen wrote:
> > - parse-one detection of Yanmin
> It doesn't flag it. I am working on that.

Since it goes to a callback to obsolete_checksetup()
Argh... my calltree was a little flawed :-(...

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, Dec 29, 2006 at 03:10:58PM +0100, Ard -kwaak- van Breemen wrote:
> Preliminary patches:
> - pci fix of Andrews patches

The printk might be too verbose. I think removing them is ok since the only thing that has happened is that it prevents entering the loop and the semaphores. The only thing that bugs me is if list_empty can be used like that. (in other words: don't we need semaphores around that).

> - parse-one detection of Yanmin

It doesn't flag it. I am working on that.

> - start_kernel detection and workaround (disable them again)

main-irq-enable-detection-and-disable-again.patch is working great. I would love to see that one included in the kernel.

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, Dec 29, 2006 at 02:27:59PM +0100, Ard -kwaak- van Breemen wrote:
> I will clean up the patches found on this list to fix and detect this.

Preliminary patches:
- pci fix of Andrews patches
- parse-one detection of Yanmin
- start_kernel detection and workaround (disable them again)

These are the patches that I am about to test in the next 2 hours... :-)

Anyway: I think it is possible that other drivers are also potential irq enablers as soon as they are called from parse_one. Usually I compile network drivers as modules, but in diskless setups this might not be the case :-).

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .

--- linux-2.6.19.vanilla/drivers/pci/search.c	2006-11-29 21:57:37.0 +
+++ linux-2.6.19/drivers/pci/search.c	2006-12-29 13:58:51.0 +
@@ -193,6 +193,17 @@
 	struct pci_dev *dev;
 
 	WARN_ON(in_interrupt());
+
+	/*
+	 * pci_find_subsys() can be called on the ide_setup() path, super-early
+	 * in boot.  But the down_read() will enable local interrupts, which
+	 * can cause some machines to crash.  So here we detect and flag that
+	 * situation and bail out early.
+	 */
+	if(unlikely(list_empty(&pci_devices))) {
+		printk(KERN_INFO "pci_find_subsys() called while pci_devices is still empty\n");
+		return NULL;
+	}
 	down_read(&pci_bus_sem);
 	n = from ? from->global_list.next : pci_devices.next;
 
@@ -259,6 +270,16 @@
 	struct pci_dev *dev;
 
 	WARN_ON(in_interrupt());
+	/*
+	 * pci_get_subsys() can potentially be called by drivers super-early
+	 * in boot.  But the down_read() will enable local interrupts, which
+	 * can cause some machines to crash.  So here we detect and flag that
+	 * situation and bail out early.
+	 */
+	if(unlikely(list_empty(&pci_devices))) {
+		printk(KERN_NOTICE "pci_get_subsys() called while pci_devices is still empty\n");
+		return NULL;
+	}
 	down_read(&pci_bus_sem);
 	n = from ? from->global_list.next : pci_devices.next;
 
--- linux-2.6.19.vanilla/kernel/params.c	2006-11-29 21:57:37.0 +
+++ linux-2.6.19/kernel/params.c	2006-12-29 14:02:48.0 +
@@ -53,13 +53,20 @@
 		     int (*handle_unknown)(char *param, char *val))
 {
 	unsigned int i;
+	int result;
+	int irq_was_disabled;
 
 	/* Find parameter */
 	for (i = 0; i < num_params; i++) {
 		if (parameq(param, params[i].name)) {
 			DEBUGP("They are equal!  Calling %p\n",
 			       params[i].set);
-			return params[i].set(val, &params[i]);
+			irq_was_disabled = irqs_disabled();
+			result=params[i].set(val, &params[i]);
+			if (irq_was_disabled && !irqs_disabled()) {
+				printk(KERN_WARNING "[BUG] parse_one: kerneloption '%s' enabled irq!\n",param);
+			}
+			return result;
 		}
 	}
 
--- linux-2.6.19.vanilla/init/main.c	2006-11-29 21:57:37.0 +
+++ linux-2.6.19/init/main.c	2006-12-29 13:58:37.0 +
@@ -525,6 +525,10 @@
 	parse_args("Booting kernel", command_line, __start___param,
 		   __stop___param - __start___param,
 		   &unknown_bootoption);
+	if (!irqs_disabled()) {
+		printk(KERN_WARNING "start_kernel(): bug: interrupts were enabled *very* early, fixing it\n");
+		local_irq_disable();
+	}
 	sort_main_extable();
 	trap_init();
 	rcu_init();
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, Dec 29, 2006 at 01:51:08PM +0100, Ard -kwaak- van Breemen wrote:
> I will try it on the right function, and see what we get.

In function:
    186 static struct pci_dev * pci_find_subsys(unsigned int vendor,
    203         if (unlikely(list_empty(&pci_devices))) {
    204                 printk("Pci device list empty, preventing down_read\n");
    205                 return NULL;
    206         }

delivers:
[EMAIL PROTECTED]:~$ sudo grep -C1 'Pci device list empty' /var/log/kern.log
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were [EMAIL PROTECTED]
Dec 29 14:17:47 localhost kernel: Pci device list empty, preventing down_read
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were [EMAIL PROTECTED]
--
[the same three-line group appears ten times in total, each separated by "--"]

I don't see any other warnings, so I guess the patch is working now :-).
I will clean up the patches found on this list to fix and detect this.

program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
Hello Andrew,

On Thu, Dec 28, 2006 at 03:51:48PM -0800, Andrew Morton wrote:
> Could someone please test this?

Without testing I declare it won't fix it 8-D

> Ard has worked out the call tree:
>
> init/main.c             start_kernel
> kernel/params.c         parse_args("Booting kernel"
> kernel/params.c         parse_one
> drivers/ide/ide.c       ide_setup
> drivers/ide/ide.c       init_ide_data
> drivers/ide/ide.c       init_hwif_default
> include/asm-i386/ide.h  ide_default_io_base(index)
> drivers/pci/search.c    pci_find_device
> drivers/pci/search.c    pci_find_subsys
                            --^^

Your patch patches pci_get_subsys, while pci_find_subsys does the down_read...
I will try it on the right function, and see what we get.

Regards,
Ard

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Thu, 28/12/2006 at 15.51 -0800, Andrew Morton wrote:
> Could someone please test this?
> diff -puN drivers/pci/search.c~pci-avoid-taking-pci_bus_sem-early-in-boot drivers/pci/search.c
> --- a/drivers/pci/search.c~pci-avoid-taking-pci_bus_sem-early-in-boot
> +++ a/drivers/pci/search.c
> @@ -259,6 +259,16 @@ pci_get_subsys(unsigned int vendor, unsi
>  	struct pci_dev *dev;
>
>  	WARN_ON(in_interrupt());
> +
> +	/*
> +	 * pci_get_subsys() can be called on the ide_setup() path, super-early
> +	 * in boot.  But the down_read() will enable local interrupts, which
> +	 * can cause some machines to crash.  So here we detect that situation
> +	 * and bail out early.
> +	 */
> +	if (unlikely(list_empty(pci_devices)))
> +		return NULL;
> +
>  	down_read(&pci_bus_sem);
>  	n = from ? from->global_list.next : pci_devices.next;
>
> _

Applied to 2.6.19 it returns an error while compiling:

  CC      drivers/pci/search.o
drivers/pci/search.c: In function ‘pci_get_subsys’:
drivers/pci/search.c:269: error: incompatible type for argument 1 of ‘list_empty’
make[2]: *** [drivers/pci/search.o] Error 1
make[1]: *** [drivers/pci] Error 2
make: *** [drivers] Error 2

drivers/pci/search.c
    268          */
    269         if (unlikely(list_empty(pci_devices)))
    270                 return NULL;

--
Stefano Takekawa
[EMAIL PROTECTED]

Frank: And why do days get longer in the summer?
Ernest: Because heat makes things expand!
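The compile error reported above is consistent with list_empty() taking a pointer to the list head while pci_devices is the list head object itself, so the call needs `&pci_devices`. The following is a minimal userspace sketch of that point, not the kernel's list.h: the `struct list_head`, `list_empty()` and `list_add_tail()` here are simplified re-creations, and `pci_devices_demo` and `would_bail_out_early()` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Simplified userspace re-creation of a circular doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list points at itself in both directions. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Takes a POINTER to the head; passing the struct itself cannot compile. */
static int list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Append entry just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Hypothetical stand-in for the global pci_devices list. */
static struct list_head pci_devices_demo = LIST_HEAD_INIT(pci_devices_demo);

/* Returns 1 while no device has been registered, mirroring the early bail-out. */
static int would_bail_out_early(void)
{
    /* list_empty(pci_devices_demo) would be rejected: struct, not pointer. */
    return list_empty(&pci_devices_demo);
}
```

Before any entry is added, `would_bail_out_early()` reports an empty list; after one `list_add_tail()` call it no longer does, which is the condition the guard in the patch relies on.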
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
Could someone please test this?


From: Andrew Morton <[EMAIL PROTECTED]>

Various people have reported machines failing to boot since pci_bus_sem was
switched from a spinlock to an rwsem.  The reason for this is that these
people had "ide=" on the kernel commandline, and ide_setup() can end up
calling PCI functions which do down_read(&pci_bus_sem).

Ard has worked out the call tree:

init/main.c             start_kernel
kernel/params.c         parse_args("Booting kernel"
kernel/params.c         parse_one
drivers/ide/ide.c       ide_setup
drivers/ide/ide.c       init_ide_data
drivers/ide/ide.c       init_hwif_default
include/asm-i386/ide.h  ide_default_io_base(index)
drivers/pci/search.c    pci_find_device
drivers/pci/search.c    pci_find_subsys
                        down_read(&pci_bus_sem);

down_read() will unconditionally enable interrupts and some early interrupt
(source unknown) comes in and whacks the machine, apparently because the LDT
isn't set up yet.

Fix that by avoiding taking the semaphore in the PCI code in this situation.

Cc: Ard -kwaak- van Breemen <[EMAIL PROTECTED]>
Cc: "Zhang, Yanmin" <[EMAIL PROTECTED]>
Cc: Chuck Ebbert <[EMAIL PROTECTED]>
Cc: Yinghai Lu <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
Cc: Greg KH <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/pci/search.c |   10 ++
 1 files changed, 10 insertions(+)

diff -puN drivers/pci/search.c~pci-avoid-taking-pci_bus_sem-early-in-boot drivers/pci/search.c
--- a/drivers/pci/search.c~pci-avoid-taking-pci_bus_sem-early-in-boot
+++ a/drivers/pci/search.c
@@ -259,6 +259,16 @@ pci_get_subsys(unsigned int vendor, unsi
 	struct pci_dev *dev;
 
 	WARN_ON(in_interrupt());
+
+	/*
+	 * pci_get_subsys() can be called on the ide_setup() path, super-early
+	 * in boot.  But the down_read() will enable local interrupts, which
+	 * can cause some machines to crash.  So here we detect that situation
+	 * and bail out early.
+	 */
+	if (unlikely(list_empty(pci_devices)))
+		return NULL;
+
 	down_read(&pci_bus_sem);
 	n = from ? from->global_list.next : pci_devices.next;
 
_
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
> I am pretty sure the i386 tree has the same problem but I haven't checked yet.
> Anyway: the panic is just a way of noticing. The bug is that irq's are enabled
> before the irq controller is set up.

A very similar i386 Linux installation works fine on my laptop; that i386 kernel has never had the problem.

--
Stefano Takekawa
[EMAIL PROTECTED]
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, 22 Dec 2006 15:16:55 +0100
Ard -kwaak- van Breemen <[EMAIL PROTECTED]> wrote:

> On Fri, Dec 22, 2006 at 03:00:59PM +0100, Ard -kwaak- van Breemen wrote:
> >     262         if (!irqs_disabled()) printk(__FILE__ "%s(): blaat: interrupts were enabled [EMAIL PROTECTED]",__FUNCTION__,__LINE__);
> >     263
> >     264         ide_init_hwif_ports(&hw, ide_default_io_base(index), 0, &hwif->irq);
> --^^^
> which does a if (pci_find_device(PCI_ANY_ID, PCI_ANY_ID,
> in include/asm-i386/ide.h
>
> which should be really the part that does the irq enabling.

doh, I missed down_read():

--- a/lib/rwsem-spinlock.c~down_write-preserve-local-irqs
+++ a/lib/rwsem-spinlock.c
@@ -129,13 +129,14 @@ void fastcall __sched __down_read(struct
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
+	unsigned long flags;
 
-	spin_lock_irq(&sem->wait_lock);
+	spin_lock_irqsave(&sem->wait_lock, flags);
 
 	if (sem->activity >= 0 && list_empty(&sem->wait_list)) {
 		/* granted */
 		sem->activity++;
-		spin_unlock_irq(&sem->wait_lock);
+		spin_unlock_irqrestore(&sem->wait_lock, flags);
 		goto out;
 	}
 
@@ -150,7 +151,7 @@ void fastcall __sched __down_read(struct
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we don't need to touch the semaphore struct anymore */
-	spin_unlock_irq(&sem->wait_lock);
+	spin_unlock_irqrestore(&sem->wait_lock, flags);
 
 	/* wait to be given the lock */
 	for (;;) {
@@ -195,13 +196,14 @@ void fastcall __sched __down_write_neste
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
+	unsigned long flags;
 
-	spin_lock_irq(&sem->wait_lock);
+	spin_lock_irqsave(&sem->wait_lock, flags);
 
 	if (sem->activity == 0 && list_empty(&sem->wait_list)) {
 		/* granted */
 		sem->activity = -1;
-		spin_unlock_irq(&sem->wait_lock);
+		spin_unlock_irqrestore(&sem->wait_lock, flags);
 		goto out;
 	}
 
@@ -216,7 +218,7 @@ void fastcall __sched __down_write_neste
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we don't need to touch the semaphore struct anymore */
-	spin_unlock_irq(&sem->wait_lock);
+	spin_unlock_irqrestore(&sem->wait_lock, flags);
 
 	/* wait to be given the lock */
 	for (;;) {
_
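The difference between the two lock/unlock pairs in the patch above can be illustrated with a small userspace model. This is a sketch under stated assumptions and not kernel code: `irqs_on` is a hypothetical stand-in for the CPU interrupt flag, the `sim_*` helpers only model the interrupt side effect of the real primitives (the locking itself is omitted), and both `down_read_*_variant()` functions model only the uncontended fast path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU interrupt-enable flag. */
static bool irqs_on = false;

/* Model of spin_lock_irq()/spin_unlock_irq(): disable, then enable unconditionally. */
static void sim_lock_irq(void)   { irqs_on = false; }
static void sim_unlock_irq(void) { irqs_on = true; }

/* Model of spin_lock_irqsave()/spin_unlock_irqrestore(): save, disable, restore. */
static void sim_lock_irqsave(bool *flags)            { *flags = irqs_on; irqs_on = false; }
static void sim_unlock_irqrestore(const bool *flags) { irqs_on = *flags; }

/* Uncontended down_read() on the _irq pair; returns the interrupt state afterwards. */
static bool down_read_irq_variant(bool irqs_before)
{
    irqs_on = irqs_before;
    sim_lock_irq();
    /* the semaphore would be granted here */
    sim_unlock_irq();
    return irqs_on;
}

/* Uncontended down_read() on the _irqsave pair; returns the interrupt state afterwards. */
static bool down_read_irqsave_variant(bool irqs_before)
{
    bool flags;

    irqs_on = irqs_before;
    sim_lock_irqsave(&flags);
    /* the semaphore would be granted here */
    sim_unlock_irqrestore(&flags);
    return irqs_on;
}
```

In this model a caller that enters with interrupts disabled, as start_kernel() does this early in boot, comes back with them enabled from the `_irq` variant and still disabled from the `_irqsave` variant.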
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
Hello,

On Thu, Dec 21, 2006 at 04:04:04PM +0800, Zhang, Yanmin wrote:
> I couldn't reproduce it on my EM64T machine. I instrumented function start_kernel and
> didn't find irq was enabled before calling init_IRQ. It'll be better if the reporter could
> instrument function start_kernel to capture which function enables irq.

I can confirm this is a *GENERIC* X86_64 problem:

Kernel command line: console=tty0 console=ttyS0,115200 hdb=noprobe root=/dev/md0
init/main.c start_kernel(): interrupts were [EMAIL PROTECTED]
ide_setup: hdb=noprobe
init/main.c start_kernel(): interrupts were [EMAIL PROTECTED]
...
start_kernel(): bug: interrupts were enabled early

This is on a Dell 1950 with core 2 duo processors.
To get the panic you have to have ide compiled in, set ide options so that the irq's get enabled, and have a setup in which an irq is pending before the irq controller gets initialized.
The Dell 1950 does not panic, the kernel merely warns.

I am pretty sure the i386 tree has the same problem but I haven't checked yet.
Anyway: the panic is just a way of noticing. The bug is that irq's are enabled before the irq controller is set up.

But to make the ide_setup/irq bug go away, I think it might be an acceptable solution to just disable the irq's again after the parse_args, and just to wait until the SATA tree takes over the IDE tree.

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, Dec 22, 2006 at 03:41:34PM +0100, Ard -kwaak- van Breemen wrote:
> Repeating: I am very stupid, so I don't know if saving the irq state is ok or
> not in down_read.

Andrew Morton's patch, rewritten for down_read, makes the symptoms go away.
The problem obviously is that ide_setup pokes the pci subsystem way too early.
Parsing of the ide parameters should be delayed until the next run of parse_args, I guess.

--- linux-2.6.19.1/lib/rwsem-spinlock.c	2006-12-11 19:32:53.0 +
+++ linux-2.6.19/lib/rwsem-spinlock.c	2006-12-22 15:06:52.0 +
@@ -129,13 +129,14 @@
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
+	unsigned long flags;
 
-	spin_lock_irq(&sem->wait_lock);
+	spin_lock_irqsave(&sem->wait_lock, flags);
 
 	if (sem->activity >= 0 && list_empty(&sem->wait_list)) {
 		/* granted */
 		sem->activity++;
-		spin_unlock_irq(&sem->wait_lock);
+		spin_unlock_irqrestore(&sem->wait_lock, flags);
 		goto out;
 	}
 
@@ -150,7 +151,7 @@
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we don't need to touch the semaphore struct anymore */
-	spin_unlock_irq(&sem->wait_lock);
+	spin_unlock_irqrestore(&sem->wait_lock, flags);
 
 	/* wait to be given the lock */
 	for (;;) {
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, Dec 22, 2006 at 12:30:29AM -0800, Andrew Morton wrote:
> To whom do I have to pay how much to get this darn patch tested?

I've altered your patch to do the spin_lock_irqsave in down_read.
I am very ignorant and stupid. That's why I am doing it without thinking about whether the irqsave is ok in that region or not.

And the results are:
include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were [EMAIL PROTECTED]
include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were [EMAIL PROTECTED]
include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were [EMAIL PROTECTED]

Meaning: it works.

Repeating: I am very stupid, so I don't know if saving the irq state is ok or not in down_read.

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, Dec 22, 2006 at 12:30:29AM -0800, Andrew Morton wrote:
> I expect that you'll find that the ide code ends up doing
> down_write(pci_bus_sem), which will enable interrupts.

will:
down_read(&pci_bus_sem);
also enable interrupts?

Since that is called:
init/main.c             start_kernel
kernel/params.c         parse_args("Booting kernel"
kernel/params.c         parse_one
drivers/ide/ide.c       ide_setup
drivers/ide/ide.c       init_ide_data
drivers/ide/ide.c       init_hwif_default
include/asm-i386/ide.h  ide_default_io_base(index)
drivers/pci/search.c    pci_find_device
drivers/pci/search.c    pci_find_subsys
                        down_read(&pci_bus_sem);

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, Dec 22, 2006 at 03:00:59PM +0100, Ard -kwaak- van Breemen wrote:
>     262         if (!irqs_disabled()) printk(__FILE__ "%s(): blaat: interrupts were enabled [EMAIL PROTECTED]",__FUNCTION__,__LINE__);
>     263
>     264         ide_init_hwif_ports(&hw, ide_default_io_base(index), 0, &hwif->irq);
--^^^
which does a if (pci_find_device(PCI_ANY_ID, PCI_ANY_ID,
in include/asm-i386/ide.h

which should be really the part that does the irq enabling.

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, Dec 22, 2006 at 11:30:05AM +0100, Ard -kwaak- van Breemen wrote:
> Anyway: on to the ide_setup tracking
> (I've noticed that the notifier of this problem als has idebus=66
> or something similar, so that explains in his case the
> early call to ide_setup.)

Aaarrgh...
Somewhere between the call to ide_setup and ide_init_hwif_ports the interrupts get enabled. But I haven't got to the point where exactly...

include/linux/ide.h:
    266 /*
    267  * ide_init_hwif_ports() is OBSOLETE and will be removed in 2.7 series.
    268  * New ports shouldn't define IDE_ARCH_OBSOLETE_INIT in <asm/ide.h>.
    269  */
    270 #ifdef IDE_ARCH_OBSOLETE_INIT
    271 static inline void ide_init_hwif_ports(hw_regs_t *hw,
    272                                        unsigned long io_addr,
    273                                        unsigned long ctl_addr,
    274                                        int *irq)
    275 {
    276         if (!irqs_disabled()) printk(__FILE__ "%s(): blaat: interrupts were enabled [EMAIL PROTECTED]",__FUNCTION__,__LINE__);
    277         if (!ctl_addr) {
    278                 ide_std_init_ports(hw, io_addr, ide_default_io_ctl(io_addr));

drivers/ide/ide.c:
    256 static void init_hwif_default(ide_hwif_t *hwif, unsigned int index)
    257 {
    258         hw_regs_t hw;
    259
    260         if (!irqs_disabled()) printk(__FILE__ "%s(): blaat: interrupts were enabled [EMAIL PROTECTED]",__FUNCTION__,__LINE__);
    261         memset(&hw, 0, sizeof(hw_regs_t));
    262         if (!irqs_disabled()) printk(__FILE__ "%s(): blaat: interrupts were enabled [EMAIL PROTECTED]",__FUNCTION__,__LINE__);
    263
    264         ide_init_hwif_ports(&hw, ide_default_io_base(index), 0, &hwif->irq);
    265         if (!irqs_disabled()) printk(__FILE__ " %s(): blaat: interrupts were enabled [EMAIL PROTECTED]",__FUNCTION__,__LINE__);
    266

dmesg:
BLAAT20Parsing ARGS: console=tty0 console=ttyS0,115200 hdb=noprobe hdc=noprobe hdd=noprobe root=/dev/md0 ro panic=30 earlyprintk=serial,ttyS0,115200
Unknown argument: calling 80643380
Unknown argument: calling 80643380
Unknown argument: calling 80643380
ide_setup: hdb=noprobeinclude/linux/ide.hide_init_hwif_ports(): blaat: interrupts were enabled [EMAIL PROTECTED]
include/linux/ide.hide_init_hwif_ports(): blaat: interrupts were enabled [EMAIL PROTECTED]
include/linux/ide.hide_init_hwif_ports(): blaat: interrupts were enabled [EMAIL PROTECTED]
include/linux/ide.hide_init_hwif_ports(): blaat: interrupts were enabled [EMAIL PROTECTED]
drivers/ide/ide.c init_hwif_default(): blaat: interrupts were enabled [EMAIL PROTECTED]
drivers/ide/ide.cinit_hwif_default(): blaat: interrupts were enabled [EMAIL PROTECTED]
drivers/ide/ide.cinit_ide_data(): blaat: interrupts were enabled [EMAIL PROTECTED]

So as I read it: init_hwif_default calls ide_init_hwif_ports with irq's disabled, but upon entrance, the irq's are enabled. That really makes no sense to me.
So I will continue digging this code (there must be something recursive going on).

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, 22/12/2006 at 01.43 -0800, Andrew Morton wrote:
> On Fri, 22 Dec 2006 10:32:51 +0100
> Stefano Takekawa <[EMAIL PROTECTED]> wrote:
>
> > Applied to 2.6.19 it doesn't change anything. It still panics.
>
> Really?
>
> And you can confirm that converting pci_bus_sem back into a spinlock fixes
> it?
>
> > How can I have something similar to a serial console on a laptop without
> > serial port but with a parallel one? Will netconsole work?
> >
>
> No, netconsole isn't available for quite some time after the kernel starts.
>
> Your best bet would be to boot with `earlyprintk=vga vga=N', where N is
> something which gives lots of rows.  0F01, perhaps.
>
> Then, take a digital photo of the display.

I can't take any digital photo. Well, I got this:

2.6.19 + lib/rwsem-spinlock.c patched + hdc=ide-cd or idebus=66  >> panic
2.6.19 + lib/rwsem-spinlock.c patched + no ide_setup calls       >> works!!!
2.6.19 + spinlock reversed                                       >> always works

--
Stefano Takekawa
[EMAIL PROTECTED]
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
Hello,

On Fri, Dec 22, 2006 at 12:30:29AM -0800, Andrew Morton wrote:
> To whom do I have to pay how much to get this darn patch tested?

I've already tested that (as I said somewhere in the bugzilla, so it probably got lost somehow :-) ):
It doesn't solve the booting problem, and I really don't have an idea what it does, nor does it output any debug code. So I left it at: doesn't fix ;-).

Anyway: on to the ide_setup tracking
(I've noticed that the notifier of this problem also has idebus=66 or something similar, so that explains in his case the early call to ide_setup.)

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end .
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, 22 Dec 2006 10:32:51 +0100
Stefano Takekawa <[EMAIL PROTECTED]> wrote:

> Applied to 2.6.19 it doesn't change anything. It still panics.

Really?

And you can confirm that converting pci_bus_sem back into a spinlock fixes
it?

> How can I have something similar to a serial console on a laptop without
> serial port but with a parallel one? Will netconsole work?
>

No, netconsole isn't available for quite some time after the kernel starts.

Your best bet would be to boot with `earlyprintk=vga vga=N', where N is
something which gives lots of rows. 0F01, perhaps.

Then, take a digital photo of the display.
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, 22/12/2006 at 00.30 -0800, Andrew Morton wrote:
> On Fri, 22 Dec 2006 09:22:48 +0100
> Ard -kwaak- van Breemen <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> > On Fri, Dec 22, 2006 at 12:41:46PM +0800, Zhang, Yanmin wrote:
> > > I think parse_args enables irq when it calls callbacks.
> > > Could you try below?
> > > 1) Test Andrew's patch of sema down_write;
> > > 2) Apply below patch and see what the output is when booting. If the
> > > output has
> > > "[BUG]..address.", Pls. map the address to function name by System.map.
> > Without proof^H^H^H^H^Hpasting my dmesg and the "diff", I already
> > concluded that ide_setup was the culprit. (I've debugged
> > parse_one, and it barfed around the 3rd parameter which is
> > hdb=noprobe).
> > Anyway, a bad night of sleep reminds me that our EM64T boxes also
> > have this line (which actually is a leftover from our VA1220 boxes
> > ;-) ), and they don't barf, so it must be either the combination
> > of the sata_nv together with the pata driver part, *or* just the
> > pata driver part. (Our Opterons with non-nForce chipsets also work.)
> >
>
> I expect that you'll find that the ide code ends up doing
> down_write(pci_bus_sem), which will enable interrupts.
>
> (We don't know which interrupt is pending this early - that'd be
> interesting to find out, but we shouldn't be enabling interrupts in there).
>
> To whom do I have to pay how much to get this darn patch tested?
>
>
>
> --- a/lib/rwsem-spinlock.c~down_write-preserve-local-irqs
> +++ a/lib/rwsem-spinlock.c
> @@ -195,13 +195,14 @@ void fastcall __sched __down_write_neste
>  {
>          struct rwsem_waiter waiter;
>          struct task_struct *tsk;
> +        unsigned long flags;
>
> -        spin_lock_irq(&sem->wait_lock);
> +        spin_lock_irqsave(&sem->wait_lock, flags);
>
>          if (sem->activity == 0 && list_empty(&sem->wait_list)) {
>                  /* granted */
>                  sem->activity = -1;
> -                spin_unlock_irq(&sem->wait_lock);
> +                spin_unlock_irqrestore(&sem->wait_lock, flags);
>                  goto out;
>          }
>
> @@ -216,7 +217,7 @@ void fastcall __sched __down_write_neste
>          list_add_tail(&waiter.list, &sem->wait_list);
>
>          /* we don't need to touch the semaphore struct anymore */
> -        spin_unlock_irq(&sem->wait_lock);
> +        spin_unlock_irqrestore(&sem->wait_lock, flags);
>
>          /* wait to be given the lock */
>          for (;;) {
> _
>

Applied to 2.6.19 it doesn't change anything. It still panics.

How can I have something similar to a serial console on a laptop without a
serial port but with a parallel one? Will netconsole work?

--
Stefano Takekawa
[EMAIL PROTECTED]

Frank: And why do days get longer in the summer?
Ernest: Because heat makes things expand!
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Fri, 22 Dec 2006 09:22:48 +0100
Ard -kwaak- van Breemen <[EMAIL PROTECTED]> wrote:

> Hello,
> On Fri, Dec 22, 2006 at 12:41:46PM +0800, Zhang, Yanmin wrote:
> > I think parse_args enables irq when it calls callbacks.
> > Could you try below?
> > 1) Test Andrew's patch of sema down_write;
> > 2) Apply below patch and see what the output is when booting. If the output
> > has
> > "[BUG]..address.", Pls. map the address to function name by System.map.
> Without proof^H^H^H^H^Hpasting my dmesg and the "diff", I already
> concluded that ide_setup was the culprit. (I've debugged
> parse_one, and it barfed around the 3rd parameter which is
> hdb=noprobe).
> Anyway, a bad night of sleep reminds me that our EM64T boxes also
> have this line (which actually is a leftover from our VA1220 boxes
> ;-) ), and they don't barf, so it must be either the combination
> of the sata_nv together with the pata driver part, *or* just the
> pata driver part. (Our Opterons with non-nForce chipsets also work.)
>

I expect that you'll find that the ide code ends up doing
down_write(pci_bus_sem), which will enable interrupts.

(We don't know which interrupt is pending this early - that'd be
interesting to find out, but we shouldn't be enabling interrupts in there).

To whom do I have to pay how much to get this darn patch tested?



--- a/lib/rwsem-spinlock.c~down_write-preserve-local-irqs
+++ a/lib/rwsem-spinlock.c
@@ -195,13 +195,14 @@ void fastcall __sched __down_write_neste
 {
         struct rwsem_waiter waiter;
         struct task_struct *tsk;
+        unsigned long flags;

-        spin_lock_irq(&sem->wait_lock);
+        spin_lock_irqsave(&sem->wait_lock, flags);

         if (sem->activity == 0 && list_empty(&sem->wait_list)) {
                 /* granted */
                 sem->activity = -1;
-                spin_unlock_irq(&sem->wait_lock);
+                spin_unlock_irqrestore(&sem->wait_lock, flags);
                 goto out;
         }

@@ -216,7 +217,7 @@ void fastcall __sched __down_write_neste
         list_add_tail(&waiter.list, &sem->wait_list);

         /* we don't need to touch the semaphore struct anymore */
-        spin_unlock_irq(&sem->wait_lock);
+        spin_unlock_irqrestore(&sem->wait_lock, flags);

         /* wait to be given the lock */
         for (;;) {
_
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
Hello,

On Fri, Dec 22, 2006 at 12:41:46PM +0800, Zhang, Yanmin wrote:
> I think parse_args enables irq when it calls callbacks.
> Could you try below?
> 1) Test Andrew's patch of sema down_write;
> 2) Apply below patch and see what the output is when booting. If the output
> has
> "[BUG]..address.", Pls. map the address to function name by System.map.

Without proof^H^H^H^H^Hpasting my dmesg and the "diff", I already
concluded that ide_setup was the culprit. (I've debugged
parse_one, and it barfed around the 3rd parameter which is
hdb=noprobe).

Anyway, a bad night of sleep reminds me that our EM64T boxes also
have this line (which actually is a leftover from our VA1220 boxes
;-) ), and they don't barf, so it must be either the combination
of the sata_nv together with the pata driver part, *or* just the
pata driver part. (Our Opterons with non-nForce chipsets also work.)

I will trace down the ide_setup today. First loads of coffee.

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end
.
RE: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
>>-----Original Message-----
>>From: Ard -kwaak- van Breemen [mailto:[EMAIL PROTECTED]
>>Sent: December 22, 2006 5:06
>>To: Zhang, Yanmin
>>Cc: Andrew Morton; Chuck Ebbert; Yinghai Lu; [EMAIL PROTECTED]; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org;
>>[EMAIL PROTECTED]; Eric W. Biederman
>>Subject: Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
>>
>>On Thu, Dec 21, 2006 at 04:04:04PM +0800, Zhang, Yanmin wrote:
>>> I couldn't reproduce it on my EM64T machine. I instrumented function start_kernel and
>>> didn't find irq was enabled before calling init_IRQ. It'll be better if the reporter could
>>> instrument function start_kernel to capture which function enables irq.
>>
>>Editing init/main.c:
>>        preempt_disable();
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>        printk("BLAAT17");
>>        build_all_zonelists();
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>        printk("BLAAT18");
>>        page_alloc_init();
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>        printk("BLAAT19");
>>        printk(KERN_NOTICE "Kernel command line: %s\n", saved_command_line);
>>        parse_early_param();
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>        printk("BLAAT20");
>>        parse_args("Booting kernel", command_line, __start___param,
>>                   __stop___param - __start___param,
>>                   &unknown_bootoption);
>>        printk("BLAAT21");
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>        sort_main_extable();
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>        printk("BLAAT22");
>>        trap_init();
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>        printk("BLAAT23");
>>
>>Results in:
>>Allocating PCI resources starting at 8800 (gap: 8000:6000)
>>BLAAT12BLAAT13<6>PERCPU: Allocating 32960 bytes of per cpu data
>>BLAAT14BLAAT15BLAAT16BLAAT17Built 2 zonelists. Total pages: 1032635
>>BLAAT18BLAAT19<5>Kernel command line: console=tty0 console=ttyS0,115200 hdb=noprobe hdc=noprobe hdd=noprobe root=/dev/md0 ro panic=30 earlyprintk=serial,ttyS0,115200
>>BLAAT20<6>ide_setup: hdb=noprobe
>>ide_setup: hdc=noprobe
>>ide_setup: hdd=noprobe
>>BLAAT21start_kernel(): bug: interrupts were enabled early
>>start_kernel(): bug: interrupts were enabled early
>>BLAAT22Initializing CPU#0
>>
>>Hmmm, that actually doesn't make sense to me (unless parse_args is able to
>>enable irq's).

I think parse_args enables irq when it calls callbacks.
Could you try below?
1) Test Andrew's patch of sema down_write;
2) Apply the patch below and see what the output is when booting. If the output has
"[BUG]..address.", please map the address to a function name using System.map.

--- linux-2.6.19/kernel/params.c 2006-12-08 15:32:49.0 +0800
+++ linux-2.6.19_work/kernel/params.c 2006-12-22 12:28:38.0 +0800
@@ -53,13 +53,22 @@ static int parse_one(char *param,
                      int (*handle_unknown)(char *param, char *val))
 {
         unsigned int i;
+        int result;
+        int irq_is_disabled;

         /* Find parameter */
         for (i = 0; i < num_params; i++) {
                 if (parameq(param, params[i].name)) {
                         DEBUGP("They are equal! Calling %p\n",
                                params[i].set);
-                        return params[i].set(val, &params[i]);
+                        irq_is_disabled = irqs_disabled();
+                        result = params[i].set(val, &params[i]);
+                        if (irq_is_disabled && !irqs_disabled())
+                        {
+                                printk("[BUG] parse_one: function %p enabled irq!\n",
+                                       params[i].set);
+                        }
+
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Thu, Dec 21, 2006 at 04:04:04PM +0800, Zhang, Yanmin wrote:
> I couldn't reproduce it on my EM64T machine. I instrumented function start_kernel and
> didn't find irq was enabled before calling init_IRQ. It'll be better if the reporter could
> instrument function start_kernel to capture which function enables irq.

Editing init/main.c:
        preempt_disable();
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
        printk("BLAAT17");
        build_all_zonelists();
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
        printk("BLAAT18");
        page_alloc_init();
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
        printk("BLAAT19");
        printk(KERN_NOTICE "Kernel command line: %s\n", saved_command_line);
        parse_early_param();
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
        printk("BLAAT20");
        parse_args("Booting kernel", command_line, __start___param,
                   __stop___param - __start___param,
                   &unknown_bootoption);
        printk("BLAAT21");
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
        sort_main_extable();
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
        printk("BLAAT22");
        trap_init();
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
        printk("BLAAT23");

Results in:
Allocating PCI resources starting at 8800 (gap: 8000:6000)
BLAAT12BLAAT13<6>PERCPU: Allocating 32960 bytes of per cpu data
BLAAT14BLAAT15BLAAT16BLAAT17Built 2 zonelists. Total pages: 1032635
BLAAT18BLAAT19<5>Kernel command line: console=tty0 console=ttyS0,115200 hdb=noprobe hdc=noprobe hdd=noprobe root=/dev/md0 ro panic=30 earlyprintk=serial,ttyS0,115200
BLAAT20<6>ide_setup: hdb=noprobe
ide_setup: hdc=noprobe
ide_setup: hdd=noprobe
BLAAT21start_kernel(): bug: interrupts were enabled early
start_kernel(): bug: interrupts were enabled early
BLAAT22Initializing CPU#0

Hmmm, that actually doesn't make sense to me (unless parse_args is able to
enable irq's).

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end
.
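The bracketing technique used in the message above (check the interrupt state before and after each boot step, and blame the first step that flips it) can be sketched in plain userspace C. This is only a model under stated assumptions: the interrupt state is a simple bool, the step table and the helpers `first_step_enabling_irqs`, `step_harmless`, `step_enables_irqs` and `demo_culprit_index` are hypothetical names invented for illustration, and nothing here is kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* One boot step in the model: a label plus a function that may
 * change the (modelled) interrupt state. */
struct boot_step {
    const char *name;
    void (*run)(bool *irqs_disabled);
};

/* Run the steps in order; return the index of the first step that was
 * entered with interrupts disabled and left with them enabled, or -1
 * if no step did that. */
int first_step_enabling_irqs(const struct boot_step *steps, size_t n,
                             bool *irqs_disabled)
{
    for (size_t i = 0; i < n; i++) {
        bool was_disabled = *irqs_disabled;

        steps[i].run(irqs_disabled);
        if (was_disabled && !*irqs_disabled)
            return (int)i;
    }
    return -1;
}

/* Hypothetical step that leaves the interrupt state alone. */
void step_harmless(bool *irqs_disabled)
{
    (void)irqs_disabled;
}

/* Hypothetical step that (wrongly) enables interrupts. */
void step_enables_irqs(bool *irqs_disabled)
{
    *irqs_disabled = false;
}

/* Step labels mirror the calls instrumented in the message; in this
 * model the third entry is the one that misbehaves. */
int demo_culprit_index(void)
{
    const struct boot_step steps[] = {
        { "build_all_zonelists", step_harmless },
        { "page_alloc_init",     step_harmless },
        { "parse_args",          step_enables_irqs },
        { "sort_main_extable",   step_harmless },
    };
    bool irqs_disabled = true;

    return first_step_enabling_irqs(steps, sizeof(steps) / sizeof(steps[0]),
                                    &irqs_disabled);
}
```

In this model `demo_culprit_index()` returns 2, the index of the `parse_args` entry, which matches the position where the first "interrupts were enabled early" line shows up in the log above.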
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
Hello,

On Thu, Dec 21, 2006 at 04:04:04PM +0800, Zhang, Yanmin wrote:
> I couldn't reproduce it on my EM64T machine. I instrumented function start_kernel and
> didn't find irq was enabled before calling init_IRQ. It'll be better if the reporter could
> instrument function start_kernel to capture which function enables irq.

Just diving into the sources.
Is that something like:
if(!raw_irqs_disabled_flags) printk "irqs are enabled";

(At that moment it might have crashed already.. :-)).

I don't see the complete context yet, but I hope the irq is
triggered after the irq is somehow enabled.

BTW: the panic occurs on half of my boards on Tyan S2891 with 2
Opterons, of which the only difference seems to be the purchase
date (and hence probably the motherboard revisions). (Haven't got
time yet to pull them out of the rack and compare the
motherboards).

--
program signature;
begin { telegraaf.com }
writeln("<[EMAIL PROTECTED]> TEM2");
end
.
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Thu, 21 Dec 2006 20:52:40 +0100
Ard -kwaak- van Breemen <[EMAIL PROTECTED]> wrote:

> Hello,
>
> On Thu, Dec 21, 2006 at 04:04:04PM +0800, Zhang, Yanmin wrote:
> > I couldn't reproduce it on my EM64T machine. I instrumented function start_kernel and
> > didn't find irq was enabled before calling init_IRQ. It'll be better if the reporter could
> > instrument function start_kernel to capture which function enables irq.
> Just diving into the sources.
> Is that something like:
> if(!raw_irqs_disabled_flags) printk "irqs are enabled";
>
> (At that moment it might have crashed already.. :-)).
>
> I don't see the complete context yet, but I hope the irq is
> triggered after the irq is somehow enabled.
>
> BTW: the panic occurs on half of my boards on tyan S2891 with 2
> opterons, of which the only difference seems to be the purchase
> date (and hence probably the motherboard revisions). (Haven't got
> time yet to pull them out of the rack and compare the
> motherboards).

please, I'm still waiting for someone to tell me whether this "fixes" it:

--- a/lib/rwsem-spinlock.c~down_write-preserve-local-irqs
+++ a/lib/rwsem-spinlock.c
@@ -195,13 +195,14 @@ void fastcall __sched __down_write_neste
 {
         struct rwsem_waiter waiter;
         struct task_struct *tsk;
+        unsigned long flags;

-        spin_lock_irq(&sem->wait_lock);
+        spin_lock_irqsave(&sem->wait_lock, flags);

         if (sem->activity == 0 && list_empty(&sem->wait_list)) {
                 /* granted */
                 sem->activity = -1;
-                spin_unlock_irq(&sem->wait_lock);
+                spin_unlock_irqrestore(&sem->wait_lock, flags);
                 goto out;
         }

@@ -216,7 +217,7 @@ void fastcall __sched __down_write_neste
         list_add_tail(&waiter.list, &sem->wait_list);

         /* we don't need to touch the semaphore struct anymore */
-        spin_unlock_irq(&sem->wait_lock);
+        spin_unlock_irqrestore(&sem->wait_lock, flags);

         /* wait to be given the lock */
         for (;;) {
_
RE: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
>>-----Original Message-----
>>From: Andrew Morton [mailto:[EMAIL PROTECTED]
>>Sent: December 20, 2006 18:38
>>To: Chuck Ebbert
>>Cc: Yinghai Lu; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org; [EMAIL PROTECTED];
>>Eric W. Biederman; Zhang, Yanmin
>>Subject: Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
>>
>>On Wed, 20 Dec 2006 04:59:19 -0500
>>Chuck Ebbert <[EMAIL PROTECTED]> wrote:
>>
>>> > On 12/19/06, Chuck Ebbert <[EMAIL PROTECTED]> wrote:
>>> > > So an external interrupt occurred, the system tried to use interrupt
>>> > > descriptor #39 decimal (irq 7), but the descriptor was invalid.
>>> >
>>> > but the irq is disabled at that time.
>>> >
>>> > can you use attached diff to verify if the irq is enable somehow?
>>>
>>> But it seems interrupts are on--look at the flags:
>>>
>>> RSP: 0018:803cdf68 EFLAGS: 00010246
>>>
>>
>>down_write()->__down_write()->__down_write_nested()->spin_unlock_irq()->dead
>>
>>Could someone please test this?

I couldn't reproduce it on my EM64T machine. I instrumented the function
start_kernel and didn't find that irqs were enabled before calling init_IRQ.
It would be better if the reporter could instrument start_kernel to capture
which function enables irqs.
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Wed, 2006-12-20 at 02:37 -0800, Andrew Morton wrote:
> On Wed, 20 Dec 2006 04:59:19 -0500
> Chuck Ebbert <[EMAIL PROTECTED]> wrote:
>
> > > On 12/19/06, Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> > > > So an external interrupt occurred, the system tried to use interrupt
> > > > descriptor #39 decimal (irq 7), but the descriptor was invalid.
> > >
> > > but the irq is disabled at that time.
> > >
> > > can you use attached diff to verify if the irq is enable somehow?
> >
> > But it seems interrupts are on--look at the flags:
> >
> > RSP: 0018:803cdf68 EFLAGS: 00010246
> >
>
> down_write()->__down_write() -> __down_write_nested()->spin_unlock_irq()->dead

since down_write() sleeps. what?
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Wed, 20 Dec 2006 04:59:19 -0500
Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> > On 12/19/06, Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> > > So an external interrupt occurred, the system tried to use interrupt
> > > descriptor #39 decimal (irq 7), but the descriptor was invalid.
> >
> > but the irq is disabled at that time.
> >
> > can you use attached diff to verify if the irq is enable somehow?
>
> But it seems interrupts are on--look at the flags:
>
> RSP: 0018:803cdf68 EFLAGS: 00010246
>

down_write()->__down_write()->__down_write_nested()->spin_unlock_irq()->dead

Could someone please test this?

--- a/lib/rwsem-spinlock.c~a
+++ a/lib/rwsem-spinlock.c
@@ -195,13 +195,14 @@ void fastcall __sched __down_write_neste
 {
         struct rwsem_waiter waiter;
         struct task_struct *tsk;
+        unsigned long flags;

-        spin_lock_irq(&sem->wait_lock);
+        spin_lock_irqsave(&sem->wait_lock, flags);

         if (sem->activity == 0 && list_empty(&sem->wait_list)) {
                 /* granted */
                 sem->activity = -1;
-                spin_unlock_irq(&sem->wait_lock);
+                spin_unlock_irqrestore(&sem->wait_lock, flags);
                 goto out;
         }

@@ -216,7 +217,7 @@ void fastcall __sched __down_write_neste
         list_add_tail(&waiter.list, &sem->wait_list);

         /* we don't need to touch the semaphore struct anymore */
-        spin_unlock_irq(&sem->wait_lock);
+        spin_unlock_irqrestore(&sem->wait_lock, flags);

         /* wait to be given the lock */
         for (;;) {
_
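To see why the patch above matters for a caller that runs with interrupts off, here is a minimal userspace model of the two lock flavours. It is a sketch under assumptions, not kernel code: the CPU's interrupt state is a single bool, the `model_*` helpers and the two `down_write_*_leaves_irqs_enabled` functions are hypothetical names, and only the uncontended "granted" path of `__down_write_nested()` is modelled.

```c
#include <stdbool.h>

/* Modelled CPU: just the local interrupt-enable state. */
struct model_cpu {
    bool irqs_enabled;
};

/* Models spin_lock_irq(): disables interrupts unconditionally. */
void model_lock_irq(struct model_cpu *cpu)
{
    cpu->irqs_enabled = false;
}

/* Models spin_unlock_irq(): enables interrupts unconditionally,
 * regardless of the state the caller started in. */
void model_unlock_irq(struct model_cpu *cpu)
{
    cpu->irqs_enabled = true;
}

/* Models spin_lock_irqsave(): remembers the old state, then disables. */
bool model_lock_irqsave(struct model_cpu *cpu)
{
    bool flags = cpu->irqs_enabled;

    cpu->irqs_enabled = false;
    return flags;
}

/* Models spin_unlock_irqrestore(): puts the remembered state back. */
void model_unlock_irqrestore(struct model_cpu *cpu, bool flags)
{
    cpu->irqs_enabled = flags;
}

/* Uncontended write-lock fast path before the patch.
 * Returns the interrupt state the caller is left with. */
bool down_write_unpatched_leaves_irqs_enabled(bool enabled_on_entry)
{
    struct model_cpu cpu = { enabled_on_entry };

    model_lock_irq(&cpu);
    /* "granted": nothing else to do in the uncontended case */
    model_unlock_irq(&cpu);
    return cpu.irqs_enabled;
}

/* The same path after the patch. */
bool down_write_patched_leaves_irqs_enabled(bool enabled_on_entry)
{
    struct model_cpu cpu = { enabled_on_entry };
    bool flags = model_lock_irqsave(&cpu);

    /* "granted" */
    model_unlock_irqrestore(&cpu, flags);
    return cpu.irqs_enabled;
}
```

In the model, the unpatched path leaves interrupts enabled even when the caller entered with them disabled, which is the situation during early boot; the patched path hands back whatever state it was given.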
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On 12/20/06, Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> But it seems interrupts are on--look at the flags:
>
> RSP: 0018:803cdf68 EFLAGS: 00010246

Yes, the IF bit is set.

Maybe someone (the reporters) could add !irqs_disabled() checks and printk
calls to start_kernel in init/main.c to see which function causes
interrupts to get enabled.

YH
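For anyone who wants to check the EFLAGS reading by hand: on x86 the interrupt-enable flag (IF) is bit 9 of EFLAGS, i.e. mask 0x200. The short sketch below tests that bit; the helper name `eflags_irqs_enabled` and the macro name are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 EFLAGS.IF is bit 9; when set, maskable interrupts are enabled. */
#define EFLAGS_IF_MASK (UINT32_C(1) << 9) /* 0x200 */

/* Returns true when the given EFLAGS value has IF set. */
bool eflags_irqs_enabled(uint32_t eflags)
{
    return (eflags & EFLAGS_IF_MASK) != 0;
}
```

Applied to the value from the oops, `eflags_irqs_enabled(0x00010246)` is true, because 0x246 contains 0x200.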
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
> On 12/19/06, Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> > So an external interrupt occurred, the system tried to use interrupt
> > descriptor #39 decimal (irq 7), but the descriptor was invalid.
>
> but the irq is disabled at that time.
>
> can you use attached diff to verify if the irq is enable somehow?

But it seems interrupts are on--look at the flags:

RSP: 0018:803cdf68 EFLAGS: 00010246

--
MBTI: IXTP
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On 12/19/06, Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> So an external interrupt occurred, the system tried to use interrupt
> descriptor #39 decimal (irq 7), but the descriptor was invalid.

But the irq is disabled at that time.

Can you use the attached diff to verify whether the irq is enabled somehow?

YH

diff --git a/arch/x86_64/kernel/i8259.c b/arch/x86_64/kernel/i8259.c
index d73c79e..fedde34 100644
--- a/arch/x86_64/kernel/i8259.c
+++ b/arch/x86_64/kernel/i8259.c
@@ -421,7 +421,11 @@ void __init init_ISA_irqs (void)
 {
         int i;

+        if (!irqs_disabled())
+                printk("init_ISA_irqs(): -1 bug: interrupts were enabled early\n");
         init_bsp_APIC();
+        if (!irqs_disabled())
+                printk("init_ISA_irqs(): -2 bug: interrupts were enabled early\n");
         init_8259A(0);

         for (i = 0; i < NR_IRQS; i++) {
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
In-Reply-To: <[EMAIL PROTECTED]>

On Tue, 19 Dec 2006 17:29:00 -0800, Andrew Morton wrote:

> Quoting the bug report:
> general protection fault: 013b [1] PREEMPT

That '013b' is critical information.

Bit 0: 1: exception source is external to the processor
Bit 1: 1: there is a problem with an interrupt descriptor in the IDT
Bit 2: n/a
Bits 15-3: index of the problem descriptor

So an external interrupt occurred, the system tried to use interrupt
descriptor #39 decimal (irq 7), but the descriptor was invalid.

--
MBTI: IXTP
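The bit breakdown in the message above can be reproduced with a few shifts and masks. The sketch below decodes an error code of that layout; the struct and function names are hypothetical, and the mapping from vector to legacy IRQ assumes the ISA interrupts start at vector 0x20, which is consistent with "descriptor #39 (irq 7)" in the message but is stated here as an assumption.

```c
#include <stdint.h>

/* Fields of a selector-style x86 exception error code. */
struct gp_error_code {
    unsigned ext;   /* bit 0: event source was external to the processor */
    unsigned idt;   /* bit 1: index refers to a descriptor in the IDT */
    unsigned ti;    /* bit 2: not applicable when the IDT bit is set */
    unsigned index; /* bits 15-3: index of the problem descriptor */
};

/* Split an error code into its fields. */
struct gp_error_code decode_gp_error_code(uint16_t code)
{
    struct gp_error_code d;

    d.ext = code & 1u;
    d.idt = (code >> 1) & 1u;
    d.ti = (code >> 2) & 1u;
    d.index = code >> 3;
    return d;
}

/* Assumption: legacy ISA IRQ n is delivered on vector 0x20 + n. */
#define ASSUMED_FIRST_EXTERNAL_VECTOR 0x20

/* Map an interrupt vector to a legacy IRQ number under that assumption. */
int vector_to_isa_irq(unsigned vector)
{
    return (int)vector - ASSUMED_FIRST_EXTERNAL_VECTOR;
}
```

For 0x013b this gives ext = 1, idt = 1 and index = 39, and `vector_to_isa_irq(39)` yields 7 under the stated vector-base assumption.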
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
On Mon, 18 Dec 2006 09:48:01 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> [EMAIL PROTECTED] writes:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=7505
> >
> > --- Additional Comments From [EMAIL PROTECTED] 2006-12-18 07:39 ---
> > OK, fixed.
>
>
> Greg.
>
> It appears commit d71374dafbba7ec3f67371d3b7e9f6310a588808 which
> replaced the pci bus spinlock with a semaphore causes some systems not
> to boot. I haven't a clue why.
>
> So I figure I would toss the ball over to your court to see if you can
> look and see what needs to happen to resolve this problem.
>
> There appears to be at least one positive confirmation that reverting
> this patch fixes the problems.
>

That's weird. Quoting the bug report:

  Here is the output from a kernel with the 'earlyprintk' option enabled.

  Linux version 2.6.19-rc5 ([EMAIL PROTECTED]) (gcc version 4.1.2 20060901 (prerelease) (Debian 4.1.1-13)) #2 PREEMPT Sat Nov 11 16:04:00 MSK 2006
  Command line: BOOT_IMAGE=Linux-bug ro root=303 video=radeonfb:mode:[EMAIL PROTECTED] idebus=66 earlyprintk=serial,ttyS0,9600,keep
  BIOS-provided physical RAM map:
   BIOS-e820: - 0009f800 (usable)
   BIOS-e820: 0009f800 - 000a (reserved)
   BIOS-e820: 000f - 0010 (reserved)
   BIOS-e820: 0010 - 1fff (usable)
   BIOS-e820: 1fff - 1fff3000 (ACPI NVS)
   BIOS-e820: 1fff3000 - 2000 (ACPI data)
   BIOS-e820: e000 - f000 (reserved)
   BIOS-e820: fec0 - 0001 (reserved)
  end_pfn_map = 1048576
  kernel direct mapping tables up to 1 @ 8000-d000
  DMI 2.2 present.
  Zone PFN ranges:
    DMA 0 -> 4096
    DMA32 4096 -> 1048576
    Normal 1048576 -> 1048576
  early_node_map[2] active PFN ranges
      0: 0 -> 159
      0: 256 -> 131056
  Nvidia board detected. Ignoring ACPI timer override.
  ACPI: PM-Timer IO Port: 0x4008
  ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
  Processor #0 (Bootup-CPU)
  ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
  ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
  ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
  ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
  IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
  ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
  ACPI: BIOS IRQ0 pin2 override ignored.
  ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
  ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
  ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
  Setting APIC routing to flat
  Using ACPI (MADT) for SMP configuration information
  Nosave address range: 0009f000 - 000a
  Nosave address range: 000a - 000f
  Nosave address range: 000f - 0010
  Allocating PCI resources starting at 3000 (gap: 2000:c000)
  Built 1 zonelists. Total pages: 128336
  Kernel command line: BOOT_IMAGE=Linux-bug ro root=303 video=radeonfb:mode:[EMAIL PROTECTED] idebus=66 earlyprintk=serial,ttyS0,9600,keep
  ide_setup: idebus=66
  Initializing CPU#0
  general protection fault: 013b [1] PREEMPT
  CPU 0
  Modules linked in:
  Pid: 0, comm: swapper Not tainted 2.6.19-rc5 #2
  RIP: 0010:[] [] init_8259A+0xb6/0xf0
  RSP: 0018:803cdf68 EFLAGS: 00010246
  RAX: 00ff RBX: 0246 RCX: b4fcb55f
  RDX: 0011 RSI: 8013cf40 RDI: 0199
  RBP: R08: R09:
  R10: 0001 R11: 0070 R12:
  R13: R14: R15:
  FS: () GS:803c() knlGS:
  CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
  CR2: 00f0aed9 CR3: 00101000 CR4: 06a0
  Process swapper (pid: 0, threadinfo 803cc000, task 80360360)
  Stack: 803d3a46 800089360a40206f 0009 0008e000 803d3ab9 803ddd99 0009 803cf65a 0009
  Call Trace:
   [] init_ISA_irqs+0x16/0x80
   [] init_IRQ+0x9/0x1e0
   [] rcu_cpu_notify+0x49/0x60
   [] start_kernel+0xda/0x1f0
   [] _sinittext+0x146/0x150

I assume we went splat in start_kernel->trap_init->cpu_init.

We shouldn't have touched pci_bus_lock that early? Perhaps acpi does PCI
things very early..

Conceivably an accidental early local_irq_enable could cause bad things,
but that rwsem should be 100% uncontended.

Could the reporters please determine whether disabling the various
CONFIG_DEBUG_* options prevents this? Such as CONFIG_DEBUG_LOCKDEP,
CONFIG_DEBUG_LOCK_ALLOC, CONFIG_PROVE_LOCKING, etc?

Also, some additional oops traces would be nice, if we can get them.

(Please do reply-to-all via email from now on, rather than using the
bugzilla UI).
Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
[EMAIL PROTECTED] writes:

> http://bugzilla.kernel.org/show_bug.cgi?id=7505
>
> --- Additional Comments From [EMAIL PROTECTED] 2006-12-18 07:39 ---
> OK, fixed.


Greg.

It appears commit d71374dafbba7ec3f67371d3b7e9f6310a588808, which
replaced the pci bus spinlock with a semaphore, causes some systems not
to boot. I haven't a clue why.

So I figured I would toss the ball over to your court to see if you can
look and see what needs to happen to resolve this problem.

There appears to be at least one positive confirmation that reverting
this patch fixes the problems.

Eric