Re: dc0: watchdog timeout and nve0: device timeout
On Tue, Jan 31, 2006 at 03:08:03AM -0500, Anish Mistry wrote:
A> After updating to STABLE today I'm getting the following message with
A> my dc and nve NICs every few seconds. UP, AMD64. A kernel from last
A> Thursday was fine.
A>
A> dc0: watchdog timeout
A> nve0: device timeout (4)

Can you try to back out the code in sys/dev/pci to Thursday? If this
doesn't help, you probably need to do a binary search in this small
timeframe.

--
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
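The binary search suggested above can be sketched in C. This is only an illustration: the snapshot indices, `first_bad_snapshot`, and `example_is_bad` are hypothetical names, and in practice the "is it bad?" test is a manual step (check out sources for that date, rebuild, boot, watch for the timeouts). It assumes that once the regression appears, every later snapshot also shows it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical predicate: nonzero if the kernel built from snapshot
 * `idx` shows the watchdog/device timeouts.
 */
typedef int (*is_bad_fn)(size_t idx);

/*
 * Return the index of the first bad snapshot, given that `good` is a
 * known-good snapshot, `bad` is a known-bad one, and good < bad.
 * Each step halves the range, so n snapshots need about log2(n) builds.
 */
size_t
first_bad_snapshot(size_t good, size_t bad, is_bad_fn is_bad)
{
	assert(good < bad);
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* regression is at or before mid */
		else
			good = mid;	/* regression is after mid */
	}
	return (bad);
}

/* Illustrative stand-in: pretend the breakage arrived with snapshot 5. */
int
example_is_bad(size_t idx)
{
	return (idx >= 5);
}
```

With snapshot 0 as last Thursday's kernel and snapshot 8 as today's, `first_bad_snapshot(0, 8, example_is_bad)` tests snapshots 4, 6 and 5 and then stops, so three rebuilds are enough to narrow down nine candidates.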
Re: dc0: watchdog timeout and nve0: device timeout
On Tue, 31 Jan 2006 11:30:02 +0300, Gleb Smirnoff wrote:
> On Tue, Jan 31, 2006 at 03:08:03AM -0500, Anish Mistry wrote:
> A> After updating to STABLE today I'm getting the following message with
> A> my dc and nve NICs every few seconds. UP, AMD64. A kernel from last
> A> Thursday was fine.
> A>
> A> dc0: watchdog timeout
> A> nve0: device timeout (4)
>
> Can you try to back out the code in sys/dev/pci to Thursday? If this
> doesn't help, you probably need to do a binary search in this small
> timeframe.

I think I found the problem - the merge was not quite correct, and
the PCI interrupt rerouting was disabled for some reason.

Warner, is there a reason for hiding the "Try to re-route interrupts"
code behind what is apparently an "ifdef 0" case? Well, okay, most
probably there is a reason, since you've done it, but... it breaks my
re0 card and it also seems to break Anish's hardware :)

BTW, the commit message was not quite correct - rev. 1.302 was not
really merged, it's included in my patch here. Also, rev. 1.305 of
pci.c seems to have more than just adding the PCI_FIND_EXTCAP method -
there are a couple of offset fixes that I also included in the patch
while trying to come as close to the -CURRENT code as possible; could
you check if they actually apply to -STABLE?

Anyway, here's a patch that fixes it for me, although most probably
the __PCI_REROUTE_INTERRUPT chunk should be sufficient. Warner, if
you want more details, I could help with debugging this - on my
system, the re0 card definitely needs this rerouting. I've posted
some verbose boot output with explanations at
http://people.FreeBSD.org/~roam/pcirouting/
The patch itself is also there in case it gets munged by the mail
servers along the way.

Index: src/sys/dev/pci/pci.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/pci/pci.c,v
retrieving revision 1.292.2.6
diff -u -r1.292.2.6 pci.c
--- src/sys/dev/pci/pci.c	30 Jan 2006 18:42:10 -0000	1.292.2.6
+++ src/sys/dev/pci/pci.c	31 Jan 2006 10:57:32 -0000
@@ -428,7 +428,7 @@
 		ptrptr = PCIR_CAP_PTR;
 		break;
 	case 2:
-		ptrptr = 0x14;
+		ptrptr = PCIR_CAP_PTR_2;
 		break;
 	default:
 		return;		/* no extended capabilities support */
@@ -447,10 +447,10 @@
 		}
 		/* Find the next entry */
 		ptr = nextptr;
-		nextptr = REG(ptr + 1, 1);
+		nextptr = REG(ptr + PCICAP_NEXTPTR, 1);
 
 		/* Process this entry */
-		switch (REG(ptr, 1)) {
+		switch (REG(ptr + PCICAP_ID, 1)) {
 		case PCIY_PMG:		/* PCI power management */
 			if (cfg->pp.pp_cap == 0) {
 				cfg->pp.pp_cap = REG(ptr + PCIR_POWER_CAP, 2);
@@ -1040,7 +1040,8 @@
 	}
 
 	if (cfg->intpin > 0 && PCI_INTERRUPT_VALID(cfg->intline)) {
-#ifdef __PCI_REROUTE_INTERRUPT
+#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
+		defined(__arm__) || defined(__alpha__)
 		/*
 		 * Try to re-route interrupts. Sometimes the BIOS or
 		 * firmware may leave bogus values in these registers.

Hope this helps!

G'luck,
Peter

--
Peter Pentchev
PGP key: http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint FDBA FD79 C26F 3C51 C95E DF9E ED18 B68D 1619 4553
"yields falsehood, when appended to its quotation." yields falsehood, when appended to its quotation.
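For readers unfamiliar with the first two hunks: they only replace magic numbers with named offsets while walking the PCI capability list. The following is a rough, standalone sketch of such a walk, not the pci.c code itself. `find_capability` and `example_cfg` are hypothetical, a plain 256-byte array stands in for the driver's REG() config-space reads, and the check of the status register's capability bit is left out. The constant values are the ones implied by the patch and the PCI specification.

```c
#include <assert.h>
#include <stdint.h>

#define PCIR_CAP_PTR	0x34	/* capability pointer, header types 0 and 1 */
#define PCIR_CAP_PTR_2	0x14	/* capability pointer, header type 2 */
#define PCICAP_ID	0x0	/* capability ID byte within an entry */
#define PCICAP_NEXTPTR	0x1	/* next-pointer byte within an entry */
#define PCIY_PMG	0x01	/* PCI power management capability ID */

/*
 * Walk the capability list in a copy of config space and return the
 * offset of the first entry whose ID is cap_id, or 0 if none is found.
 */
uint8_t
find_capability(const uint8_t cfg[256], int hdrtype, uint8_t cap_id)
{
	uint8_t ptrptr, ptr;
	int guard;

	switch (hdrtype) {
	case 0:
	case 1:
		ptrptr = PCIR_CAP_PTR;
		break;
	case 2:
		ptrptr = PCIR_CAP_PTR_2;
		break;
	default:
		return (0);	/* no capability list for this header type */
	}

	ptr = cfg[ptrptr];
	/* Bound the walk so a corrupt, looping list cannot hang us. */
	for (guard = 0; ptr != 0 && guard < 48; guard++) {
		if (cfg[(uint8_t)(ptr + PCICAP_ID)] == cap_id)
			return (ptr);
		ptr = cfg[(uint8_t)(ptr + PCICAP_NEXTPTR)];
	}
	return (0);
}

/*
 * Illustrative config space: an entry with ID 0x05 at 0x40, followed
 * by a power management entry at 0x50 that ends the list.
 */
const uint8_t example_cfg[256] = {
	[0x34] = 0x40,
	[0x40] = 0x05, [0x41] = 0x50,
	[0x50] = PCIY_PMG, [0x51] = 0x00,
};
```

Calling `find_capability(example_cfg, 0, PCIY_PMG)` follows 0x34 to 0x40, skips that entry, and returns 0x50. With header type 2 the walk starts from offset 0x14 instead, which is empty in this example, so nothing is found.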
RE: dc0: watchdog timeout and nve0: device timeout
Here's sort of a me-too post:

After upgrading STABLE from about a week ago to sources from this
morning, all disks connected to PCI cards (Promise PDC20318 SATA150
controller + ITE IT8212F UDMA133 controller) fail to show up during
boot. The cards are properly identified, but no disks are found. The
only disks found are those hooked up to the on-board ATA controller.

Also, after rebooting the machine the BIOS on the Promise SATA card
fails to detect the disks. A power-toggle brings the disks back in
BIOS, and backing down to the old kernel brings them back in the OS.

/Daniel Eriksson
Re: dc0: watchdog timeout and nve0: device timeout
Peter Pentchev writes:
: On Tue, 31 Jan 2006 11:30:02 +0300, Gleb Smirnoff wrote:
: > On Tue, Jan 31, 2006 at 03:08:03AM -0500, Anish Mistry wrote:
: > A> After updating to STABLE today I'm getting the following message with
: > A> my dc and nve NICs every few seconds. UP, AMD64. A kernel from last
: > A> Thursday was fine.
: > A>
: > A> dc0: watchdog timeout
: > A> nve0: device timeout (4)
: >
: > Can you try to back out the code in sys/dev/pci to Thursday? If this
: > doesn't help, you probably need to do a binary search in this small
: > timeframe.
:
: I think I found the problem - the merge was not quite correct, and
: the PCI interrupt rerouting was disabled for some reason.
:
: Warner, is there a reason for hiding the "Try to re-route interrupts"
: code behind what is apparently an "ifdef 0" case? Well, okay, most
: probably there is a reason, since you've done it, but... it breaks my
: re0 card and it also seems to break Anish's hardware :)

I'm pretty sure that's the problem. I thought I'd specifically
checked to make sure that I didn't merge this :-(

: BTW, the commit message was not quite correct - rev. 1.302 was not
: really merged, it's included in my patch here. Also, rev. 1.305 of
: pci.c seems to have more than just adding the PCI_FIND_EXTCAP method -
: there are a couple of offset fixes that I also included in the patch
: while trying to come as close to the -CURRENT code as possible; could
: you check if they actually apply to -STABLE?

They do.

: Anyway, here's a patch that fixes it for me, although most probably
: the __PCI_REROUTE_INTERRUPT chunk should be sufficient. Warner, if
: you want more details, I could help with debugging this - on my
: system, the re0 card definitely needs this rerouting. I've posted
: some verbose boot output with explanations at
: http://people.FreeBSD.org/~roam/pcirouting/
: The patch itself is also there in case it gets munged by the mail
: servers along the way.
:
: Index: src/sys/dev/pci/pci.c
: ===================================================================
: RCS file: /home/ncvs/src/sys/dev/pci/pci.c,v
: retrieving revision 1.292.2.6
: diff -u -r1.292.2.6 pci.c
: --- src/sys/dev/pci/pci.c	30 Jan 2006 18:42:10 -0000	1.292.2.6
: +++ src/sys/dev/pci/pci.c	31 Jan 2006 10:57:32 -0000
: @@ -428,7 +428,7 @@
:  		ptrptr = PCIR_CAP_PTR;
:  		break;
:  	case 2:
: -		ptrptr = 0x14;
: +		ptrptr = PCIR_CAP_PTR_2;
:  		break;
:  	default:
:  		return;		/* no extended capabilities support */
: @@ -447,10 +447,10 @@
:  		}
:  		/* Find the next entry */
:  		ptr = nextptr;
: -		nextptr = REG(ptr + 1, 1);
: +		nextptr = REG(ptr + PCICAP_NEXTPTR, 1);
:
:  		/* Process this entry */
: -		switch (REG(ptr, 1)) {
: +		switch (REG(ptr + PCICAP_ID, 1)) {
:  		case PCIY_PMG:		/* PCI power management */
:  			if (cfg->pp.pp_cap == 0) {
:  				cfg->pp.pp_cap = REG(ptr + PCIR_POWER_CAP, 2);
: @@ -1040,7 +1040,8 @@
:  	}
:
:  	if (cfg->intpin > 0 && PCI_INTERRUPT_VALID(cfg->intline)) {
: -#ifdef __PCI_REROUTE_INTERRUPT
: +#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
: +		defined(__arm__) || defined(__alpha__)
:  		/*
:  		 * Try to re-route interrupts. Sometimes the BIOS or
:  		 * firmware may leave bogus values in these registers.
:
: Hope this helps!

I'm pretty sure that the REROUTE thing is the only problem. That
shouldn't have been committed, and I thought I'd checked it
specifically before the commit, but I just checked what I committed
and it slipped by. This fits with the symptoms that I saw on my server
last night (the only difference between a stable boot and an older
stable boot was the IRQs).

The last part of this patch seems to fix things for me.

Warner
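To illustrate why the last hunk matters, here is a small standalone C sketch of the two preprocessor guards. It is not taken from pci.c. Both function names are hypothetical, and the premise that nothing in the affected tree defined `__PCI_REROUTE_INTERRUPT` is an assumption based on the "ifdef 0" remark in this thread.

```c
#include <assert.h>	/* only needed for the checks in the note below */

/*
 * Guard as merged: the body is compiled only if something defines
 * __PCI_REROUTE_INTERRUPT. Nothing in this file does, so the
 * rerouting branch is silently dropped by the preprocessor.
 */
int
reroute_compiled_in_merged(void)
{
#ifdef __PCI_REROUTE_INTERRUPT
	return (1);
#else
	return (0);
#endif
}

/*
 * Guard from the patch: keyed on compiler-provided architecture
 * macros, so the result depends on the build target, not on a
 * header remembering to define a flag.
 */
int
reroute_compiled_in_patched(void)
{
#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
    defined(__arm__) || defined(__alpha__)
	return (1);
#else
	return (0);
#endif
}
```

In this file `assert(reroute_compiled_in_merged() == 0)` always holds, whereas `reroute_compiled_in_patched()` returns 1 only when the compiler targets one of the listed architectures. That matches the reported behaviour: the code compiles cleanly either way, and the only visible difference is whether interrupts get rerouted at boot.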