Re: dc0: watchdog timeout and nve0: device timeout

2006-01-31 Thread Gleb Smirnoff
On Tue, Jan 31, 2006 at 03:08:03AM -0500, Anish Mistry wrote:
A> After updating to STABLE today I'm getting the following message with 
A> my dc and nve NICs every few seconds.  UP, AMD64.  A kernel from last 
A> Thursday was fine.
A> 
A> dc0: watchdog timeout
A> nve0: device timeout (4)

Can you try to back out the code in sys/dev/pci to Thursday? If this
doesn't help, you probably need to do a binary search in this small
timeframe.
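
For what it's worth, the binary search could be sketched roughly like this.
It is only an illustration: the snapshot numbering, the is_bad callback and
the simulated history are all made up for the example, and in practice each
probe means checking out sources for a given date, building a kernel and
rebooting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical predicate: true when the kernel built from the given
 * source snapshot shows the timeouts.
 */
typedef bool (*is_bad_fn)(size_t snapshot);

/*
 * Given a snapshot known to work (good) and a later one known to fail
 * (bad), return the first failing snapshot.  Assumes everything before
 * the culprit works and everything from it onward fails.
 */
static size_t
first_bad_snapshot(size_t good, size_t bad, is_bad_fn is_bad)
{
	assert(good < bad);
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* culprit is at mid or earlier */
		else
			good = mid;	/* culprit is after mid */
	}
	return (bad);
}

/* Simulated history: snapshots 0..4 are fine, 5 and later time out. */
static bool
simulated_is_bad(size_t snapshot)
{
	return (snapshot >= 5);
}
```

With the simulated history, first_bad_snapshot(0, 7, simulated_is_bad)
narrows things down in three probes instead of testing every snapshot.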

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: dc0: watchdog timeout and nve0: device timeout

2006-01-31 Thread Peter Pentchev
On Tue, 31 Jan 2006 11:30:02 +0300, Gleb Smirnoff wrote:
> On Tue, Jan 31, 2006 at 03:08:03AM -0500, Anish Mistry wrote:
> A> After updating to STABLE today I'm getting the following message with 
> A> my dc and nve NICs every few seconds.  UP, AMD64.  A kernel from last 
> A> Thursday was fine.
> A> 
> A> dc0: watchdog timeout
> A> nve0: device timeout (4)
> 
> Can you try to back out the code in sys/dev/pci to Thursday? If this
> doesn't help, you probably need to do a binary search in this small
> timeframe.

I think I found the problem - the merge was not quite correct, and
the PCI interrupt rerouting was disabled for some reason.

Warner, is there a reason for hiding the "Try to re-route interrupts"
code behind an apparently "ifdef 0" case?  Well, okay, most probably
there is a reason, since you've done it, but... it breaks my re0 card
and it also seems to break Anish's hardware :)
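
To illustrate why such a guard acts like an "ifdef 0": a block wrapped in
#ifdef on a macro that nothing defines is silently dropped by the
preprocessor.  The sketch below assumes __PCI_REROUTE_INTERRUPT is not
defined anywhere in the build, which is how it appears to behave here; the
function names are invented for the example and are not part of pci.c.

```c
#include <stdbool.h>

/*
 * Reports whether code behind the macro guard would be compiled in.
 * Nothing in this file defines __PCI_REROUTE_INTERRUPT, so the first
 * branch is removed before the compiler ever sees it.
 */
static bool
reroute_with_macro_guard(void)
{
#ifdef __PCI_REROUTE_INTERRUPT
	return (true);
#else
	return (false);
#endif
}

/*
 * Reports whether code behind an architecture guard would be compiled
 * in.  The result depends on the predefined macros of the compiler
 * target, e.g. __amd64__ or __i386__.
 */
static bool
reroute_with_arch_guard(void)
{
#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
    defined(__arm__) || defined(__alpha__)
	return (true);
#else
	return (false);
#endif
}
```

Under that assumption reroute_with_macro_guard() returns false on every
platform, while reroute_with_arch_guard() returns true when built for one
of the listed architectures.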

BTW, the commit message was not quite correct - rev. 1.302 was not
really merged, it's included in my patch here.  Also, rev. 1.305 of
pci.c seems to have more than just adding the PCI_FIND_EXTCAP method -
there are a couple of offset fixes that I also included in the patch
while trying to come as close to the -CURRENT code as possible; could
you check if they actually apply to -STABLE?

Anyway, here's a patch that fixes it for me, although most probably
the __PCI_REROUTE_INTERRUPT chunk should be sufficient.  Warner, if
you want more details, I could help with debugging this - on my
system, the re0 card definitely needs this rerouting.  I've posted
some verbose boot output with explanations at
http://people.FreeBSD.org/~roam/pcirouting/
The patch itself is also there in case it gets munged by the mail
servers along the way.

Index: src/sys/dev/pci/pci.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/pci/pci.c,v
retrieving revision 1.292.2.6
diff -u -r1.292.2.6 pci.c
--- src/sys/dev/pci/pci.c   30 Jan 2006 18:42:10 -  1.292.2.6
+++ src/sys/dev/pci/pci.c   31 Jan 2006 10:57:32 -
@@ -428,7 +428,7 @@
ptrptr = PCIR_CAP_PTR;
break;
case 2:
-   ptrptr = 0x14;
+   ptrptr = PCIR_CAP_PTR_2;
break;
default:
return; /* no extended capabilities support */
@@ -447,10 +447,10 @@
}
/* Find the next entry */
ptr = nextptr;
-   nextptr = REG(ptr + 1, 1);
+   nextptr = REG(ptr + PCICAP_NEXTPTR, 1);
 
/* Process this entry */
-   switch (REG(ptr, 1)) {
+   switch (REG(ptr + PCICAP_ID, 1)) {
case PCIY_PMG:  /* PCI power management */
if (cfg->pp.pp_cap == 0) {
cfg->pp.pp_cap = REG(ptr + PCIR_POWER_CAP, 2);
@@ -1040,7 +1040,8 @@
}
 
if (cfg->intpin > 0 && PCI_INTERRUPT_VALID(cfg->intline)) {
-#ifdef __PCI_REROUTE_INTERRUPT
+#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
+   defined(__arm__) || defined(__alpha__)
/*
 * Try to re-route interrupts. Sometimes the BIOS or
 * firmware may leave bogus values in these registers.
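
For context on the first two hunks, here is a minimal sketch of walking a
PCI capability list using named offsets instead of bare numbers.  It
operates on an in-memory copy of config space rather than the REG() macro
from pci.c; find_cap() and fill_example() are hypothetical helpers, and
masking the low two pointer bits is my reading of the PCI spec, not
something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCIR_CAP_PTR	0x34	/* capability pointer, header types 0/1 */
#define PCICAP_ID	0x0	/* capability ID byte within an entry */
#define PCICAP_NEXTPTR	0x1	/* next-entry pointer within an entry */
#define PCIY_PMG	0x01	/* PCI power management capability ID */

/*
 * Walk the capability list in a 256-byte config-space image and return
 * the offset of the first entry with the wanted ID, or 0 if none.
 */
static uint8_t
find_cap(const uint8_t *cfg, uint8_t wanted_id)
{
	uint8_t ptr;
	int guard = 48;		/* bound the walk if the list loops */

	assert(cfg != NULL);
	ptr = cfg[PCIR_CAP_PTR] & 0xfc;
	while (ptr != 0 && guard-- > 0) {
		if (cfg[ptr + PCICAP_ID] == wanted_id)
			return (ptr);
		ptr = cfg[ptr + PCICAP_NEXTPTR] & 0xfc;
	}
	return (0);
}

/* Hypothetical test image: an MSI entry at 0x40 chained to PM at 0x50. */
static void
fill_example(uint8_t *cfg)
{
	for (int i = 0; i < 256; i++)
		cfg[i] = 0;
	cfg[PCIR_CAP_PTR] = 0x40;
	cfg[0x40 + PCICAP_ID] = 0x05;		/* MSI capability ID */
	cfg[0x40 + PCICAP_NEXTPTR] = 0x50;
	cfg[0x50 + PCICAP_ID] = PCIY_PMG;
	cfg[0x50 + PCICAP_NEXTPTR] = 0x00;	/* end of list */
}
```

With the example image, looking up PCIY_PMG follows the chain from 0x40 to
0x50; PCICAP_ID and PCICAP_NEXTPTR are just 0 and 1, so the offset hunks
replace bare numbers with names without changing the generated code.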

Hope this helps!

G'luck,
Peter

-- 
Peter Pentchev  [EMAIL PROTECTED][EMAIL PROTECTED][EMAIL PROTECTED]
PGP key:http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint FDBA FD79 C26F 3C51 C95E  DF9E ED18 B68D 1619 4553
"yields falsehood, when appended to its quotation." yields falsehood, when 
appended to its quotation.


RE: dc0: watchdog timeout and nve0: device timeout

2006-01-31 Thread Daniel Eriksson

Here's sort of a me-too post:

After upgrading STABLE from about a week ago to sources from this
morning, all disks connected to PCI cards (Promise PDC20318 SATA150
controller + ITE IT8212F UDMA133 controller) fail to show up during
boot. The cards are properly identified, but no disks are found. The
only disks found are those hooked up to the on-board ATA controller.

Also, after rebooting the machine the BIOS on the Promise SATA card
fails to detect the disks. A power-toggle brings the disks back in BIOS,
and backing down to the old kernel brings them back in the OS.

/Daniel Eriksson


Re: dc0: watchdog timeout and nve0: device timeout

2006-01-31 Thread M. Warner Losh
In message: [EMAIL PROTECTED]
Peter Pentchev [EMAIL PROTECTED] writes:
: On Tue, 31 Jan 2006 11:30:02 +0300, Gleb Smirnoff wrote:
: > On Tue, Jan 31, 2006 at 03:08:03AM -0500, Anish Mistry wrote:
: > A> After updating to STABLE today I'm getting the following message with 
: > A> my dc and nve NICs every few seconds.  UP, AMD64.  A kernel from last 
: > A> Thursday was fine.
: > A> 
: > A> dc0: watchdog timeout
: > A> nve0: device timeout (4)
: > 
: > Can you try to back out the code in sys/dev/pci to Thursday? If this
: > doesn't help, you probably need to do a binary search in this small
: > timeframe.
: 
: I think I found the problem - the merge was not quite correct, and
: the PCI interrupt rerouting was disabled for some reason.
: 
: Warner, is there a reason for hiding the "Try to re-route interrupts"
: code behind an apparently "ifdef 0" case?  Well, okay, most probably
: there is a reason, since you've done it, but... it breaks my re0 card
: and it also seems to break Anish's hardware :)

I'm pretty sure that's the problem.  I thought I'd specifically
checked to make sure that I didn't merge this :-(

: BTW, the commit message was not quite correct - rev. 1.302 was not
: really merged, it's included in my patch here.  Also, rev. 1.305 of
: pci.c seems to have more than just adding the PCI_FIND_EXTCAP method -
: there are a couple of offset fixes that I also included in the patch
: while trying to come as close to the -CURRENT code as possible; could
: you check if they actually apply to -STABLE?

They do.

: Anyway, here's a patch that fixes it for me, although most probably
: the __PCI_REROUTE_INTERRUPT chunk should be sufficient.  Warner, if
: you want more details, I could help with debugging this - on my
: system, the re0 card definitely needs this rerouting.  I've posted
: some verbose boot output with explanations at
: http://people.FreeBSD.org/~roam/pcirouting/
: The patch itself is also there in case it gets munged by the mail
: servers along the way.
: 
: Index: src/sys/dev/pci/pci.c
: ===================================================================
: RCS file: /home/ncvs/src/sys/dev/pci/pci.c,v
: retrieving revision 1.292.2.6
: diff -u -r1.292.2.6 pci.c
: --- src/sys/dev/pci/pci.c 30 Jan 2006 18:42:10 -  1.292.2.6
: +++ src/sys/dev/pci/pci.c 31 Jan 2006 10:57:32 -
: @@ -428,7 +428,7 @@
:   ptrptr = PCIR_CAP_PTR;
:   break;
:   case 2:
: - ptrptr = 0x14;
: + ptrptr = PCIR_CAP_PTR_2;
:   break;
:   default:
:   return; /* no extended capabilities support */
: @@ -447,10 +447,10 @@
:   }
:   /* Find the next entry */
:   ptr = nextptr;
: - nextptr = REG(ptr + 1, 1);
: + nextptr = REG(ptr + PCICAP_NEXTPTR, 1);
:  
:   /* Process this entry */
: - switch (REG(ptr, 1)) {
: + switch (REG(ptr + PCICAP_ID, 1)) {
:   case PCIY_PMG:  /* PCI power management */
:   if (cfg->pp.pp_cap == 0) {
:   cfg->pp.pp_cap = REG(ptr + PCIR_POWER_CAP, 2);
: @@ -1040,7 +1040,8 @@
:   }
:  
:   if (cfg->intpin > 0 && PCI_INTERRUPT_VALID(cfg->intline)) {
: -#ifdef __PCI_REROUTE_INTERRUPT
: +#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
: + defined(__arm__) || defined(__alpha__)
:   /*
:* Try to re-route interrupts. Sometimes the BIOS or
:* firmware may leave bogus values in these registers.
: 
: Hope this helps!

I'm pretty sure that the REROUTE thing is the only problem.  That
shouldn't have been committed, and I thought I'd checked it
specifically before the commit, but I just checked what I committed
and it slipped by.  This fits with the symptoms that I saw on my server
last night (the only difference between a stable boot and an older
stable boot was the IRQs).

The last part of this patch seems to fix things for me.

Warner