Re: [patch 2/4] forcedeth: fix MAC address detection on network card (regression in 2.6.23)

2008-02-05 Thread Ayaz Abdulla



Jeff Garzik wrote:
> Ayaz Abdulla wrote:
>> Andrew Morton wrote:
>>> On Tue, 05 Feb 2008 13:20:59 -0500 Jeff Garzik <[EMAIL PROTECTED]> wrote:
>>>
>>>>> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
>>>>
>>>> NAK - this fixes one set of users, and breaks a working set of users.
>>>>
>>>> Need to add DMI check for the specific motherboard
>>>> (dmi_check_system), and flip flag according to success/failure of
>>>> that check.
>>>
>>> OK :)  I added the above to the changelog for next time.
>>>
>>> You guys can hide, but this patch isn't going away!
>>
>> I believe Michael determined that a newer BIOS fixes this issue.
>
> That's a solution that makes vendors happy... but we still have to deal
> with it in Linux.  There are plenty of the old broken BIOS still out in
> the field...
>
> Jeff




Michael, can you provide which BIOS version had this issue and which 
version fixed the issue?
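
The approach Jeff suggests in the quoted text (match the specific motherboard, then flip the flag depending on whether the match succeeded) can be sketched in plain userspace C. This is only an illustration under stated assumptions: `board_id`, `quirk_table`, `board_matches` and `effective_flag` are hypothetical names, the vendor and board strings are placeholders because the thread does not name the affected board, and a real driver would presumably use `dmi_check_system()` with a DMI match table rather than this hand-rolled matcher.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a DMI identity: system vendor plus board name. */
struct board_id {
	const char *vendor;
	const char *board;
};

/* Placeholder entries; the thread does not name the affected motherboard. */
static const struct board_id quirk_table[] = {
	{ "ExampleVendor", "ExampleBoard-1" },
	{ NULL, NULL },	/* sentinel terminates the table */
};

/* Return true when the running board appears in the quirk table. */
static bool board_matches(const struct board_id *running)
{
	const struct board_id *q;

	for (q = quirk_table; q->vendor != NULL; q++) {
		if (strcmp(q->vendor, running->vendor) == 0 &&
		    strcmp(q->board, running->board) == 0)
			return true;
	}
	return false;
}

/* Flip the default flag only on boards that matched the table. */
static bool effective_flag(bool default_flag, const struct board_id *running)
{
	return board_matches(running) ? !default_flag : default_flag;
}
```

Keeping the default for every board that does not match is what avoids breaking the set of users for whom the driver already works.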




---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/4] forcedeth: fix MAC address detection on network card (regression in 2.6.23)

2008-02-05 Thread Ayaz Abdulla



Andrew Morton wrote:
> On Tue, 05 Feb 2008 13:20:59 -0500 Jeff Garzik <[EMAIL PROTECTED]> wrote:
>
>>> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
>>
>> NAK - this fixes one set of users, and breaks a working set of users.
>>
>> Need to add DMI check for the specific motherboard (dmi_check_system),
>> and flip flag according to success/failure of that check.
>
> OK :)  I added the above to the changelog for next time.
>
> You guys can hide, but this patch isn't going away!


I believe Michael determined that a newer BIOS fixes this issue.



Re: Linux 2.6.22.2

2007-08-10 Thread Ayaz Abdulla
Yes, you are right. Copy and paste error. I have attached a patch which 
will fix this issue.


Thanks for catching it.
Ayaz

Signed-off-by: Ayaz Abdulla <[EMAIL PROTECTED]>


Prakash Punnoor wrote:

Hi,

I just noticed that PHY_OUI_VITESSE == PHY_OUI_REALTEK. Is that really 
intentional?




diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 42ba1c0..a361dba 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -550,6 +550,8 @@ union ring_type {
 /* PHY defines */
 #define PHY_OUI_MARVELL 0x5043
 #define PHY_OUI_CICADA  0x03f1
+#define PHY_OUI_VITESSE 0x01c1
+#define PHY_OUI_REALTEK 0x01c1


- 
(°= =°)

//\ Prakash Punnoor /\\
V_/ \_V
--- old/drivers/net/forcedeth.c 2007-08-09 17:37:12.0 -0400
+++ new/drivers/net/forcedeth.c 2007-08-09 17:37:07.0 -0400
@@ -551,7 +551,7 @@
 #define PHY_OUI_MARVELL	0x5043
 #define PHY_OUI_CICADA	0x03f1
 #define PHY_OUI_VITESSE	0x01c1
-#define PHY_OUI_REALTEK	0x01c1
+#define PHY_OUI_REALTEK	0x0732
 #define PHYID1_OUI_MASK	0x03ff
 #define PHYID1_OUI_SHFT	6
 #define PHYID2_OUI_MASK	0xfc00
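
To see why two identical OUI defines matter, here is a small userspace sketch of how an OUI might be assembled from the two PHY ID registers and mapped to a vendor. It reuses the values from the hunk above, but `PHYID2_OUI_SHFT`, the way the masks are combined in `phy_oui()`, and the helper `phy_vendor()` are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PHY_OUI_MARVELL	0x5043
#define PHY_OUI_CICADA	0x03f1
#define PHY_OUI_VITESSE	0x01c1
#define PHY_OUI_REALTEK	0x0732	/* corrected value from the patch */
#define PHYID1_OUI_MASK	0x03ff
#define PHYID1_OUI_SHFT	6
#define PHYID2_OUI_MASK	0xfc00
#define PHYID2_OUI_SHFT	10	/* assumption: not shown in the hunk */

/* Combine the OUI bits of the two PHY ID registers into one value. */
static uint32_t phy_oui(uint16_t id1, uint16_t id2)
{
	uint32_t oui = (uint32_t)(id1 & PHYID1_OUI_MASK) << PHYID1_OUI_SHFT;

	oui |= (uint32_t)(id2 & PHYID2_OUI_MASK) >> PHYID2_OUI_SHFT;
	return oui;
}

/*
 * Map an OUI to a vendor name. With two defines sharing one value,
 * the second comparison could never be reached, so one vendor's
 * code path would silently run for the other vendor's hardware.
 */
static const char *phy_vendor(uint32_t oui)
{
	if (oui == PHY_OUI_MARVELL)
		return "Marvell";
	if (oui == PHY_OUI_CICADA)
		return "Cicada";
	if (oui == PHY_OUI_VITESSE)
		return "Vitesse";
	if (oui == PHY_OUI_REALTEK)
		return "Realtek";
	return "unknown";
}
```

With the original copy-and-paste value, a Realtek check in a chain like this would be dead code; with distinct values each vendor resolves to its own branch.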


[PATCH 1/2] forcedeth: new device ids in pci_ids.h

2007-07-23 Thread Ayaz Abdulla

This patch contains new device ids for MCP73 chipset.

Signed-Off-By: Ayaz Abdulla <[EMAIL PROTECTED]>

--- old/include/linux/pci_ids.h 2007-07-22 18:57:26.0 -0400
+++ new/include/linux/pci_ids.h 2007-07-22 18:57:11.0 -0400
@@ -1223,6 +1223,10 @@
 #define PCI_DEVICE_ID_NVIDIA_NVENET_25  0x054D
 #define PCI_DEVICE_ID_NVIDIA_NVENET_26  0x054E
 #define PCI_DEVICE_ID_NVIDIA_NVENET_27  0x054F
+#define PCI_DEVICE_ID_NVIDIA_NVENET_28  0x07DC
+#define PCI_DEVICE_ID_NVIDIA_NVENET_29  0x07DD
+#define PCI_DEVICE_ID_NVIDIA_NVENET_30  0x07DE
+#define PCI_DEVICE_ID_NVIDIA_NVENET_31  0x07DF
 #define PCI_DEVICE_ID_NVIDIA_NFORCE_MCP67_IDE   0x0560
 #define PCI_DEVICE_ID_NVIDIA_NFORCE_MCP73_IDE   0x056C
 #define PCI_DEVICE_ID_NVIDIA_NFORCE_MCP77_IDE   0x0759


[PATCH 2/2] forcedeth: mcp73 device addition

2007-07-23 Thread Ayaz Abdulla

This patch contains new device settings for MCP73 chipset.

Signed-Off-By: Ayaz Abdulla <[EMAIL PROTECTED]>

--- old/drivers/net/forcedeth.c 2007-07-22 19:02:41.0 -0400
+++ new/drivers/net/forcedeth.c 2007-07-22 19:31:56.0 -0400
@@ -5550,6 +5550,22 @@
 		PCI_DEVICE(PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_NVENET_27),
 		.driver_data = DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER|DEV_HAS_HIGH_DMA|DEV_HAS_POWER_CNTRL|DEV_HAS_MSI|DEV_HAS_PAUSEFRAME_TX|DEV_HAS_STATISTICS_V2|DEV_HAS_TEST_EXTENDED|DEV_HAS_MGMT_UNIT,
 	},
+	{	/* MCP73 Ethernet Controller */
+		PCI_DEVICE(PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_NVENET_28),
+		.driver_data = DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER|DEV_HAS_HIGH_DMA|DEV_HAS_POWER_CNTRL|DEV_HAS_MSI|DEV_HAS_PAUSEFRAME_TX|DEV_HAS_STATISTICS_V2|DEV_HAS_TEST_EXTENDED|DEV_HAS_MGMT_UNIT,
+	},
+	{	/* MCP73 Ethernet Controller */
+		PCI_DEVICE(PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_NVENET_29),
+		.driver_data = DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER|DEV_HAS_HIGH_DMA|DEV_HAS_POWER_CNTRL|DEV_HAS_MSI|DEV_HAS_PAUSEFRAME_TX|DEV_HAS_STATISTICS_V2|DEV_HAS_TEST_EXTENDED|DEV_HAS_MGMT_UNIT,
+	},
+	{	/* MCP73 Ethernet Controller */
+		PCI_DEVICE(PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_NVENET_30),
+		.driver_data = DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER|DEV_HAS_HIGH_DMA|DEV_HAS_POWER_CNTRL|DEV_HAS_MSI|DEV_HAS_PAUSEFRAME_TX|DEV_HAS_STATISTICS_V2|DEV_HAS_TEST_EXTENDED|DEV_HAS_MGMT_UNIT,
+	},
+	{	/* MCP73 Ethernet Controller */
+		PCI_DEVICE(PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_NVENET_31),
+		.driver_data = DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER|DEV_HAS_HIGH_DMA|DEV_HAS_POWER_CNTRL|DEV_HAS_MSI|DEV_HAS_PAUSEFRAME_TX|DEV_HAS_STATISTICS_V2|DEV_HAS_TEST_EXTENDED|DEV_HAS_MGMT_UNIT,
+	},
 	{0,},
 };
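
As a rough illustration of what these table entries are for, the following userspace sketch walks a sentinel-terminated ID table and returns the per-device data for a matching vendor/device pair. The four MCP73 device IDs come from the pci_ids.h patch in this series and 0x10de is NVIDIA's PCI vendor ID; `struct nic_id`, `nic_tbl`, `nic_lookup` and the single `MCP73_FLAGS` value are simplified, hypothetical stand-ins for the kernel's own ID table type and the driver's `DEV_*` feature bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_NVIDIA		0x10de
#define PCI_DEVICE_ID_NVIDIA_NVENET_28	0x07DC
#define PCI_DEVICE_ID_NVIDIA_NVENET_29	0x07DD
#define PCI_DEVICE_ID_NVIDIA_NVENET_30	0x07DE
#define PCI_DEVICE_ID_NVIDIA_NVENET_31	0x07DF

/* Placeholder for the OR-ed DEV_* feature bits; the value is made up. */
#define MCP73_FLAGS			0x1UL

/* Hypothetical, simplified table entry. */
struct nic_id {
	uint16_t vendor;
	uint16_t device;
	unsigned long driver_data;
};

static const struct nic_id nic_tbl[] = {
	{ PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_NVENET_28, MCP73_FLAGS },
	{ PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_NVENET_29, MCP73_FLAGS },
	{ PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_NVENET_30, MCP73_FLAGS },
	{ PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_NVENET_31, MCP73_FLAGS },
	{ 0, 0, 0 },	/* all-zero sentinel, like {0,} in the driver table */
};

/* Return the matching entry, or NULL when the device is not listed. */
static const struct nic_id *nic_lookup(uint16_t vendor, uint16_t device)
{
	const struct nic_id *p;

	for (p = nic_tbl; p->vendor != 0; p++) {
		if (p->vendor == vendor && p->device == device)
			return p;
	}
	return NULL;
}
```

A device whose ID is missing from the table is simply never matched, which is why new chipsets need both the ID definitions and the table entries.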
 


Re: Linux 2.6.21-rc5

2007-03-26 Thread Ayaz Abdulla
This issue might be resolved with the patch provided in the following 
bug report: http://bugzilla.kernel.org/show_bug.cgi?id=8058


Please try out the patch in the bug report without your patch and see if 
the issue reproduces.


Ayaz


Ingo Molnar wrote:

* Linus Torvalds <[EMAIL PROTECTED]> wrote:


There's various fixes here, ranging from some architecture updates 
(ia64, ARM, MIPS, SH, Sparc64) to KVM, networking and network drivers.



here's a new v2.6.20 -> v2.6.21 forcedeth.c regression:

in the last week or so i've been seeing sporadic under-load forcedeth.c 
crashes (see the full oops further below):


 eth1: too many iterations (6) in nv_nic_irq.
 Unable to handle kernel NULL pointer dereference at 0088 RIP: 
 [80404587] nv_tx_done+0xf4/0x1cf


this is line 1906 of drivers/net/forcedeth.c:

np->stats.tx_bytes += np->get_tx_ctx->skb->len;

struct sk_buff's len field is at offset 88, so np->get_tx_ctx->skb is 
NULL. That is an 'impossible' scenario for tx descriptors here - the tx 
ring descriptors are always set up with a valid skb (and a valid dma 
address), and their completion is serialized via np->lock.
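
The reasoning from fault address to NULL pointer can be sketched in userspace C: reading a member through a base pointer touches the address base plus the member's offset, so a fault at a small address equal to that offset points to a NULL base. The struct below is a mock that merely places `len` at 0x88; it is not the real `struct sk_buff` layout, and reading the quoted "0088" and "offset 88" as hexadecimal 0x88 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock with len placed at offset 0x88; NOT the real struct sk_buff. */
struct mock_skb {
	unsigned char earlier_fields[0x88];
	unsigned int len;
};

/* Address that would be accessed when reading base->len. */
static uintptr_t len_access_address(uintptr_t base)
{
	return base + offsetof(struct mock_skb, len);
}
```

Calling `len_access_address(0)` models the NULL case and gives the member offset itself, which is the pattern the oops shows.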


these crashes are almost instant on the .21-rc5-rt kernel, but extremely 
sporadic on the upstream kernel and needed very high networking loads to 
trigger. Today i found a good way to trigger it almost instantly on 
upstream kernels too: apply the debug patch attached further below and 
do:


echo 100 > /proc/sys/kernel/panic

that will inject 100 artificial 'too many iterations' failures and 
provokes a TX timeout - which TX timeout will crash. (i've used a 
dual-core Athlon64 system in this test)
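
The injection trick described above amounts to a countdown that forces the loop counter past its work limit. A minimal userspace sketch, assuming a simplified event loop: `inject_budget` plays the role of `panic_timeout` in the debug patch, and `irq_loop_over_limit` with its parameters is a hypothetical stand-in for the interrupt handler, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of artificial failures still to inject (role of panic_timeout). */
static int inject_budget;

/*
 * Handle pending events one per iteration. Return true when the loop
 * bailed out through the "too many iterations" path, false when it
 * drained all events normally.
 */
static bool irq_loop_over_limit(int pending_events, int max_work)
{
	int i;

	for (i = 0; ; i++) {
		if (pending_events == 0)
			break;		/* nothing left to handle */
		pending_events--;

		if (inject_budget > 0) {
			inject_budget--;
			i = max_work + 1;	/* pretend the limit was exceeded */
		}
		if (i > max_work)
			return true;	/* over-limit path */
	}
	return false;
}
```

Each injected failure consumes one unit of the budget, so setting it to 100 yields 100 forced trips through the over-limit path before behaviour returns to normal.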


my first quick guess was to extend np->priv locking to the whole of 
nv_start_xmit/nv_start_xmit_optimized - while that appeared to make the 
crash a bit less likely, it did not prevent it. So there must be some 
other, more fundamental problem left as well. At first glance the SMP 
locking looks OK, so maybe the ring indices are messed up somehow and we 
got into a 'ring head bites the tail' scenario?


i can provide more info if needed.

Ingo

-->
eth1: too many iterations (6) in nv_nic_irq.
Unable to handle kernel NULL pointer dereference at 0088 RIP: 
 [80404587] nv_tx_done+0xf4/0x1cf
PGD 34d03067 PUD 34d02067 PMD 0 
Oops:  [1] PREEMPT SMP 
CPU 1 
Modules linked in:

Pid: 0, comm: swapper Not tainted 2.6.21-rc5 #8
RIP: 0010:[80404587]  [80404587] nv_tx_done+0xf4/0x1cf
RSP: 0018:81003ff6be40  EFLAGS: 00010206
RAX:  RBX: 810002e26700 RCX: 0001
RDX: 0042 RSI: 3ef00cbe RDI: 81003fbeb070
RBP: 81003ff6be60 R08: 810002e26a00 R09: 0003
R10: 81003ff4e100 R11: 810001e283f8 R12: 3ef00cbe
R13: 810002e26000 R14: 810002e28fc0 R15: 
FS:  2b6cb57f1db0() GS:81003ff4ad40() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 0088 CR3: 34c87000 CR4: 06e0
Process swapper (pid: 0, threadinfo 81003ff64000, task 81003ff4e100)
Stack:  810002e26700 0032 c201a000 810002e26000
 81003ff6bea0 80406dae 810002e26700 810002e26700
 810002e26000 00ff c201a000 80749080
Call Trace:
 <IRQ>  [80406dae] nv_nic_irq+0x76/0x261
 [8040961e] nv_do_nic_poll+0x200/0x284
 [8040941e] nv_do_nic_poll+0x0/0x284
 [80241995] run_timer_softirq+0x167/0x1dd
 [8023de45] __do_softirq+0x5b/0xc9
 [8020af0c] call_softirq+0x1c/0x28
 [8020c2b4] do_softirq+0x31/0x84
 [8023db16] irq_exit+0x3f/0x50
 [802190c2] smp_apic_timer_interrupt+0x49/0x5b
 [802087fb] default_idle+0x0/0x44
 [8020a9b6] apic_timer_interrupt+0x66/0x70
 <EOI>  [8020882a] default_idle+0x2f/0x44
 [8020804c] enter_idle+0x22/0x24
 [802088d0] cpu_idle+0x91/0xd4
 [80218572] start_secondary+0x2e3/0x2f5

---
 drivers/net/forcedeth.c |   20 
 1 file changed, 20 insertions(+)

Index: linux/drivers/net/forcedeth.c
===
--- linux.orig/drivers/net/forcedeth.c
+++ linux/drivers/net/forcedeth.c
@@ -2908,6 +2908,10 @@ static irqreturn_t nv_nic_irq(int foo, v
 			spin_unlock(&np->lock);
 			break;
 		}
+		if (panic_timeout > 0) {
+			panic_timeout--;
+			i = max_interrupt_work+1;
+		}
 		if (unlikely(i > max_interrupt_work)) {
 			spin_lock(&np->lock);
 			/* disable interrupts on the nic */
@@ -3026,6 +3030,10 @@ static irqreturn_t nv_nic_irq_optimized(
 			break;
 		}
 
+		if (panic_timeout > 0) {
+			panic_timeout--;
+			i = max_interrupt_work+1;
+		}
 		if (unlikely(i > max_interrupt_work)) {
 			spin_lock(&np->lock);
 			/* disable interrupts on the nic */
@@ -3076,6 +3084,10 @@ static irqreturn_t nv_nic_irq_tx(int foo

Re: forcedeth problems on 2.6.20-rc6-mm3

2007-02-19 Thread Ayaz Abdulla



Robert Hancock wrote:
> Ayaz Abdulla wrote:
>> For all those who are having issues, please try out the attached patch.
>>
>> Ayaz
>>
>> --- orig/drivers/net/forcedeth.c	2007-02-08 21:41:59.0 -0500
>> +++ new/drivers/net/forcedeth.c	2007-02-08 21:44:53.0 -0500
>> @@ -3104,13 +3104,17 @@
>>  	struct fe_priv *np = netdev_priv(dev);
>>  	u8 __iomem *base = get_hwbase(dev);
>>  	unsigned long flags;
>> +	u32 retcode;
>>  
>> -	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2)
>> +	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2) {
>>  		pkts = nv_rx_process(dev, limit);
>> -	else
>> +		retcode = nv_alloc_rx(dev);
>> +	} else {
>>  		pkts = nv_rx_process_optimized(dev, limit);
>> +		retcode = nv_alloc_rx_optimized(dev);
>> +	}
>>  
>> -	if (nv_alloc_rx(dev)) {
>> +	if (retcode) {
>>  		spin_lock_irqsave(&np->lock, flags);
>>  		if (!np->in_shutdown)
>>  			mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
>
> Did anyone push this patch into mainline? forcedeth on 2.6.20-git14 is
> still completely broken without this patch.




I have submitted the patch to netdev mailing list.


Re: forcedeth problems on 2.6.20-rc6-mm3

2007-02-08 Thread Ayaz Abdulla

David Ford wrote:
On 2/5/07, *Andrew Morton* <[EMAIL PROTECTED]> wrote:

On Sun, 04 Feb 2007 23:48:33 -0600 Robert Hancock <[EMAIL PROTECTED]> wrote:

 > Andrew Morton wrote:
 > > On Sun, 04 Feb 2007 23:13:09 -0600 Robert Hancock <[EMAIL PROTECTED]> wrote:
 > >
 > >> Something's busted with forcedeth in 2.6.20-rc6-mm3 for me relative to
 > >> 2.6.20-rc6. There's no errors in dmesg, but it seems no packets ever get
 > >> received and so the machine can't get an IP address. I tried reverting
 > >> all the -mm changes to drivers/net/forcedeth.c, which didn't help. The
 > >> network controller shares an IRQ with the USB OHCI controller which is
 > >> receiving interrupts, so it doesn't seem like an interrupt routing
 > >> problem, though I suppose something weird could be happening there.
 > >>
 > >> This is on an Asus A8N-SLI Deluxe (CK804 chipset) on x86_64.
 > >>
 > >> Any suggestions on how to debug/what to try reverting to see what's
 > >> causing this?
 > >
 > > There are many forcedeth changes in git-netdev-all.patch.  Can you
 > > try reverting drivers/net/forcedeth.c back to the unpatched version
 > > from 2.6.20-rc6?
 > >
 > > Thanks.
 > >
 >
 > That's essentially what I did, it didn't appear to help. I assume the
 > problem must lie elsewhere..
 >

doh, I missed that.

It's presumably not the driver and nobody else seems to be hitting this, so
it must be something peculiar to your setup.  But I don't know what it
might be, sorry.



Actually it has been reported by several other people here including 
myself but it seems to have been overlooked here ;)


See the messages with forcedeth in the subject line over the past few 
weeks.


I put 2.6.20-gentoo on my machine this weekend with debug printks 
enabled and right now I have yet to lose connectivity -- going on ~20 
hours worth.


Previously I would lose connectivity within minutes of booting up.  I 
had a script set up that detected the ping loss of a gateway and would 
restart both interfaces (dual onboard nics).


Tonight I will disable the debug printks and see if the system remains 
online.  There was a big patch applied to forcedeth for 2.6.20, 
previously I was having these issues for several of the -19 series.


David


For all those who are having issues, please try out the attached patch.

Ayaz


--- orig/drivers/net/forcedeth.c	2007-02-08 21:41:59.0 -0500
+++ new/drivers/net/forcedeth.c	2007-02-08 21:44:53.0 -0500
@@ -3104,13 +3104,17 @@
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
 	unsigned long flags;
+	u32 retcode;
 
-	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2)
+	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2) {
 		pkts = nv_rx_process(dev, limit);
-	else
+		retcode = nv_alloc_rx(dev);
+	} else {
 		pkts = nv_rx_process_optimized(dev, limit);
+		retcode = nv_alloc_rx_optimized(dev);
+	}
 
-	if (nv_alloc_rx(dev)) {
+	if (retcode) {
 		spin_lock_irqsave(&np->lock, flags);
 		if (!np->in_shutdown)
 			mod_timer(&np->oom_kick, jiffies + OOM_REFILL);

