Hi,

Carl-Daniel Hailfinger wrote:
> Carl-Daniel Hailfinger wrote:
>
>>Carl-Daniel Hailfinger wrote:
>>
>>
>>>after sending 259 GB and receiving 25 GB over my SysKonnect SK-9E21
>>>card (sky2 says it is a "Yukon-EC (0xb6) rev 1"), the card appears
>>>dead. Machine is an Athlon64 3200+ on an Asus A8N-SLI Deluxe board.
>>
>>I have now added a hard reset routine to the tx timeout
>>path and hope it won't kill my machine.
>
>
> Apologies for mangled whitespace, this is just a rough cut'n'paste.
> --- linux-2.6.15/drivers/net/sky2.c.orig 2006-01-21 16:00:15.000000000 +0100
> +++ linux-2.6.15/drivers/net/sky2.c 2006-01-21 14:08:28.000000000 +0100
> @@ -1565,6 +1565,7 @@ static int sky2_autoneg_done(struct sky2
>      return 0;
>  }
>
> +static int sky2_reset(struct sky2_hw *hw);
>  /*
>   * Interrupt from PHY are handled outside of interrupt context
>   * because accessing phy registers requires spin wait which might
> @@ -1639,6 +1640,7 @@ static void sky2_tx_timeout(struct net_d
>      if (netif_msg_timer(sky2))
>          printk(KERN_ERR PFX "%s: tx timeout\n", dev->name);
>
> +    if (0) {
>      sky2_write32(hw, Q_ADDR(txq, Q_CSR), BMU_STOP);
>      sky2_write32(hw, Y2_QADDR(txq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);
>
> @@ -1646,6 +1648,12 @@ static void sky2_tx_timeout(struct net_d
>
>      sky2_qset(hw, txq);
>      sky2_prefetch_init(hw, txq, sky2->tx_le_map, TX_RING_SIZE - 1);
> +    } else {
> +        printk(KERN_ERR PFX "%s: recovering the HARD way...\n", dev->name);
> +        sky2_down(dev);
> +        sky2_reset(hw);
> +        sky2_up(dev);
> +    }
>  }
>
>
> And every time the kernel throws this message, I run the following
> script:
>
> #!/bin/bash
> deadinterface=`dmesg|grep HARD|tail -1|sed "s/.*sky2 //;s/:.*//"`
> ip l s $deadinterface down
> ip l s $deadinterface up
>
> After that, everything continues to work until the next tx timeout
> happens, and then the script again saves the day.
>
> More results about the circumstances of this bug: It seems that
> it will only trigger under LOW load. As long as I keep the interface
> busy, it will have no problems at all.

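For anyone who wants to reuse the workaround script quoted above, here is a slightly more defensive variant. It is only a rough sketch: it assumes the log line looks like "sky2 eth0: recovering the HARD way..." (which is what the printk in the patch should produce if PFX expands to "sky2 "), and the function names last_dead_iface and bounce_dead_iface are made up for illustration.

```bash
#!/bin/bash

# Read kernel log text on stdin and print the interface named in the
# most recent "recovering the HARD way" line. Prints nothing if no
# such line exists.
# Assumption: the line has the form "sky2 <iface>: recovering the HARD way..."
last_dead_iface() {
    grep 'recovering the HARD way' | tail -n 1 | sed 's/.*sky2 //;s/:.*//'
}

# Take the affected interface down and up again, but only if one was
# actually found in dmesg. Returns non-zero when there is nothing to do.
bounce_dead_iface() {
    local ifc
    ifc=$(dmesg | last_dead_iface)
    [ -n "$ifc" ] || return 1
    ip link set "$ifc" down
    ip link set "$ifc" up
}
```

Calling bounce_dead_iface by hand (as root) does the same thing as the original script, except that it does nothing when dmesg contains no matching line instead of running ip with an empty interface name.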
OK, more info about the circumstances of the bug.
- happens with sky2 0.11 and 0.13
- with low load (<100 kB/s) it triggers after 12 hours and then
  approx. every 50 minutes
- with medium load (100-1200 kB/s) it triggers after 30 minutes and
  then approx. every 70 minutes
- with high RX load (9-12 MB/s) it triggers every 8 hours
- with high TX load (9-12 MB/s) I can't get it to trigger
- with stock tx_timeout handler, it will stay dead and no interrupts
  are received from the nic once it hangs
- simply taking the interface down and up again doesn't help
- with my modified tx_timeout handler, taking the interface down and
  up again after the timeout helps
- with stock tx_timeout handler, I have to unload and reload the
  module to fix up the card
- general pattern seems to be medium interrupt load -> instability
- ah yes, and this is a production machine at a slightly remote
  location. Silly me.

If you want me to test any patch, tell me. It can only get better.

Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/