Hi,
Carl-Daniel Hailfinger wrote:
> Carl-Daniel Hailfinger wrote:
>
>>Carl-Daniel Hailfinger wrote:
>>
>>
>>>after sending 259 GB and receiving 25 GB over my SysKonnect SK-9E21
>>>card (sky2 says it is a "Yukon-EC (0xb6) rev 1"), the card appears
>>>dead. Machine is an Athlon64 3200+ on an Asus A8N-SLI Deluxe board.
>>
>>I have now added a hard reset routine to the tx timeout
>>path and hope it won't kill my machine.
>
>
> Apologies for mangled whitespace, this is just a rough cut'n'paste.
> --- linux-2.6.15/drivers/net/sky2.c.orig 2006-01-21 16:00:15.000000000 +0100
> +++ linux-2.6.15/drivers/net/sky2.c 2006-01-21 14:08:28.000000000 +0100
> @@ -1565,6 +1565,7 @@ static int sky2_autoneg_done(struct sky2
> return 0;
> }
>
> +static int sky2_reset(struct sky2_hw *hw);
> /*
> * Interrupt from PHY are handled outside of interrupt context
> * because accessing phy registers requires spin wait which might
> @@ -1639,6 +1640,7 @@ static void sky2_tx_timeout(struct net_d
> if (netif_msg_timer(sky2))
> printk(KERN_ERR PFX "%s: tx timeout\n", dev->name);
>
> + if (0) {
> sky2_write32(hw, Q_ADDR(txq, Q_CSR), BMU_STOP);
> sky2_write32(hw, Y2_QADDR(txq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);
>
> @@ -1646,6 +1648,12 @@ static void sky2_tx_timeout(struct net_d
>
> sky2_qset(hw, txq);
> sky2_prefetch_init(hw, txq, sky2->tx_le_map, TX_RING_SIZE - 1);
> + } else {
> + printk(KERN_ERR PFX "%s: recovering the HARD way...\n", dev->name);
> + sky2_down(dev);
> + sky2_reset(hw);
> + sky2_up(dev);
> + }
> }
>
>
> And every time the kernel throws this message, I run the following
> script:
>
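> #!/bin/bash
> # Interface named in the most recent "HARD way" recovery message
> deadinterface=$(dmesg | grep HARD | tail -1 | sed "s/.*sky2 //;s/:.*//")
> # Bounce that interface
> ip l s "$deadinterface" down
> ip l s "$deadinterface" up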
>
> After that, everything continues to work until the next tx timeout
> happens, and then the script again saves the day.
>
> More results about the circumstances of this bug: It seems that
> it will only trigger under LOW load. As long as I keep the interface
> busy, it will have no problems at all.
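The manual step quoted above could be automated roughly as follows. This is only a sketch: it assumes the log line looks like "sky2 eth0: recovering the HARD way..." (as produced by the printk in the patch), the function names are made up for illustration, and how it gets invoked (cron, a loop, by hand) is left open.

```shell
#!/bin/bash
# Sketch only: find the interface named in the most recent
# "recovering the HARD way" message and bounce it.
# Assumes log lines of the form "sky2 eth0: recovering the HARD way...".

# Read kernel log text on stdin and print the interface name from the
# last matching line; prints nothing if no line matches.
last_dead_interface() {
    grep 'recovering the HARD way' | tail -n 1 | sed 's/.*sky2 //; s/:.*//'
}

# Take the given interface down and up again; do nothing for an empty name.
bounce_interface() {
    [ -n "$1" ] || return 0
    ip link set "$1" down
    ip link set "$1" up
}

# Example use (needs root):
#   bounce_interface "$(dmesg | last_dead_interface)"
```

Keeping the parsing separate from the ip calls means the extraction can be checked against saved dmesg output without touching the interface.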
OK, more info about the circumstances of the bug.
- happens with sky2 0.11 and 0.13
- with low load (<100 kB/s) it triggers after 12 hours and then
  approx. every 50 minutes
- with medium load (100-1200 kB/s) it triggers after 30 minutes
  and then approx. every 70 minutes
- with high RX load (9-12 MB/s) it triggers every 8 hours
- with high TX load (9-12 MB/s) I can't get it to trigger
- with stock tx_timeout handler, it will stay dead and no interrupts
  are received from the NIC once it hangs
- simply taking the interface down and up again doesn't help
- with my modified tx_timeout handler, taking the interface down and
  up again after the timeout helps
- with stock tx_timeout handler, I have to unload and reload the
  module to fix up the card
- general pattern seems to be medium interrupt load -> instability
- ah yes, and this is a production machine at a slightly remote
  location. Silly me.
If you want me to test any patch, tell me. It can only get better.
Regards,
Carl-Daniel
--
http://www.hailfinger.org/