Hi,

Carl-Daniel Hailfinger wrote:
> Carl-Daniel Hailfinger wrote:
> 
>>Carl-Daniel Hailfinger wrote:
>>
>>
>>>After sending 259 GB and receiving 25 GB over my SysKonnect SK-9E21
>>>card (sky2 says it is a "Yukon-EC (0xb6) rev 1"), the card appears
>>>dead. Machine is an Athlon64 3200+ on an Asus A8N-SLI Deluxe board.
>>
>>I have now added a hard reset routine to the tx timeout
>>path and hope it won't kill my machine.
> 
> 
> Apologies for the mangled whitespace, this is just a rough cut'n'paste.
> --- linux-2.6.15/drivers/net/sky2.c.orig        2006-01-21 16:00:15.000000000 +0100
> +++ linux-2.6.15/drivers/net/sky2.c     2006-01-21 14:08:28.000000000 +0100
> @@ -1565,6 +1565,7 @@ static int sky2_autoneg_done(struct sky2
>         return 0;
>  }
> 
> +static int sky2_reset(struct sky2_hw *hw);
>  /*
>   * Interrupt from PHY are handled outside of interrupt context
>   * because accessing phy registers requires spin wait which might
> @@ -1639,6 +1640,7 @@ static void sky2_tx_timeout(struct net_d
>         if (netif_msg_timer(sky2))
>                 printk(KERN_ERR PFX "%s: tx timeout\n", dev->name);
> 
> +       if (0) {
>         sky2_write32(hw, Q_ADDR(txq, Q_CSR), BMU_STOP);
>         sky2_write32(hw, Y2_QADDR(txq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);
> 
> @@ -1646,6 +1648,12 @@ static void sky2_tx_timeout(struct net_d
> 
>         sky2_qset(hw, txq);
>         sky2_prefetch_init(hw, txq, sky2->tx_le_map, TX_RING_SIZE - 1);
> +       } else {
> +       printk(KERN_ERR PFX "%s: recovering the HARD way...\n", dev->name);
> +       sky2_down(dev);
> +       sky2_reset(hw);
> +       sky2_up(dev);
> +       }
>  }
> 
> 
> And every time the kernel prints this message, I run the following
> script:
> 
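> #!/bin/bash
> # Take down and bring up the interface named in the latest
> # "recovering the HARD way" message from the modified sky2 driver
> deadinterface=$(dmesg | grep HARD | tail -n 1 | sed "s/.*sky2 //;s/:.*//")
> ip link set "$deadinterface" down
> ip link set "$deadinterface" up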
> 
> After that, everything continues to work until the next tx timeout
> happens, and then the script again saves the day.
> 
> More results about the circumstances of this bug: It seems that
> it will only trigger under LOW load. As long as I keep the interface
> busy, it will have no problems at all.

OK, more info about the circumstances of the bug.
- happens with sky2 0.11 and 0.13
- with low load (<100 kB/s) it triggers after 12 hours and then
  approx. every 50 minutes
- with medium load (100-1200 kB/s) it triggers after 30 minutes
  and then approx. every 70 minutes
- with high RX load (9-12 MB/s) it triggers every 8 hours
- with high TX load (9-12 MB/s) I can't get it to trigger
- with the stock tx_timeout handler, the card stays dead once it hangs
  and no interrupts are received from the NIC
- with the stock tx_timeout handler, simply taking the interface down
  and up again doesn't help
- with my modified tx_timeout handler, taking the interface down and
  up again after the timeout helps
- with the stock tx_timeout handler, I have to unload and reload the
  module to fix up the card
- the general pattern seems to be: medium interrupt load leads to instability
- ah yes, and this is a production machine at a slightly remote
  location. Silly me.

If you want me to test any patch, tell me. It can only get better.


Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/