e1000 TX unit hang (redux)
Hello All,

I have an e1000 card periodically misbehaving with the message 'Detected Tx unit hang'. I've noticed this problem come up on netdev a couple of times and found the link to the bug tracking page:

http://sourceforge.net/tracker/index.php?func=detail&aid=1463045&group_id=42302&atid=447449

I've also seen the patch that I believe was placed in 2.6.16 and subsequently brought down to 2.4.2? that seems to address this problem by creating a tx_timeout_factor relative to the speed of the NIC. However, there is no mention of this workaround/fix on the bug at the link above and I haven't found any discussion of it here on netdev.

Auke recommends turning off TSO to see if that resolves the problem, and this also seems to work, though I have not yet been able to confirm this and would prefer a more performance-friendly fix, if available ;)

Would one of you please give an update on the status of the bug? Was a cause ever found, and was the tx_timeout_factor intended as a fix or as a temporary workaround? I feel like I must have missed something, because I never saw the tx_timeout_factor patch go through netdev at all.

Thanks again for your help,
Shaw

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e1000 TX unit hang (redux)
[EMAIL PROTECTED] wrote:
> I have an e1000 card periodically misbehaving with the message 'Detected Tx unit hang'. I've noticed this problem come up on netdev a couple of times and found the link to the bug tracking page:
> http://sourceforge.net/tracker/index.php?func=detail&aid=1463045&group_id=42302&atid=447449
> I've also seen the patch that I believe was placed in 2.6.16 and subsequently brought down to 2.4.2?

That's not only impossible but also unlikely: we rarely push changes to 2.4 kernels anymore, and I think the last change is likely older than 2.4.28 or so.

> that seems to address this problem by creating a tx_timeout_factor relative to the speed of the NIC. However, there is no mention of this workaround/fix on the bug at the link above and I haven't found any discussion of it here on netdev.

I wouldn't even know what patch you are talking about (?!)

> Auke recommends turning off tso to see if that resolves the problem and this also seems to work, though I have as yet not been able to confirm this and would prefer a more performance friendly fix..if available ;)
> Would one of you please give an update on the status of the bug? If a cause was ever found and if the tx_timeout_factor was intended as a fix or temporary workaround? I feel like I must have missed something, because I never saw the tx_timeout_factor patch go through netdev at all..

One possible problem is a bad EEPROM bit, where the hardware might have been misconfigured. This only affects _some_ older e1000's. Any bug report should therefore include the output of `ethtool -e ethX` (as well as the `lspci -vv` output, of course). If you haven't already done so, please submit this to the bugtracker or to us by e-mail.

Cheers,
Auke
Re: e1000 TX unit hang (redux)
Hi Auke,

On Tuesday 11 July 2006 14:09, Auke Kok wrote:
> > that seems to address this problem by creating a tx_timeout_factor relative to the speed of the NIC. However, there is no mention of this workaround/fix on the bug at the link above and I haven't found any discussion of it here on netdev.
>
> I wouldn't even know what patch you are talking about (?!)

Ok, well, the patch is in 2.6.17.4 and looks to have been announced in the 2.6.16-rc2 changelog (http://lwn.net/Articles/170529/) and written by Jeff Kirsher. I haven't been able to find a link to the original patch submission anywhere. The code looks something like this now:

    /* Detect a transmit hang in hardware, this serializes the
     * check with the clearing of time_stamp and movement of i */
    adapter->detect_tx_hung = FALSE;
    if (tx_ring->buffer_info[eop].dma &&
        time_after(jiffies, tx_ring->buffer_info[eop].time_stamp +
                   (adapter->tx_timeout_factor * HZ))
        && !(E1000_READ_REG(&adapter->hw, STATUS) &
             E1000_STATUS_TXOFF)) {

...where the tx_timeout_factor has been added and is set in the watchdog code based on the link speed.

> that's not only impossible but also unlikely - we don't push changes to 2.4 kernels anymore a lot, I think the last change is likely older than 2.4.28.

I'm sure you're right. I jumped to conclusions on a patch I saw posted at redhat. I'll be more careful next time :) I'll also try to get some better debugging info from my side.

Thanks.
Shaw
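The condition quoted in the message above can be exercised outside the kernel. What follows is a minimal user-space sketch in C, not the driver's code: the struct, every `sketch_` name and the tick rate `SKETCH_HZ` are assumptions made for illustration. Only the shape of the condition comes from the quoted snippet: the buffer is still DMA-mapped, the deadline of time_stamp + tx_timeout_factor * HZ has passed, and the STATUS register does not report TXOFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick rate for this sketch only; the real value is the kernel's HZ. */
#define SKETCH_HZ 250u

/* Wrap-safe "a is after b" on a 32-bit tick counter, the same idea as the
 * kernel's time_after(). Relies on the usual two's-complement conversion. */
static bool sketch_time_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Hypothetical stand-in for the state the quoted check reads. */
struct sketch_tx_state {
    bool     dma_mapped;     /* buffer_info[eop].dma is non-zero */
    uint32_t time_stamp;     /* tick count recorded for the buffer */
    bool     tx_paused;      /* STATUS has E1000_STATUS_TXOFF set */
    uint32_t timeout_factor; /* plays the role of tx_timeout_factor */
};

/* Returns true when the three-part condition from the snippet holds. */
static bool sketch_tx_hung(const struct sketch_tx_state *s, uint32_t now)
{
    uint32_t deadline = s->time_stamp + s->timeout_factor * SKETCH_HZ;

    return s->dma_mapped &&
           sketch_time_after(now, deadline) &&
           !s->tx_paused;
}
```

With a factor of 1 and the assumed 250 ticks per second, a buffer stamped at tick 1000 is reported only once the counter has passed 1250; a larger factor pushes that deadline out proportionally, and a paused transmitter is never reported.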
e1000 TX unit hang
I saw this error (once) in 2.6.13 a few weeks ago:

Jun 23 15:19:01 X kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Jun 23 15:19:01 X kernel: TDH 7e
Jun 23 15:19:01 X kernel: TDT 7f
Jun 23 15:19:01 X kernel: next_to_use 7f
Jun 23 15:19:01 X kernel: next_to_clean 7e
Jun 23 15:19:01 X kernel: buffer_info[next_to_clean]
Jun 23 15:19:01 X kernel: dma 16ef9012
Jun 23 15:19:01 X kernel: time_stamp 423845db
Jun 23 15:19:01 X kernel: next_to_watch 7e
Jun 23 15:19:01 X kernel: jiffies 423845db
Jun 23 15:19:01 X kernel: next_to_watch.status 0

so upgraded to 2.6.17 and got a slew of them today - shown below.

E1000 maintainers: any ideas?

Phil

Jul 5 11:43:26 X kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Jul 5 11:43:26 X kernel: Tx Queue 0
Jul 5 11:43:26 X kernel: TDH a
Jul 5 11:43:26 X kernel: TDT a
Jul 5 11:43:26 X kernel: next_to_use a
Jul 5 11:43:26 X kernel: next_to_clean 5f
Jul 5 11:43:26 X kernel: buffer_info[next_to_clean]
Jul 5 11:43:26 X kernel: time_stamp b6bc51
Jul 5 11:43:26 X kernel: next_to_watch 5f
Jul 5 11:43:26 X kernel: jiffies b6bcc6
Jul 5 11:43:26 X kernel: next_to_watch.status 1
Jul 5 11:43:33 X kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Jul 5 11:43:34 X kernel: Tx Queue 0
Jul 5 11:43:36 X kernel: TDH 2c
Jul 5 11:43:38 X kernel: TDT 2c
Jul 5 11:43:42 X kernel: next_to_use 2c
Jul 5 11:43:45 X kernel: next_to_clean 81
Jul 5 11:43:46 X kernel: buffer_info[next_to_clean]
Jul 5 11:43:47 X kernel: time_stamp b6be88
Jul 5 11:43:49 X kernel: next_to_watch 81
Jul 5 11:43:52 X kernel: jiffies b6bf0e
Jul 5 11:43:53 X kernel: next_to_watch.status 1
Jul 5 11:43:53 X kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Jul 5 11:43:53 X kernel: Tx Queue 0
Jul 5 11:43:53 X kernel: TDH ff
Jul 5 11:43:53 X kernel: TDT ff
Jul 5 11:43:53 X kernel: next_to_use ff
Jul 5 11:43:53 X kernel: next_to_clean 54
Jul 5 11:43:53 X kernel: buffer_info[next_to_clean]
Jul 5 11:43:53 X kernel: time_stamp b6c06d
Jul 5 11:43:53 X kernel: next_to_watch 54
Jul 5 11:43:53 X kernel: jiffies b6c0d2
Jul 5 11:43:53 X kernel: next_to_watch.status 1
Jul 5 11:43:53 X kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Jul 5 11:43:53 X kernel: Tx Queue 0
Jul 5 11:43:53 X kernel: TDH 81
Jul 5 11:43:53 X kernel: TDT 81
Jul 5 11:43:53 X kernel: next_to_use 81
Jul 5 11:43:53 X kernel: next_to_clean d6
Jul 5 11:43:53 X kernel: buffer_info[next_to_clean]
Jul 5 11:43:53 X kernel: time_stamp b6c0b8
Jul 5 11:43:53 X kernel: next_to_watch d6
Jul 5 11:43:53 X kernel: jiffies b6c19b
Jul 5 11:43:53 X kernel: next_to_watch.status 1
Jul 5 11:43:53 X kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Jul 5 11:43:53 X kernel: Tx Queue 0
Jul 5 11:43:53 X kernel: TDH 1b
Jul 5 11:43:53 X kernel: TDT 1b
Jul 5 11:43:53 X kernel: next_to_use 1b
Jul 5 11:43:53 X kernel: next_to_clean 71
Jul 5 11:43:53 X kernel: buffer_info[next_to_clean]
Jul 5 11:43:53 X kernel: time_stamp b6c1d8
Jul 5 11:43:53 X kernel: next_to_watch 71
Jul 5 11:43:53 X kernel: jiffies b6c255
Jul 5 11:43:53 X kernel: next_to_watch.status 1
Jul 5 11:43:53 X kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Jul 5 11:43:53 X kernel: Tx Queue 0
Jul 5 11:43:53 X kernel: TDH 9e
Jul 5 11:43:53 X kernel: TDT 9e
Jul 5 11:43:53 X kernel: next_to_use 9e
Jul 5 11:43:54 X kernel: next_to_clean f3
Jul 5 11:43:54 X kernel: buffer_info[next_to_clean]
Jul 5 11:43:54 X kernel: time_stamp b6c229
Jul 5 11:43:54 X kernel: next_to_watch f3
Jul 5 11:43:54 X kernel: jiffies b6c329
Jul 5 11:43:54 X kernel: next_to_watch.status 1
Jul 5 11:43:54 X kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Jul 5 11:43:54 X kernel: Tx Queue 0
Jul 5 11:43:54 X kernel: TDH 39
Jul 5 11:43:54 X kernel: TDT 39
Jul 5 11:43:54 X kernel: next_to_use 39
Jul 5 11:43:54 X kernel: next_to_clean 8e
Jul 5 11:43:54 X kernel: buffer_info[next_to_clean]
Jul 5 11:43:54 X kernel: time_stamp b6c4a0
Jul 5 11:43:54 X kernel: next_to_watch 8e
Jul 5 11:43:54 X kernel: jiffies b6c558
Jul 5
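When reading reports like the ones above, one number worth pulling out is how many ticks separate `jiffies` from `time_stamp`, since a time-based hang check compares that kind of gap against a deadline. The small C helper below is offered as a sketch only: the function name is hypothetical, and it assumes both fields are hexadecimal values of a 32-bit tick counter. Converting ticks to seconds would need the kernel's HZ, which the reports do not show.

```c
#include <stdint.h>
#include <stdlib.h>

/* Ticks elapsed between the "time_stamp" and "jiffies" fields of one report.
 * Both arguments are the hex strings as printed in the log. Unsigned 32-bit
 * subtraction keeps the result correct across a counter wrap. */
static uint32_t sketch_ticks_elapsed(const char *jiffies_hex,
                                     const char *time_stamp_hex)
{
    uint32_t now  = (uint32_t)strtoul(jiffies_hex, NULL, 16);
    uint32_t then = (uint32_t)strtoul(time_stamp_hex, NULL, 16);

    return now - then;
}
```

For the first 2.6.17 report, `sketch_ticks_elapsed("b6bcc6", "b6bc51")` gives 0x75, i.e. 117 ticks. For the 2.6.13 report both fields are 423845db, so the difference is 0.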
Re: e1000 TX unit hang
Phil Oester wrote:
> I saw this error (once) in 2.6.13 a few weeks ago:
>
> Jun 23 15:19:01 X kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> Jun 23 15:19:01 X kernel: TDH 7e
> Jun 23 15:19:01 X kernel: TDT 7f
> Jun 23 15:19:01 X kernel: next_to_use 7f
> Jun 23 15:19:01 X kernel: next_to_clean 7e
> Jun 23 15:19:01 X kernel: buffer_info[next_to_clean]
> Jun 23 15:19:01 X kernel: dma 16ef9012
> Jun 23 15:19:01 X kernel: time_stamp 423845db
> Jun 23 15:19:01 X kernel: next_to_watch 7e
> Jun 23 15:19:01 X kernel: jiffies 423845db
> Jun 23 15:19:01 X kernel: next_to_watch.status 0
>
> so upgraded to 2.6.17 and got a slew of them today - shown below.
>
> E1000 maintainers: any ideas?

The issue is known and being worked on; unfortunately there is no more information yet. We're tracking the issue, along with debug patches and the like, over here (at e1000.sf.net):

http://sourceforge.net/tracker/index.php?func=detail&aid=1463045&group_id=42302&atid=447449

For now, try to see if turning off TSO using ethtool helps.

Cheers,
Auke