Re: [E1000-devel] Detected Tx Unit Hang
On Wed, 11 Mar 2009, Gary W. Smith wrote:
> I asked this last week but didn't get a response. I have a supermicro

apologies for the slow response.

> server with a dual intel nic that uses the e1000 driver. I'm using CentOS
> 5.2 and when I do anything network intensive I lose connectivity for a few
> seconds. Then we get this in the log. I downloaded, compiled and installed
> the latest e1000 driver. I see that the driver is in the proper location
> (based on timestamp).

thank you for downloading the latest driver. It is probably 8.0.9?

please load the driver with the module parameters
TxDescriptorStep=4,4

you can modify /etc/modprobe.conf and add
options e1000 TxDescriptorStep=4,4
(if you only have two ports)

or just load the driver with
modprobe e1000 TxDescriptorStep=4,4

and then use ethtool to increase the number of tx descriptors.
ethtool -G eth0 tx 1024

this workaround only uses one in every four descriptors.

> How can I fix this problem on this server? I have tried to manually
> disable the tso and other entries but this doesn't seem to help. I've also
> tried setting it down to 100/full to no avail. It appears to be a TX, not
> RX issue. I say this because I run dstat in the background and when it
> hangs and then comes back it will quickly dump a full screen of dstat
> entries, which should be one per second, so I'm assuming that TCP is
> buffering the packets.

please attach the full lspci -vvv for your system, make sure that you have
the latest bios update, and that the system's bios settings are set to the
defaults, and particularly any settings having to do with write combining
or PCI transaction combining are disabled.

> Things I've tried.
>
> /sbin/ethtool -K eth0 tso off
> /sbin/ethtool -K eth0 rx off
> /sbin/ethtool -K eth0 tx off
> /sbin/ethtool -K eth0 sg off
>
> Mar 11 18:50:01 vcsoaknas01 kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> Mar 11 18:50:01 vcsoaknas01 kernel:   Tx Queue             0
> Mar 11 18:50:01 vcsoaknas01 kernel:   TDH                  f7
> Mar 11 18:50:01 vcsoaknas01 kernel:   TDT                  f7
> Mar 11 18:50:01 vcsoaknas01 kernel:   next_to_use          f7
> Mar 11 18:50:01 vcsoaknas01 kernel:   next_to_clean        24
> Mar 11 18:50:01 vcsoaknas01 kernel: buffer_info[next_to_clean]
> Mar 11 18:50:01 vcsoaknas01 kernel:   time_stamp           1004de0b1
> Mar 11 18:50:01 vcsoaknas01 kernel:   next_to_watch        24
> Mar 11 18:50:01 vcsoaknas01 kernel:   jiffies              1004dec18
> Mar 11 18:50:01 vcsoaknas01 kernel:   next_to_watch.status 0

this really indicates that the adapter is finishing all the work but that
the descriptor write-back indicating the work was completed is not making
it back to main memory. We have seen this a lot with AMD systems, in
particular ones with VIA chipsets. There is a bad bug in those machines
when an IO device and the processor both write to the same cache line.

also, if the above workaround doesn't help we'll want you to install the
dump patch from the patches section of e1000.sourceforge.net and send us
the output when you get a tx hang.

hope this helps,
 Jesse

--
Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are
powering Web 2.0 with engaging, cross-platform capabilities. Quickly and
easily build your RIAs with Flex Builder, the Eclipse(TM)based development
software that enables intelligent coding and step-through debugging.
Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
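As a rough illustration of how the hang report above can be read, here is a minimal userspace sketch in C. It is not driver code: the struct and function names are made up for this example, and the field meanings follow the explanation in the reply (hardware head equal to tail, driver still waiting, status 0). The last helper assumes that TxDescriptorStep=N simply leaves one usable descriptor in every N, as the reply describes. The jiffies difference is left in ticks because the thread does not say what HZ the kernel was built with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from one "Detected Tx Unit Hang" report (hex in the log). */
struct tx_hang_dump {
    uint32_t tdh;           /* hardware head register */
    uint32_t tdt;           /* hardware tail register */
    uint32_t next_to_use;   /* driver: next descriptor it will fill */
    uint32_t next_to_clean; /* driver: next descriptor it expects completed */
    uint64_t time_stamp;    /* jiffies when the stuck buffer was queued */
    uint64_t jiffies;       /* jiffies when the hang was reported */
    uint32_t watch_status;  /* next_to_watch.status; 0 = no completion seen */
};

/* Hardware has consumed everything queued once head has caught up with tail. */
static bool hw_ring_drained(const struct tx_hang_dump *d)
{
    return d->tdh == d->tdt;
}

/* The driver is still waiting if it has not cleaned up to next_to_use
 * and the descriptor it is watching shows no completion status. */
static bool driver_waiting(const struct tx_hang_dump *d)
{
    return d->next_to_clean != d->next_to_use && d->watch_status == 0;
}

/* Pattern described in the reply: hardware done, write-back never seen. */
static bool looks_like_lost_writeback(const struct tx_hang_dump *d)
{
    return hw_ring_drained(d) && driver_waiting(d);
}

/* How long the oldest buffer had been pending, in timer ticks. */
static uint64_t stalled_jiffies(const struct tx_hang_dump *d)
{
    return d->jiffies - d->time_stamp;
}

/* Assumption: with a descriptor step of `step`, one in `step` slots is used. */
static unsigned usable_tx_descriptors(unsigned ring_size, unsigned step)
{
    assert(step != 0);
    return ring_size / step;
}
```

Fed with the numbers from the log (TDH = TDT = next_to_use = 0xf7, next_to_clean = 0x24, status 0), this classifies the report as a lost write-back, and a 1024-entry ring with a step of 4 leaves 256 usable descriptors, which is presumably why the ring is enlarged alongside the workaround.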
Re: [E1000-devel] Detected Tx Unit Hang
Excuse my ignorance, but which patches? ;). There's a lot of stuff on the download page. I assume you are talking about the I/OAT driver kernel patch but I want to make sure before doing it.

> also, if the above workaround doesn't help we'll want you to install the
> dump patch from the patches section of e1000.sourceforge.net and send us
> the output when you get a tx hang.
Re: [E1000-devel] Detected Tx Unit Hang
Thanks. I'll get this in sometime this afternoon. Hopefully we can have some information from the server tonight.

Gary
Sent via BlackBerry by AT&T

-----Original Message-----
From: Brandeburg, Jesse <jesse.brandeb...@intel.com>
Date: Thu, 12 Mar 2009 09:33:21
To: Gary W. Smith <g...@primeexalia.com>
Cc: e1000-devel@lists.sourceforge.net
Subject: RE: [E1000-devel] Detected Tx Unit Hang

sorry, go to the home page
http://sourceforge.net/projects/e1000
click Tracker
click patches
click tx hang debug code (all releases) - 1460945
download the e1000_806_dump.patch, it should apply with fuzz to your e1000
driver directory with the command

patch -d e1000-8.0.* -p1 < file.patch

here is the download link
https://sourceforge.net/tracker2/download.php?group_id=42302&atid=447451&file_id=298629&aid=1460945

From: Gary W. Smith [mailto:g...@primeexalia.com]
Sent: Thursday, March 12, 2009 9:16 AM
To: Brandeburg, Jesse
Cc: e1000-devel@lists.sourceforge.net
Subject: RE: [E1000-devel] Detected Tx Unit Hang

Excuse my ignorance, but which patches? ;). There's a lot of stuff on the download page. I assume you are talking about the I/OAT driver kernel patch but I want to make sure before doing it.
Re: [E1000-devel] [PATCH] igb: allow tx of pre-formatted vlan tagged packets
Hi Stephen, ...

On Thu, Mar 12, 2009 at 01:51:19PM -0700, Stephen Hemminger wrote:
> On Thu, 12 Mar 2009 13:27:24 -0700
> Arthur Jones <ajo...@riverbed.com> wrote:
>
> > When the 82575 is fed 802.1q packets, it chokes with
> > an error of the form:
> >
> >   igb :08:00.1: partial checksum but proto=81
> >
> > As the logic there was not smart enough to look into
> > the vlan header to pick out the encapsulated protocol.
> >
> > There are times when we'd like to send these packets
> > out without having to configure a vlan on the interface.
> > Here we check for the vlan tag and allow the packet to
> > go out with the correct hardware checksum.
> >
> > Thanks to Kand Ly <k...@riverbed.com> for discovering the
> > issue and coming up with a solution. This patch is
> > based upon his work.
> >
> > Signed-off-by: Arthur Jones <ajo...@riverbed.com>
> > ---
> >  drivers/net/igb/igb_main.c |   12 +++++++++++-
> >  1 files changed, 11 insertions(+), 1 deletions(-)
>
> That code in current igb driver (net-next-2.6) tree no longer
> has the whole switch you are changing.

Ok, thanks, I'll test that one...

Arthur
Re: [E1000-devel] [PATCH] igb: allow tx of pre-formatted vlan tagged packets
On Thu, 12 Mar 2009 14:00:57 -0700
Arthur Jones <ajo...@riverbed.com> wrote:

> Hi Stephen, ...
>
> On Thu, Mar 12, 2009 at 01:51:19PM -0700, Stephen Hemminger wrote:
> > That code in current igb driver (net-next-2.6) tree no longer
> > has the whole switch you are changing.
>
> Ok, thanks, I'll test that one...
>
> Arthur

Actually, it is there, just a little further down than I was looking.
Re: [E1000-devel] [PATCH] igb: allow tx of pre-formatted vlan tagged packets
Hi Stephen, ...

On Thu, Mar 12, 2009 at 01:51:19PM -0700, Stephen Hemminger wrote:
> That code in current igb driver (net-next-2.6) tree no longer
> has the whole switch you are changing.

The patch from linux-2.6 applies to net-next-2.6 with
just a minor context fixup. I think the issue still
exists in net-next-2.6, I'm compiling and testing now...

Arthur
[E1000-devel] [net-next PATCH] igb: allow tx of pre-formatted vlan tagged packets
When the 82575 is fed 802.1q packets, it chokes with
an error of the form:

  igb :08:00.1: partial checksum but proto=81

As the logic there was not smart enough to look into
the vlan header to pick out the encapsulated protocol.

There are times when we'd like to send these packets
out without having to configure a vlan on the interface.
Here we check for the vlan tag and allow the packet to
go out with the correct hardware checksum.

Thanks to Kand Ly <k...@riverbed.com> for discovering the
issue and coming up with a solution. This patch is
based upon his work.

Macro fixups from Stephen Hemminger <shemmin...@vyatta.com>

Signed-off-by: Arthur Jones <ajo...@riverbed.com>
---
 drivers/net/igb/igb_main.c |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 78558f8..e4ef1f6 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -3017,7 +3017,17 @@ static inline bool igb_tx_csum_adv(struct igb_adapter *adapter,
 		tu_cmd |= (E1000_TXD_CMD_DEXT | E1000_ADVTXD_DTYP_CTXT);
 
 		if (skb->ip_summed == CHECKSUM_PARTIAL) {
-			switch (skb->protocol) {
+			__be16 protocol;
+
+			if (skb->protocol == cpu_to_be16(ETH_P_8021Q)) {
+				const struct vlan_ethhdr *vhdr =
+					(const struct vlan_ethhdr *) skb->data;
+
+				protocol = vhdr->h_vlan_encapsulated_proto;
+			} else
+				protocol = skb->protocol;
+
+			switch (protocol) {
 			case cpu_to_be16(ETH_P_IP):
 				tu_cmd |= E1000_ADVTXD_TUCMD_IPV4;
 				if (ip_hdr(skb)->protocol == IPPROTO_TCP)
-- 
1.5.6.3
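For readers who want to try the idea outside the kernel, the following is a minimal userspace sketch in C of the lookup the patch performs: if the outer EtherType is 802.1Q, dispatch on the encapsulated protocol instead. It works on a raw frame buffer rather than an skb, and the names `read_be16` and `l3_protocol` are invented for this example; only the byte offsets (EtherType at 12, inner EtherType at 16 for a singly tagged frame) come from the standard Ethernet and 802.1Q layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4  0x0800u
#define ETHERTYPE_8021Q 0x8100u

/* Read a big-endian (network order) 16-bit field from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the protocol that checksum setup should dispatch on, in host
 * byte order, or 0 if the frame is too short to contain it.
 *
 * Untagged: dst MAC (6) + src MAC (6) + EtherType at offset 12.
 * 802.1Q:   TPID 0x8100 at offset 12, TCI at 14, inner EtherType at 16.
 */
static uint16_t l3_protocol(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);

    if (len < 14)
        return 0;

    uint16_t proto = read_be16(frame + 12);
    if (proto == ETHERTYPE_8021Q) {
        if (len < 18)
            return 0;
        proto = read_be16(frame + 16);
    }
    return proto;
}
```

A tagged IPv4 frame and an untagged one both yield `ETHERTYPE_IPV4` here, which mirrors how the patched `switch (protocol)` reaches the `ETH_P_IP` case either way. Stacked tags (QinQ) are deliberately not handled, as the patch does not handle them either.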
Re: [E1000-devel] Detected Tx Unit Hang
so, the 4GB patch will only cause a slight increase in cpu utilization. There are no other side effects, and you *DO NOT* have to run the TxDescriptorStep workaround.

I think I might just push the change to not allow 64 bit addressing to these 32 bit adapters, into e1000.

Glad to hear things are working better,
 Jesse

From: Gary W. Smith [mailto:g...@primeexalia.com]
Sent: Thursday, March 12, 2009 2:45 PM
To: Gary W. Smith; Brandeburg, Jesse
Cc: e1000-devel@lists.sourceforge.net
Subject: RE: [E1000-devel] Detected Tx Unit Hang

That was a bad example... I was copying to/from the same instance from a machine running under vmware. I now have a physical machine copying the 50gb of files from the bad machine to another machine and everything is still going smooth, but much faster this time.

This is the dstat from the bad machine. The limiter is the disk (which is about 40mb/sec). We were hitting the error before with only 10mb. This is definitely a positive thing.

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   5  88   0   1   5|  25M    0 | 443k   26M|   0     0 |5119  1560
  0   8  85   0   2   6|  32M    0 | 570k   33M|   0     0 |6311  1724
  0   2  97   0   0   2|6662k    0 | 135k 7626k|   0     0 |1992   355
  0   0 100   0   0   0|   0     0 | 194B  240B|   0     0 |1007    16
  0   0 100   0   0   0|   0     0 | 134B  240B|   0     0 |1024    40
  0   0 100   0   0   0|   0     0 | 134B  240B|   0     0 |1006    16
  0   0 100   0   0   0|   0     0 | 226B  240B|   0     0 |1024    40
  0   0 100   0   0   0|   0     0 | 318B  240B|   0     0 |1008    16
  0   0 100   0   0   0|   0     0 | 134B  240B|   0     0 |1023    38
  0   0 100   0   0   0|   0     0 | 134B  240B|   0     0 |1006    18
  0   2  97   0   1   2|8960k    0 | 140k 8492k|   0     0 |2357   639
  0   8  85   0   2   7|  33M    0 | 599k   35M|   0     0 |5949  2036
  0  10  78   0   2  10|  43M    0 | 793k   46M|   0     0 |6993  2176
  0   9  79   0   2  10|  42M    0 | 751k   44M|   0     0 |6810  2213
  0   9  82   0   2   8|  37M    0 | 661k   39M|   0     0 |6998  1863
  0   7  86   0   1   5|  28M    0 | 521k   30M|   0     0 |5874  1933

From: Gary W. Smith [mailto:g...@primeexalia.com]
Sent: Thu 3/12/2009 2:32 PM
To: Brandeburg, Jesse
Cc: e1000-devel@lists.sourceforge.net
Subject: Re: [E1000-devel] Detected Tx Unit Hang

Jesse,

Looks better. Transferring 50GB to/from the server and I'm not getting the errors in the log now. Very large pings (ping vcsoaknas01 -t -l 3 -w 7000) are occasionally timing out BUT I haven't lost connectivity to the SSH session as of yet and the file transfer is still going. dstat is also running consistently (no random TX hangs like before).

dstat:
  0   3  92   0   1   4|4224k   13M|7338k 4642k|   0     0 | 10k   12k
  0   1  98   0   0   1| 936k 3872k|1951k 1040k|   0     0 |3649  3403
  0   1  96   0   0   2|1496k 8879k|4638k 1700k|   0     0 |7378  8853
  0   4  91   0   1   4|  13M 3678k|2382k   14M|   0     0 |9188  7267
  0   3  93   0   1   4|4352k   15M|7864k 4877k|   0     0 | 11k   13k
  0   2  95   0   1   3| 384k   14M|7389k  516k|   0     0 |9990   12k
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   2  95   0   0   2|2816k 8104k|4327k 3098k|   0     0 |7075  7510
  0   2  93   0   0   4|5696k 9120k|4918k 6176k|   0     0 |8478  8300
  0   2  95   0   0   3|3968k 6720k|3610k 4306k|   0     0 |6425  6107
  0   2  95   0   0   3|4736k 7616k|4081k 5135k|   0     0 |7242  6974
  0   2  95   0   1   3|4224k 6816k|3687k 4582k|   0     0 |6589  6344
  0   2  95   0   0   3|4096k 7016k|3748k 4445k|   0     0 |6546  6311
  0   1  96   0   0   2|3136k 5288k|2852k 3402k|   0     0 |5251  4936

We have 50GB on an iscsi share (or 500GB) that we are copying to/from over the wire for this test. During the writing of this email we have already copied about 1.3gb without any problem as of yet.

So my next question is regarding the 4GB patch. Does this have any negative impact that I need to be aware of?

Gary

From: Brandeburg, Jesse [mailto:jesse.brandeb...@intel.com]
Sent: Thu 3/12/2009 1:59 PM
To: Gary W. Smith
Cc: e1000-devel@lists.sourceforge.net
Subject: RE: [E1000-devel] Detected Tx Unit Hang

re-added the list for tracking...

I think I see the issue, you have more than 4GB ram, and it appears that your system doesn't handle dual address cycles correctly, or our adapter doesn't work quite right for some reason.

Force the OS to never allow addresses > 4GB to our hardware using this patch:
https://sourceforge.net/tracker2/download.php?group_id=42302&atid=447449&file_id=283326&aid=2007017

it's the e1000_disable_dac.patch file.
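The patch itself is not reproduced in the thread, so the following is only a conceptual sketch in userspace C of what "never allow addresses above 4GB" means for a device: a buffer is acceptable only if its whole address range fits under a 32-bit mask, and any bus address with non-zero upper 32 bits would need a dual address cycle on PCI. All function names here are hypothetical and are not taken from the e1000 driver or the kernel DMA API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Highest address reachable with an n-bit DMA mask (n in 1..64). */
static uint64_t dma_mask(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* True if the whole buffer [addr, addr + len) is addressable under the mask. */
static bool dma_range_ok(uint64_t addr, uint64_t len, unsigned bits)
{
    assert(len > 0);
    uint64_t mask = dma_mask(bits);

    if (addr > mask)
        return false;
    /* Compare without computing addr + len, which could overflow. */
    return len - 1 <= mask - addr;
}

/* A bus address needs a dual address cycle when its upper 32 bits are set. */
static bool needs_dual_address_cycle(uint64_t addr)
{
    return (addr >> 32) != 0;
}
```

With a 32-bit mask, a 4 KiB buffer ending exactly at the 4GB boundary passes, while one that starts at or crosses 0x100000000 does not; on a machine with more than 4GB of RAM, buffers of the second kind are the ones the workaround keeps away from the adapter.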
Re: [E1000-devel] Detected Tx Unit Hang
This probably means the same bug exists in Windows as well. This is where we hit the problem first and converted it to a nas server.

Gary
Sent via BlackBerry by AT&T

-----Original Message-----
From: Brandeburg, Jesse <jesse.brandeb...@intel.com>
Date: Thu, 12 Mar 2009 15:01:27
To: Gary W. Smith <g...@primeexalia.com>
Cc: e1000-devel@lists.sourceforge.net
Subject: RE: [E1000-devel] Detected Tx Unit Hang

so, the 4GB patch will only cause a slight increase in cpu utilization. There are no other side effects, and you *DO NOT* have to run the TxDescriptorStep workaround.

I think I might just push the change to not allow 64 bit addressing to these 32 bit adapters, into e1000.

Glad to hear things are working better,
 Jesse
Re: [E1000-devel] [net-next PATCH] igb: allow tx of pre-formatted vlan tagged packets
I have added a few comments inline.

Arthur Jones wrote:
> diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
> index 78558f8..e4ef1f6 100644
> --- a/drivers/net/igb/igb_main.c
> +++ b/drivers/net/igb/igb_main.c
> @@ -3017,7 +3017,17 @@ static inline bool igb_tx_csum_adv(struct igb_adapter *adapter,
>  		tu_cmd |= (E1000_TXD_CMD_DEXT | E1000_ADVTXD_DTYP_CTXT);
>  
>  		if (skb->ip_summed == CHECKSUM_PARTIAL) {
> -			switch (skb->protocol) {
> +			__be16 protocol;
> +
> +			if (skb->protocol == cpu_to_be16(ETH_P_8021Q)) {
> +				const struct vlan_ethhdr *vhdr =
> +					(const struct vlan_ethhdr *) skb->data;

This should probably reference skb_mac_header(skb) instead of data in the event that data is an offset instead of a pointer.

> +
> +				protocol = vhdr->h_vlan_encapsulated_proto;
> +			} else
> +				protocol = skb->protocol;
> +

This else should have braces since the matching if was using braces.

> +			switch (protocol) {
>  			case cpu_to_be16(ETH_P_IP):
>  				tu_cmd |= E1000_ADVTXD_TUCMD_IPV4;
>  				if (ip_hdr(skb)->protocol == IPPROTO_TCP)

Thanks,

Alex
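One way to picture the two review comments is the userspace sketch below, written in C. It uses a hypothetical `mock_pkt` struct, which is a stand-in invented for this example and not the kernel's `struct sk_buff`, to show why locating the VLAN header through a recorded MAC-header position is more robust than reading from a moving data pointer. It also writes the if/else with braces on both branches. Whether `skb->data` can actually differ from the MAC header at this point in the igb transmit path is not established in the thread, so treat this as an illustration of the concern rather than a demonstration of a bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet buffer for illustration; NOT struct sk_buff. */
struct mock_pkt {
    const uint8_t *head; /* start of the allocated buffer */
    const uint8_t *data; /* current parse position; may have moved */
    size_t mac_offset;   /* offset of the MAC header from head */
};

/* Locate the MAC header independently of where data currently points. */
static const uint8_t *mock_mac_header(const struct mock_pkt *p)
{
    return p->head + p->mac_offset;
}

/* Read a big-endian 16-bit field. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Protocol to dispatch on, read relative to the MAC header. */
static uint16_t encapsulated_proto(const struct mock_pkt *p)
{
    assert(p != NULL);
    const uint8_t *mac = mock_mac_header(p);
    uint16_t proto;

    if (read_be16(mac + 12) == 0x8100u) {
        proto = read_be16(mac + 16); /* inner EtherType after the tag */
    } else {
        proto = read_be16(mac + 12); /* untagged: outer EtherType */
    }
    return proto;
}
```

In this mock, a packet whose `data` has already been advanced past the Ethernet and VLAN headers still resolves to the right inner protocol, because the lookup never touches `data`.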