Re: TCP SACK issue, hung connection, tcpdump included
Ilpo Järvinen wrote:
> On Tue, 31 Jul 2007, Darryl L. Miles wrote:
> > I've been able to capture a tcpdump from both ends during the problem and its my belief there is a bug in 2.6.20.1 (at the client side) in that it issues a SACK option for an old sequence which the current window being advertised is beyond it. This is the most concerning issue as the integrity of the sequence numbers doesn't seem right (to my limited understanding anyhow).
>
> You probably didn't check the reference I explicitly gave to those who are not familiar how DSACK works, just in case you didn't pick it up last time, here it is again for you: RFC2883...

I've now squinted at the D-SACK RFC and understand a little about this; however, the RFC does make the claim:

"This extension is compatible with current implementations of the SACK option in TCP. That is, if one of the TCP end-nodes does not implement this D-SACK extension and the other TCP end-node does, we believe that this use of the D-SACK extension by one of the end nodes will not introduce problems."

What if it turns out that this is not true for a large enough number of SACK implementations out there, from the timeframe when SACK was supported but D-SACK was not? Would it be possible to clearly categorise an implementation as one of:

* 100% SACK RFC compliant. SACK works, and by virtue of the mandatory requirements written into the previous SACK RFCs this implementation would never see a problem with receiving D-SACK, even though the stack itself does not support D-SACK.

* Mostly SACK RFC compliant. SACK works, but if it saw D-SACK it would have problems dealing with it, possibly resulting in fatal TCP lockups. Are there mandatory SACK implementation requirements in place that would let us clearly draw the line and state that the 2.6.9 SACK implementation was not RFC compliant?

* 100% SACK and D-SACK RFC compliant. Such an implementation was written to support D-SACK on top of SACK.
So if there is a problem, whose fault would it be:

* The original SACK RFCs, for not specifying a mandatory course of action to take, which D-SACK exploits. Thus making the claim in RFC2883 unsound.

* The older Linux kernel, for not being 100% SACK RFC compliant in its implementation? Not a lot we can do about this now, but if we're able to identify that there may be backward compatibility issues with the same implementation, that's a useful point to take forward.

* The newer Linux kernel, for enabling D-SACK by default when RFC2883 doesn't even claim a cast-iron case for D-SACK being compatible with any 100% RFC compliant SACK implementation.

Does TCP support the concept of vendor dependent options? That would be TCP options in a special range that identify both the vendor and the vendor-specific option id. Such a system would allow Linux to implement a "D-SACK Ok" option, even if the RFC claims one is not needed. This would allow moving forward through this era until such time as it was officially agreed whether it was just a Linux problem or an RFC problem. If it's an RFC problem then IANA (or whoever) would issue a generic TCP option for it. If the dump on this problem really does identify a risk/problem then, as it is between two versions of Linux, a vendor-specific option also makes sense.

I don't really want to switch new useful stuff off by default (so it never gets used). I'm all for experimentation, but not to the point of failure between default configurations of widely distributed versions of the kernel.

So those are the technical approaches I can come up with to discuss. Does Ilpo have a particular vested interest in D-SACK that should be disclosed?
> However, if DSACKs really bother you still (though it shouldn't :-)), IIRC I also told you how you're able to turn it off (tcp_dsack sysctl) but I assure you that it's not a bug but feature called DSACK [RFC2883], there's _absolutely_ nothing wrong with it, instead, it would be wrong to _not_ send the below snd_una SACK in this scenario when tcp_dsack set to 1.

So it is necessary to turn off a TCP option (that is enabled by default) to be sure of having reliable TCP connections (that don't lock up) in the bug-free Linux networking stack? This is absurd. If such an option causes such a problem, then that option should not be enabled by default. If however the problem is because of a bug, then let us continue to try to isolate the cause rather than wallpaper over the cracks with the voodoo of turning off things that are enabled by default.

It only makes sense to turn options off when there is a 3rd party involved (or other means beyond your control) which is affecting function. The case here is that two Linux kernel stacks are affected and no 3rd party device has been shown to be affecting function.

> > There is another concern of why the SERVER performed a retransmission in the first place, when the tcpdump shows the ack covering it has been seen.
>
> There are only three possible reasons to this thing:
TCP Connection lockup between 2.4.0 and 2.4.5
Hi,

10.0.0.218 = Linux 2.4.0 SMP (tcp_timestamps, tcp_window_scaling and tcp_sack all turned off; this doesn't appear to be relevant, since the problem is just the same when they are turned on).
10.0.0.219 = Linux 2.4.5 UP

It appears the .218 end stops ACKing, even though it is obviously seeing the data come in, since the tcpdump is from the .218 host.

I've been running 2.4.0 on 10.0.0.218 since 9th Jan and can't believe that this problem is a bug in 2.4.0, since it was speaking with the .219 box all this time until I recently updated the .219 end from 2.0.32 to 2.4.5 over last weekend.

These are the only two Linux boxes on the LAN, so this is the first time two 2.4.x boxes have been talking to each other at LAN speeds. Both boxes have had full access to the Internet at dialup speeds all through this year (using a non-NAT connection; the 10.x.x.x addrs aren't what they really operate as).

I did get an inconsistent tcpdump when taken from the .219 end, in that the entries marked with '*' (which I have added a blank line around below) are actually reported like the following (note this is from a DIFFERENT session, NOT the same one as the larger dump below):

02:38:37.162128 0:20:af:52:3d:17 0:50:da:8a:4c:80 0800 1514: 10.0.0.219.119 > 10.0.0.218.2226: P 3113:4313(1200) ack 46 win 5792 <nop,nop,timestamp 1008848 4586969> (DF)
02:38:37.172128 0:20:af:52:3d:17 0:50:da:8a:4c:80 0800 1514: 10.0.0.219.119 > 10.0.0.218.2226: P 3113:4313(1200) ack 46 win 5792 <nop,nop,timestamp 1008848 4586969> (DF)
02:38:37.172128 0:20:af:52:3d:17 0:50:da:8a:4c:80 0800 1266: 10.0.0.219.119 > 10.0.0.218.2226: P 3113:4313(1200) ack 46 win 5792 <nop,nop,timestamp 1008848 4586969> (DF)

Notice the difference in ethernet frame length and TCP segment length. From the other end everything looks normal, as in the 1st packet is "217:1665(1448)", the 2nd packet is "1665:3113(1448)" and the last is "3113:4313(1200)".

It is like tcpdump got hold of the last packet and repeated it 3 times, and somehow missed being able to sniff the first two. So what tcpdump reports from the .219 end isn't what .218 actually sees on the wire.

This dump is from the .218 end:

02:50:42.549261 10.0.0.218.2296 > 10.0.0.219.119: S 1468859836:1468859836(0) win 5840 <mss 1460> (DF)
02:50:42.551250 10.0.0.219.119 > 10.0.0.218.2296: S 1410961156:1410961156(0) ack 1468859837 win 5840 <mss 1460> (DF)
02:50:42.551462 10.0.0.218.2296 > 10.0.0.219.119: . ack 1 win 5840 (DF)
02:50:42.741803 10.0.0.219.119 > 10.0.0.218.2296: P 1:108(107) ack 1 win 5840 (DF)
02:50:42.741925 10.0.0.218.2296 > 10.0.0.219.119: . ack 108 win 5840 (DF)
02:50:42.742347 10.0.0.218.2296 > 10.0.0.219.119: P 1:14(13) ack 108 win 5840 (DF)
02:50:42.744136 10.0.0.219.119 > 10.0.0.218.2296: . ack 14 win 5840 (DF)
02:50:42.761240 10.0.0.219.119 > 10.0.0.218.2296: P 108:117(9) ack 14 win 5840 (DF)
02:50:42.772263 10.0.0.218.2296 > 10.0.0.219.119: P 14:21(7) ack 117 win 5840 (DF)
02:50:42.77 10.0.0.219.119 > 10.0.0.218.2296: P 117:160(43) ack 21 win 5840 (DF)
02:50:42.784379 10.0.0.218.2296 > 10.0.0.219.119: P 21:40(19) ack 160 win 5840 (DF)
02:50:42.795936 10.0.0.219.119 > 10.0.0.218.2296: P 160:217(57) ack 40 win 5840 (DF)
02:50:42.799369 10.0.0.218.2296 > 10.0.0.219.119: P 40:46(6) ack 217 win 5840 (DF)
02:50:42.832749 10.0.0.219.119 > 10.0.0.218.2296: . ack 46 win 5840 (DF)

* 02:50:42.846780 10.0.0.219.119 > 10.0.0.218.2296: . 217:1677(1460) ack 46 win 5840 (DF)
02:50:42.846975 10.0.0.218.2296 > 10.0.0.219.119: . ack 1677 win 8760 (DF)
* 02:50:42.849085 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460) ack 46 win 5840 (DF)
* 02:50:42.851092 10.0.0.219.119 > 10.0.0.218.2296: P 3137:4313(1176) ack 46 win 5840 (DF)

02:50:42.851279 10.0.0.218.2296 > 10.0.0.219.119: . ack 1677 win 8760 (DF)
02:50:42.858301 10.0.0.219.119 > 10.0.0.218.2296: . 4313:5773(1460) ack 46 win 5840 (DF)
02:50:42.859679 10.0.0.218.2296 > 10.0.0.219.119: . ack 1677 win 8760 (DF)
02:50:42.860612 10.0.0.219.119 > 10.0.0.218.2296: . 5773:7233(1460) ack 46 win 5840 (DF)
02:50:42.867351 10.0.0.219.119 > 10.0.0.218.2296: . 7233:8693(1460) ack 46 win 5840 (DF)
02:50:42.867523 10.0.0.218.2296 > 10.0.0.219.119: . ack 1677 win 8760 (DF)
02:50:42.871097 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460) ack 46 win 5840 (DF)
02:50:43.074807 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460) ack 46 win 5840 (DF)
02:50:43.494738 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460) ack 46 win 5840 (DF)
02:50:44.334641 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460) ack 46 win 5840 (DF)
02:50:46.014434 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460) ack 46 win 5840 (DF)
02:50:49.374022 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460) ack 46 win 5840 (DF)

--
Darryl Miles
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: NETDEV WATCHDOG: eth0: transmit timed out
I am getting complete lockups of the NIC; up/down of the interface doesn't restore it, and rmmod/insmod of ne2k-pci and 8390 doesn't restore it. A reboot does.

The machine with this card in isn't normally highly loaded on the network, but under heavy load it will lock up completely (fairly reliably, I suspect).

I have also had this problem with 2.4.0-test11. I had traced it to ei_tx_intr(), insofar as it was calling the "ei_local->stat.collisions += 16;" line. This is 8390.c:635 in 2.4.0.

The log below shows the time I had reloaded the modules trying to bring it back to life.

Jan 13 01:46:24 thehostname kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 13 01:46:24 thehostname kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=951.
Jan 13 01:46:26 thehostname kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 13 01:46:26 thehostname kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=100.
Jan 13 01:47:14 thehostname kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 13 01:47:14 thehostname kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=106.
Jan 13 01:47:15 thehostname kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 13 01:47:15 thehostname kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=26.
Jan 13 01:47:17 thehostname kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 13 01:47:17 thehostname kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=105.
Jan 13 01:47:24 thehostname kernel: RPC: sendmsg returned error 101
Jan 13 01:47:24 thehostname kernel: nfs: RPC call returned error 101
Jan 13 01:47:24 thehostname kernel: nfs_statfs: statfs error = 101
Jan 13 01:47:37 thehostname kernel: ne2k-pci.c:v1.02 10/19/2000 D. Becker/P. Gortmaker
Jan 13 01:47:37 thehostname kernel: http://www.scyld.com/network/ne2k-pci.html
Jan 13 01:47:37 thehostname kernel: eth0: RealTek RTL-8029 found at 0xe800, IRQ 19, 48:54:E8:21:15:56.
Jan 13 01:47:47 thehostname kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 13 01:47:47 thehostname kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=111.
Jan 13 01:47:58 thehostname kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 13 01:47:58 thehostname kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=1031.
Jan 13 01:48:00 thehostname kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 13 01:48:00 thehostname kernel: eth0: Tx timed out, lost interrupt? TSR=0x1, ISR=0x3, t=107.
Jan 13 01:48:04 thehostname kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 13 01:48:04 thehostname kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=106.
Jan 13 01:48:08 thehostname kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 13 01:48:08 thehostname kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=306.
Jan 13 01:48:10 thehostname kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 13 01:48:10 thehostname kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=105.
Jan 13 01:48:24 thehostname kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 13 01:48:24 thehostname kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x2, t=72.

$ uname -r
2.4.0

lsmod bits:

ne2k-pci                4448   1 (autoclean)
8390                    6544   0 (autoclean) [ne2k-pci]

/proc/pci:

  Bus  0, device  11, function  0:
    Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS) (rev 0).
      IRQ 19.
      I/O at 0xe800 [0xe81f].

--
Darryl Miles
2.4.0: Small observation in /proc/sys/net/unix/
# ls -il /proc/sys/net/unix/
total 24
   4446 -rw---    1 root     root            0 Jan 11 11:06 max_dgram_qlen
   4446 -rw---    1 root     root            0 Jan 11 11:06 max_dgram_qlen

Identical filenames; nothing bad appears to be happening, it just looks weird.

--
Darryl Miles
Re: Abort x86 assemble code
hugang <[EMAIL PROTECTED]> wrote:
> I have following code ,and I can not understand the mark line,who can tell
> me.thanks.
> 0ec7 xorl 0x400dec(,%eax,4),%ecx   <-What it to do.

	extern u_int32_t eax;
	extern u_int32_t ecx;

	{
		u_int32_t *ptr;

		/* effective address = index * scale + displacement */
		ptr = (u_int32_t *)((eax * 4) + 0x400dec);
		ecx ^= *ptr;
	}

Commonly used in the above form (with a fixed displacement) to access a 32-bit value within an array of 32-bit values. The array start offset would be hardwired at 0x400dec; the zero-based index into the array is provided by eax.

--
Darryl Miles
Re: [PATCH] removal of "static foo = 0" from drivers/ide (test11)
Russell King <[EMAIL PROTECTED]> writes:
>The only difference is the size on disk; if we go around setting every
>bss variable to zero, the kernel/module data size will unnecessarily
>huge.

Hmm, what about common symbol generation? i.e. the linker loses the ability to throw out "multiply defined symbol" errors where you fail to initialise a variable to a value.

Okay, extern global variables in the kernel need to be controlled and it is not like many get added; however, it is possible that one developer may never know a name is already in use by another part of the kernel when their oh-no-new driver is added, since the linker's assistance in this issue has just been disabled.

Is 'gas' able to be configured to never emit common symbols, but emit BSS symbols instead, or is 'ld' able to be configured to never merge common symbols but throw up "multiply defined symbol" errors? Then everyone is safe.

>We already argue about the extra couple of bytes that xx change to the
>kernel/a module would cost. With these change, we save kilo-bytes in
>disk space (which is important on some systems).

PDAs!!! :)

Excellent work Russell.

--
Darryl Miles