Re: TCP SACK issue, hung connection, tcpdump included

2007-08-02 Thread Darryl Miles

Ilpo Järvinen wrote:

> On Tue, 31 Jul 2007, Darryl L. Miles wrote:
>
> > I've been able to capture a tcpdump from both ends during the problem and its
> > my belief there is a bug in 2.6.20.1 (at the client side) in that it issues a
> > SACK option for an old sequence which the current window being advertised is
> > beyond it.  This is the most concerning issue as the integrity of the sequence
> > numbers doesn't seem right (to my limited understanding anyhow).
>
> You probably didn't check the reference I explicitly gave to those who
> are not familiar how DSACK works, just in case you didn't pick it up last
> time, here it is again for you: RFC2883...


I've now squinted at the D-SACK RFC and understand a little about this;
however, the RFC does make the claim "This extension is compatible with
current implementations of the SACK option in TCP.  That is, if one of
the TCP end-nodes does not implement this D-SACK extension and the other
TCP end-node does, we believe that this use of the D-SACK extension by
one of the end nodes will not introduce problems."

What if it turns out that this is not true for a large enough number of SACK
implementations out there, from the timeframe when SACK was supported but
D-SACK was not?  Would it be possible to clearly categorise an
implementation as:

 * 100% SACK RFC compliant.  SACK works, and by virtue of the mandatory
requirements written into the previous SACK RFCs this
implementation would never see a problem with receiving D-SACK, even
though the stack itself does not support D-SACK.

 * Mostly SACK RFC compliant.  SACK works, but if it saw D-SACK it would
have problems dealing with it, possibly resulting in fatal TCP
lockups.  Are there mandatory SACK implementation requirements in place
that would let us clearly draw the line and state that the 2.6.9 SACK
implementation was not RFC compliant?

 * 100% SACK and D-SACK RFC compliant.  Such an implementation was
written to support D-SACK on top of SACK.




So if there is a problem, whose fault would it be:

 * The original SACK RFCs, for not specifying a mandatory course of
action, a gap which D-SACK exploits, thus making the claim in RFC2883
unsound?

 * The older Linux kernel, for not being 100% SACK RFC compliant in its
implementation?  Not a lot we can do about this now, but if we're able
to identify that there may be backward compatibility issues with the same
implementation, that's a useful point to take forward.

 * The newer Linux kernel, for enabling D-SACK by default when RFC2883
doesn't even make a cast-iron case for D-SACK being compatible with any
100% RFC compliant SACK implementation?


Does TCP support the concept of vendor-dependent options, i.e. TCP
options in a special range that would identify both the vendor and the
vendor-specific option id?  Such a system would allow Linux to implement
a "D-SACK Ok" option, even if the RFC claims one is not needed.  This
would allow moving forward through this era until such point in time
when it was officially agreed whether it was just a Linux problem
or an RFC problem.  If it's an RFC problem then IANA (or whoever) would
issue a generic TCP option for it.


If the dump of this problem really does identify a risk/problem, then since
it occurs between two versions of Linux, a vendor-specific option also makes
sense.

I don't really want to switch new useful stuff off by default (so it
never gets used); I'm all for experimentation, but not to the point of
failure between default configurations of widely distributed versions of
the kernel.



So those are the technical approaches I can come up with to discuss.  Does
Ilpo have a particular vested interest in D-SACK that should be disclosed?



> However, if DSACKs really
> bother you still (though it shouldn't :-)), IIRC I also told you how
> you're able to turn it off (tcp_dsack sysctl) but I assure you that it's
> not a bug but feature called DSACK [RFC2883], there's _absolutely_ nothing
> wrong with it, instead, it would be wrong to _not_ send the below snd_una
> SACK in this scenario when tcp_dsack set to 1.
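Under RFC2883 the receiver puts the duplicate range in the first SACK block, and that block may well lie below the cumulative ACK (the sender's snd_una), which matches what the dump shows. A minimal C sketch of how a sender can recognise such a block, following a simplified reading of the two RFC2883 rules (first block below the cumulative ACK, or first block contained in the second block); the struct and function names are hypothetical, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One SACK block: the receiver holds sequence space [start, end). */
struct sack_block {
	uint32_t start;
	uint32_t end;
};

/* Wraparound-safe 32-bit sequence comparison (same idea as the
 * kernel's before()/after() helpers). */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static bool seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}

/* Hypothetical helper: is the first block of a received SACK option a
 * D-SACK?  cum_ack is the cumulative ACK carried by the same segment. */
static bool first_block_is_dsack(uint32_t cum_ack,
				 const struct sack_block *blk, int nblocks)
{
	/* 40 bytes of TCP option space hold at most 4 SACK blocks. */
	assert(nblocks >= 0 && nblocks <= 4);

	if (nblocks == 0)
		return false;

	/* Rule 1: the block starts below the cumulative ACK, i.e. it
	 * reports data the sender already considers acknowledged. */
	if (seq_before(blk[0].start, cum_ack))
		return true;

	/* Rule 2: the block lies inside the second block, i.e. it reports
	 * a duplicate of out-of-order data that was already SACKed. */
	if (nblocks >= 2 &&
	    !seq_before(blk[0].start, blk[1].start) &&
	    !seq_after(blk[0].end, blk[1].end))
		return true;

	return false;
}
```

For example, with a cumulative ACK of 5000 a first block of 3000-3500 is a D-SACK, while a first block of 6000-7000 is an ordinary SACK. A sender that only understands plain SACK would see the former as a block below snd_una, which RFC2883 expects it to ignore.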


So is it necessary to turn off a TCP option (one that is enabled by default)
to be sure of having reliable TCP connections (that don't lock up) in the
bug-free Linux networking stack?  This is absurd.

If such an option causes such a problem, then that option should not be
enabled by default.  If however the problem is caused by a bug, then let
us continue to try to isolate the cause rather than wallpaper over the
cracks with the voodoo of turning off things that are enabled by default.

It only makes sense to turn options off when there is a third party
involved (or other factors beyond your control) affecting function; in
this case two Linux kernel stacks are affected and no third-party device
has been shown to be affecting function.



> > There is another concern of why the SERVER performed a retransmission in the
> > first place, when the tcpdump shows the ack covering it has been seen.
>
> There are only three possible reasons to this thing:


TCP Connection lockup between 2.4.0 and 2.4.5

2001-06-04 Thread Darryl Miles


Hi,


10.0.0.218 = Linux 2.4.0 SMP (tcp_timestamps, tcp_window_scaling and
tcp_sack all turned off; this doesn't appear to be relevant, since the
problem is just the same when they are turned on).
10.0.0.219 = Linux 2.4.5 UP

It appears the .218 end stops ACKing, even though it is obviously seeing
the data come in, since the TCPDUMP is from the .218 host.  I've been
running 2.4.0 on 10.0.0.218 since 9th Jan and can't believe that this
problem is a bug in 2.4.0, since it had been talking to the .219 box all
this time until last weekend, when I updated the .219 end from 2.0.32
to 2.4.5.

These are the only two Linux boxes on the LAN, so this is the first time
two 2.4.x boxes have been talking to each other at LAN speeds.  Both
boxes have had full access to the Internet at dialup speeds all through
this year (using a non-NAT connection; the 10.x.x.x addresses aren't what
they really operate as).


I did get an inconsistent tcpdump when taken from the .219 end, in that
the entries marked with '*' below (which I have surrounded with blank
lines) are actually reported as follows (note this is from a
DIFFERENT session, NOT the same one as the larger dump below):

02:38:37.162128 0:20:af:52:3d:17 0:50:da:8a:4c:80 0800 1514:
10.0.0.219.119 > 10.0.0.218.2226: P 3113:4313(1200) ack 46 win 5792
<nop,nop,timestamp 1008848 4586969> (DF)
02:38:37.172128 0:20:af:52:3d:17 0:50:da:8a:4c:80 0800 1514:
10.0.0.219.119 > 10.0.0.218.2226: P 3113:4313(1200) ack 46 win 5792
<nop,nop,timestamp 1008848 4586969> (DF)
02:38:37.172128 0:20:af:52:3d:17 0:50:da:8a:4c:80 0800 1266:
10.0.0.219.119 > 10.0.0.218.2226: P 3113:4313(1200) ack 46 win 5792
<nop,nop,timestamp 1008848 4586969> (DF)

Notice the mismatch between the Ethernet frame length and the TCP segment
length.  From the other end everything looks normal: the 1st packet is
"217:1665(1448)", the 2nd packet is "1665:3113(1448)" and the last is
"3113:4313(1200)".  It is as if tcpdump got hold of the last packet and
repeated it 3 times, and somehow missed sniffing the first two.  So what
tcpdump reports from the .219 end isn't what .218 actually sees on the
wire.
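The frame lengths themselves look consistent with the segment sizes seen from the other end. Assuming untagged Ethernet II framing, a 20-byte IPv4 header and a 20-byte TCP header plus the 12-byte nop,nop,timestamp option, a 1448-byte segment (the 1460 MSS minus the timestamp option) gives a 1514-byte frame and a 1200-byte segment gives a 1266-byte frame. That suggests the two 1514-byte frames really did carry the 1448-byte segments and only the decoded TCP fields were misreported. A small C sketch of that arithmetic; the names are hypothetical:

```c
#include <assert.h>

/* Header sizes assumed for this capture: Ethernet II without VLAN tag,
 * IPv4 without options, TCP with the 12-byte nop,nop,timestamp option. */
enum {
	ETH_HDR_LEN = 14,
	IPV4_HDR_LEN = 20,
	TCP_HDR_LEN = 20,
	TCP_TS_OPT_LEN = 12,
};

/* Captured length of a frame carrying 'payload' bytes of TCP data
 * (the captured length does not include the 4-byte Ethernet FCS). */
static int frame_len(int payload)
{
	assert(payload >= 0);
	return ETH_HDR_LEN + IPV4_HDR_LEN + TCP_HDR_LEN + TCP_TS_OPT_LEN +
	       payload;
}

/* Payload implied by a captured frame length (inverse of frame_len). */
static int payload_len(int frame)
{
	return frame - frame_len(0);
}
```

With these assumptions the fixed overhead is 66 bytes, so payload_len(1514) is 1448 and payload_len(1266) is 1200.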



This dump is from the .218 end:

02:50:42.549261 10.0.0.218.2296 > 10.0.0.219.119: S
1468859836:1468859836(0) win 5840 <mss 1460> (DF)
02:50:42.551250 10.0.0.219.119 > 10.0.0.218.2296: S
1410961156:1410961156(0) ack 1468859837 win 5840 <mss 1460> (DF)
02:50:42.551462 10.0.0.218.2296 > 10.0.0.219.119: . ack 1 win 5840 (DF)
02:50:42.741803 10.0.0.219.119 > 10.0.0.218.2296: P 1:108(107) ack 1 win
5840 (DF)
02:50:42.741925 10.0.0.218.2296 > 10.0.0.219.119: . ack 108 win 5840
(DF)
02:50:42.742347 10.0.0.218.2296 > 10.0.0.219.119: P 1:14(13) ack 108 win
5840 (DF)
02:50:42.744136 10.0.0.219.119 > 10.0.0.218.2296: . ack 14 win 5840 (DF)
02:50:42.761240 10.0.0.219.119 > 10.0.0.218.2296: P 108:117(9) ack 14
win 5840 (DF)
02:50:42.772263 10.0.0.218.2296 > 10.0.0.219.119: P 14:21(7) ack 117 win
5840 (DF)
02:50:42.77 10.0.0.219.119 > 10.0.0.218.2296: P 117:160(43) ack 21
win 5840 (DF)
02:50:42.784379 10.0.0.218.2296 > 10.0.0.219.119: P 21:40(19) ack 160
win 5840 (DF)
02:50:42.795936 10.0.0.219.119 > 10.0.0.218.2296: P 160:217(57) ack 40
win 5840 (DF)
02:50:42.799369 10.0.0.218.2296 > 10.0.0.219.119: P 40:46(6) ack 217 win
5840 (DF)
02:50:42.832749 10.0.0.219.119 > 10.0.0.218.2296: . ack 46 win 5840 (DF)

* 02:50:42.846780 10.0.0.219.119 > 10.0.0.218.2296: . 217:1677(1460) ack
46 win 5840 (DF)

02:50:42.846975 10.0.0.218.2296 > 10.0.0.219.119: . ack 1677 win 8760
(DF)

* 02:50:42.849085 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460)
ack 46 win 5840 (DF)
* 02:50:42.851092 10.0.0.219.119 > 10.0.0.218.2296: P 3137:4313(1176)
ack 46 win 5840 (DF)

02:50:42.851279 10.0.0.218.2296 > 10.0.0.219.119: . ack 1677 win 8760
(DF)
02:50:42.858301 10.0.0.219.119 > 10.0.0.218.2296: . 4313:5773(1460) ack
46 win 5840 (DF)
02:50:42.859679 10.0.0.218.2296 > 10.0.0.219.119: . ack 1677 win 8760
(DF)
02:50:42.860612 10.0.0.219.119 > 10.0.0.218.2296: . 5773:7233(1460) ack
46 win 5840 (DF)
02:50:42.867351 10.0.0.219.119 > 10.0.0.218.2296: . 7233:8693(1460) ack
46 win 5840 (DF)
02:50:42.867523 10.0.0.218.2296 > 10.0.0.219.119: . ack 1677 win 8760
(DF)
02:50:42.871097 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460) ack
46 win 5840 (DF)
02:50:43.074807 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460) ack
46 win 5840 (DF)
02:50:43.494738 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460) ack
46 win 5840 (DF)
02:50:44.334641 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460) ack
46 win 5840 (DF)
02:50:46.014434 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460) ack
46 win 5840 (DF)
02:50:49.374022 10.0.0.219.119 > 10.0.0.218.2296: . 1677:3137(1460) ack
46 win 5840 (DF)
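Reading the tail of the dump: after its first ACK of 1677, .218 sends three more ACKs of 1677 (duplicate ACKs), .219 then retransmits 1677:3137 at 02:50:42.871097, and every later retransmission of that segment goes unacknowledged, with the gaps roughly doubling (about 0.20, 0.42, 0.84, 1.68 and 3.36 seconds). That pattern looks like a fast retransmit followed by exponential retransmission-timeout backoff, i.e. .219 appears to be doing what it should and the stall is .218 no longer ACKing. A small C sketch that checks the doubling from the timestamps copied out of the dump; the array and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Seconds past 02:50:00 for the six transmissions of 1677:3137 that
 * follow the duplicate ACKs in the .218-side dump. */
static const double retx_times[] = {
	42.871097, 43.074807, 43.494738,
	44.334641, 46.014434, 49.374022,
};

#define N_RETX (sizeof retx_times / sizeof retx_times[0])

/* Gap in seconds between transmission i and transmission i+1. */
static double retx_gap(size_t i)
{
	assert(i + 1 < N_RETX);
	return retx_times[i + 1] - retx_times[i];
}

/* Ratio between consecutive gaps; close to 2.0 under exponential
 * retransmission-timeout backoff. */
static double backoff_ratio(size_t i)
{
	return retx_gap(i + 1) / retx_gap(i);
}
```

The first ratio comes out slightly above 2 (about 2.06); the remaining three are 2.00 to within a millisecond of timestamp jitter.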


-- 
Darryl Miles
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: NETDEV WATCHDOG: eth0: transmit timed out

2001-01-12 Thread Darryl Miles


I am getting complete lockups of the NIC.  Taking the interface down and
up doesn't restore it, and neither does rmmod/insmod of ne2k-pci and
8390.  A reboot does.

The machine with this card in isn't normally highly loaded on the network,
but under heavy load it will lock up completely (fairly reliably, I
suspect).  I have also had this problem with 2.4.0-test11; I had traced
it to ei_tx_intr(), in so much as it was calling the
"ei_local->stat.collisions += 16;" line.  This is 8390.c:635 in 2.4.0.


The log below covers the period in which I reloaded the modules trying to
bring it back to life.

Jan 13 01:46:24 thehostname kernel: NETDEV WATCHDOG: eth0: transmit
timed out
Jan 13 01:46:24 thehostname kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=951.
Jan 13 01:46:26 thehostname kernel: NETDEV WATCHDOG: eth0: transmit
timed out
Jan 13 01:46:26 thehostname kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=100.
Jan 13 01:47:14 thehostname kernel: NETDEV WATCHDOG: eth0: transmit
timed out
Jan 13 01:47:14 thehostname kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=106.
Jan 13 01:47:15 thehostname kernel: NETDEV WATCHDOG: eth0: transmit
timed out
Jan 13 01:47:15 thehostname kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=26.
Jan 13 01:47:17 thehostname kernel: NETDEV WATCHDOG: eth0: transmit
timed out
Jan 13 01:47:17 thehostname kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=105.
Jan 13 01:47:24 thehostname kernel: RPC: sendmsg returned error 101
Jan 13 01:47:24 thehostname kernel: nfs: RPC call returned error 101
Jan 13 01:47:24 thehostname kernel: nfs_statfs: statfs error = 101
Jan 13 01:47:37 thehostname kernel: ne2k-pci.c:v1.02 10/19/2000 D.
Becker/P. Gortmaker
Jan 13 01:47:37 thehostname kernel:  
http://www.scyld.com/network/ne2k-pci.html
Jan 13 01:47:37 thehostname kernel: eth0: RealTek RTL-8029 found at
0xe800, IRQ 19, 48:54:E8:21:15:56.
Jan 13 01:47:47 thehostname kernel: NETDEV WATCHDOG: eth0: transmit
timed out
Jan 13 01:47:47 thehostname kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=111.
Jan 13 01:47:58 thehostname kernel: NETDEV WATCHDOG: eth0: transmit
timed out
Jan 13 01:47:58 thehostname kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=1031.
Jan 13 01:48:00 thehostname kernel: NETDEV WATCHDOG: eth0: transmit
timed out
Jan 13 01:48:00 thehostname kernel: eth0: Tx timed out, lost interrupt?
TSR=0x1, ISR=0x3, t=107.
Jan 13 01:48:04 thehostname kernel: NETDEV WATCHDOG: eth0: transmit
timed out
Jan 13 01:48:04 thehostname kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=106.
Jan 13 01:48:08 thehostname kernel: NETDEV WATCHDOG: eth0: transmit
timed out
Jan 13 01:48:08 thehostname kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=306.
Jan 13 01:48:10 thehostname kernel: NETDEV WATCHDOG: eth0: transmit
timed out
Jan 13 01:48:10 thehostname kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x3, t=105.
Jan 13 01:48:24 thehostname kernel: NETDEV WATCHDOG: eth0: transmit
timed out
Jan 13 01:48:24 thehostname kernel: eth0: Tx timed out, lost interrupt?
TSR=0x3, ISR=0x2, t=72.

$ uname -r
2.4.0


lsmod bits:

ne2k-pci                4448   1  (autoclean)
8390                    6544   0  (autoclean) [ne2k-pci]


/proc/pci:
  Bus  0, device  11, function  0:
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
(rev 0).
  IRQ 19.
  I/O at 0xe800 [0xe81f].
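Decoding the register values in the log may help narrow this down. Assuming the DP8390 bit layout used by the kernel's 8390.h (TSR bit 0 = PTX, bit 1 = ND, bit 3 = ABT; ISR bit 0 = RX, bit 1 = TX), every TSR logged at timeout (0x3, or 0x1 once) has PTX set and ABT clear, and every ISR (0x3, or 0x2 once) has the TX bit set. In other words the chip reports the transmit as completed and the TX interrupt as pending, which fits the driver's own "lost interrupt?" guess. A small C sketch of that decoding; the helper names are hypothetical and the bit names should be checked against your own tree:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* DP8390 register bits, named as in the kernel's 8390.h (assumed).
 * Array index = bit number. */
static const char *const tsr_names[8] = {
	"PTX", "ND", "COL", "ABT", "CRS", "FU", "CDH", "OWC",
};
static const char *const isr_names[8] = {
	"RX", "TX", "RX_ERR", "TX_ERR", "OVER", "COUNTERS", "RDC", "RESET",
};

#define ENTSR_PTX 0x01	/* packet transmitted without error */
#define ENTSR_ABT 0x08	/* transmit aborted after 16 collisions */
#define ENISR_TX  0x02	/* transmit-complete interrupt pending */

/* Print the names of the bits set in an 8-bit register value. */
static void print_bits(const char *reg, unsigned val,
		       const char *const names[8])
{
	printf("%s=0x%x:", reg, val);
	for (int bit = 0; bit < 8; bit++)
		if (val & (1u << bit))
			printf(" %s", names[bit]);
	printf("\n");
}

/* Hypothetical check: the last transmit completed cleanly and its TX
 * interrupt is still flagged in ISR, i.e. raised but not yet handled. */
static bool tx_done_irq_pending(unsigned tsr, unsigned isr)
{
	return (tsr & ENTSR_PTX) && (isr & ENISR_TX);
}

/* Hypothetical check: would this TSR take the "collisions += 16" path,
 * assumed here to be the failed-transmit branch with ABT set? */
static bool tsr_aborted(unsigned tsr)
{
	return !(tsr & ENTSR_PTX) && (tsr & ENTSR_ABT);
}
```

For example, print_bits("TSR", 0x3, tsr_names) prints "TSR=0x3: PTX ND" and print_bits("ISR", 0x3, isr_names) prints "ISR=0x3: RX TX".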

-- 
Darryl Miles


2.4.0: Small observation in /proc/sys/net/unix/

2001-01-11 Thread Darryl Miles


# ls -il /proc/sys/net/unix/
total 24
   4446 -rw---   1 root     root            0 Jan 11 11:06
max_dgram_qlen
   4446 -rw---   1 root     root            0 Jan 11 11:06
max_dgram_qlen

Identical filenames (and the same inode number, 4446); nothing bad appears
to be happening, it just looks weird.

-- 
Darryl Miles


Re: Abort x86 assemble code

2001-01-03 Thread Darryl Miles


hugang <[EMAIL PROTECTED]> wrote:
>   I have following code ,and I can not understand the mark line,who can tell 
>me.thanks.
> 0ec7 xorl   0x400dec(,%eax,4),%ecx<-What it 
>to do.

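#include <stdint.h>
#include <sys/types.h>

extern u_int32_t eax;
extern u_int32_t ecx;

/* C equivalent of: xorl 0x400dec(,%eax,4),%ecx
 * (the function name is illustrative only) */
void xorl_disp_index(void)
{
	u_int32_t *ptr;

	/* address = displacement + index * scale = 0x400dec + eax * 4 */
	ptr = (u_int32_t *)(uintptr_t)((eax * 4) + 0x400dec);

	ecx ^= *ptr;	/* load the 32-bit word there and XOR it into ecx */
}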


This form (with a fixed displacement) is commonly used to access a
32-bit value within an array of 32-bit values.

The array start address is hardwired at 0x400dec, and the zero-based
index into the array is provided by eax.
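The same thing with concrete numbers: the scale factor 4 is the size of one 32-bit element, so for eax = 3 the operand address is 0x400dec + 3*4 = 0x400df8. A sketch that computes the effective address and models the table as an ordinary C array instead of a hardwired address; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a scaled-index operand such as
 * 0x400dec(,%eax,4): disp + index * scale (no base register). */
static uint32_t effective_addr(uint32_t disp, uint32_t index, uint32_t scale)
{
	/* x86 only encodes scale factors of 1, 2, 4 or 8. */
	assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
	return disp + index * scale;
}

/* The same operation with the table as a C array: indexing a uint32_t
 * array by eax already means "base + eax * 4" in bytes. */
static uint32_t xor_table_entry(uint32_t ecx, const uint32_t *table,
				uint32_t eax)
{
	return ecx ^ table[eax];
}
```

Writing table[eax] lets the compiler apply the scale factor, which is exactly the job the ",4" in the instruction does.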


-- 
Darryl Miles


Re: [PATCH] removal of "static foo = 0" from drivers/ide (test11)

2000-11-29 Thread Darryl Miles


Russell King <[EMAIL PROTECTED]> writes:
>The only difference is the size on disk; if we go around setting every
>bss variable to zero, the kernel/module data size will unnecessarily
>huge.

Hmm, what about common symbol generation?  i.e. the linker loses the
ability to throw out "multiply defined symbol" errors where you fail to
initialise it to a value.

Okay, extern global variables in the kernel need to be controlled, and it
is not like many get added; however, it is possible that one developer may
never know the same name is already in use by another part of the kernel
when their oh-no-new driver is added, since the linker's assistance in
this issue has just been disabled.

Is 'gas' able to be configured to never emit common symbols, but emit
BSS symbols instead, or is 'ld' able to be configured to never merge
common symbols but throw up "multiply defined symbol" errors?  Then
everyone is safe.


>We already argue about the extra couple of bytes that xx change to the
>kernel/a module would cost.  With these change, we save kilo-bytes in
>disk space (which is important on some systems).
 
PDAs!!! :)  Excellent work Russell.

-- 
Darryl Miles