subject:"Re\: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync"

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-27 Thread kuznet


Hello!

> Why is it a bug to accept the ACK from it?  RFC793 page 69 says 
> 
> If the RCV.WND is zero, no segments will be acceptable, but
> special allowance should be made to accept valid ACKs, URGs and
> RSTs.

8) This obscure passage has been discussed for ages. The question is:
what is "valid"? Solaris folks apparently read "valid" as
"all".

BSD interprets "valid" as "the segment fits into the window after truncation".


> Why shouldn't this be considered a valid ACK?

It may be considered a valid ACK, provided all the pieces of TCP
do window updates correctly. If the window update algorithm were sane,
it would not be a big problem from the TCP viewpoint
(though it would remain a security hole).

Actually, the same effect (pathological window expansion)
happens in other cases. See tcp-impl, Subj: "Send window update algorithm ...".


> can point me to it.  Why doesn't the probe use the correct sequence number
> instead of backing up one?  Perhaps a workaround is for Linux to not send
> the zero probe with the deliberately incorrect sequence number.

Linux does what the RFC recommends.
BSD-style zero window probes are known to be the wrong way.

However, I repeat, the real problem is not here.

The problem is that Solaris has an inconsistent window update
algorithm. It corrupts its SND.WND (like all BSD-derived stacks), but
also fails to recover from this (unlike BSD).

Alexey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-26 Thread Dave Dykstra


Replying to Alexey's message from the mailing list archive:

> Hello! 
> 
> I take back my words. Manfred is right, this requirement is not a MUST.
>
> The real problem is much worse, and it is wholly to the shame of Solaris.
> Tcpdump shows at least two different bugs there.
>
>   2060 16:31:42.879337 eth0 < dynamic.ih.lucent.com.39406 > static.8664: . 67580:67580(0) ack 1582261 win 1460 (DF)
>   2061 16:31:42.907940 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583721:1583721(0) ack 67580 win 1460 (DF)
> 
> All is OK until now. Solaris's state should be: 
> 
> SND.NXT=SND.UNA=67580 
> SND.WND=1460 
> RCV.NXT=1582261 
> 
>   2062 16:31:42.908620 eth0 < dynamic.ih.lucent.com.39406 > static.8664: . 67580:67581(1) ack 1583721 win 0 (DF)
> 
> Solaris sends one byte. 
> 
> SND.NXT++ 
> RCV.NXT=1583721 
> 
>   2063 16:31:43.098761 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583721:1583721(0) ack 67581 win 1460 (DF)
> 
> We ACK it. 
> 
>   2064 16:31:43.100993 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 67581:68456(875) ack 1583721 win 0 (DF)
>   2065 16:31:43.101524 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 68456:69041(585) ack 1583721 win 0 (DF)
>
> Solaris sends two segments, filling the whole window.
> 
> SND.NXT=69041 
> 
>   2066 16:31:43.108759 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583720:1583720(0) ack 69041 win 0 (DF)
> 
> We send zero window probe. SEG.SEQ=1583720. 
> 
> Solaris accepts the ACK from it!!! (bug #1) But it does not accept the window.


Why is it a bug to accept the ACK from it?  RFC793 page 69 says 

If the RCV.WND is zero, no segments will be acceptable, but
special allowance should be made to accept valid ACKs, URGs and
RSTs.

Why shouldn't this be considered a valid ACK?


> So, now it thinks that SND.UNA=SND.NXT=69041
> SND.WND=1460
>
> State is corrupted.
>
> This is a hard bug, but it is still not fatal. Actually, such corruptions
> (though for different reasons) are common in stacks that borrowed code
> from BSD. Look into tcp-impl, Subj: "Send window update algorithm ..."
> They are recoverable, provided the stack is sane.
> 
>   2067 16:31:43.110623 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 69041:69628(587) ack 1583721 win 0 (DF)
>
> Solaris sends some crap out of window because of its corrupted state.
> No problem yet.
> 
>   2068 16:31:43.110679 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583721:1583721(0) ack 69041 win 0 (DF)
>
> We answer "No pasarán" ("they shall not pass"), of course.
>
> According to the rules, Solaris must shrink its window now.
> This is the only way to recover from the corrupted state.
> 
>   2069 16:31:43.111641 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 69628:70501(873) ack 1583721 win 0 (DF)
>
> It does not. This is the point after which recovery is impossible.
> Fatal bug #2.
>
> To sum up: it is impossible to fix this from the Linux side.
> We may accept ACKs from out-of-window segments, and this
> will help in this case _occasionally_. But Solaris is still
> doomed to lock up randomly with such sawdust in its head.


I agree that Solaris is wrong for continuing to send data even though the
Linux receive window is 0, and I'm trying to get a bug report into Sun. 
I did not find any mention of such a problem in their patches that are
available in their online support center for any release of Solaris (I've
seen it on Solaris 2.6 and 2.7 but haven't tried others) so this may take
quite a while to get the attention of the right people.

Doesn't it seem likely, however, that the bug is being triggered by the
zero window probe that is subtracting one from the sequence number?  I
couldn't find any mention of that kind of practice in the RFC, perhaps you
can point me to it.  Why doesn't the probe use the correct sequence number
instead of backing up one?  Perhaps a workaround is for Linux to not send
the zero probe with the deliberately incorrect sequence number.

- Dave Dykstra

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread kuznet


Hello!

I take back my words. Manfred is right, this requirement is not a MUST.

The real problem is much worse, and it is wholly to the shame of Solaris.
Tcpdump shows at least two different bugs there.


  2060  16:31:42.879337 eth0 < dynamic.ih.lucent.com.39406 > static.8664: . 67580:67580(0) ack 1582261 win 1460 (DF)
  2061  16:31:42.907940 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583721:1583721(0) ack 67580 win 1460 (DF)

All is OK until now. Solaris's state should be:

SND.NXT=SND.UNA=67580
SND.WND=1460
RCV.NXT=1582261

  2062  16:31:42.908620 eth0 < dynamic.ih.lucent.com.39406 > static.8664: . 67580:67581(1) ack 1583721 win 0 (DF)

Solaris sends one byte.

SND.NXT++
RCV.NXT=1583721


  2063  16:31:43.098761 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583721:1583721(0) ack 67581 win 1460 (DF)

We ACK it.

  2064  16:31:43.100993 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 67581:68456(875) ack 1583721 win 0 (DF)
  2065  16:31:43.101524 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 68456:69041(585) ack 1583721 win 0 (DF)

Solaris sends two segments, filling the whole window.

SND.NXT=69041


  2066  16:31:43.108759 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583720:1583720(0) ack 69041 win 0 (DF)

We send zero window probe. SEG.SEQ=1583720.

Solaris accepts the ACK from it!!! (bug #1) But it does not accept the window.

So, now it thinks that SND.UNA=SND.NXT=69041
   SND.WND=1460

State is corrupted.

This is a hard bug, but it is still not fatal. Actually, such corruptions
(though for different reasons) are common in stacks that borrowed code
from BSD. Look into tcp-impl, Subj: "Send window update algorithm ..."
They are recoverable, provided the stack is sane.


  2067  16:31:43.110623 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 69041:69628(587) ack 1583721 win 0 (DF)

Solaris sends some crap out of window because of its corrupted state.
No problem yet.


  2068  16:31:43.110679 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583721:1583721(0) ack 69041 win 0 (DF)

We answer "No pasarán" ("they shall not pass"), of course.

According to the rules, Solaris must shrink its window now.
This is the only way to recover from the corrupted state.


  2069  16:31:43.111641 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 69628:70501(873) ack 1583721 win 0 (DF)

It does not. This is the point after which recovery is impossible.
Fatal bug #2.


To sum up: it is impossible to fix this from the Linux side.
We may accept ACKs from out-of-window segments, and this
will help in this case _occasionally_. But Solaris is still
doomed to lock up randomly with such sawdust in its head.

Alexey

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread Andi Kleen


On Thu, Jan 25, 2001 at 12:24:11PM +, Studierende der Universitaet des Saarlandes wrote:
> Andi wrote:
> > Basically it would accept the acks with the data in most
> > cases except when the application has totally stopped
> > reading, and in that case it does no harm to ignore the
> > acks.
> 
> But it seems that that's exactly what rsync does:
> It performs bulk data writes without reading. There are 32 kB in the
> receive buffers, and rsync continues to write. If the process read
> some data, the TCP stack would immediately recover.

Yes, but it has 64K of buffer. The other 32K are for skb headers. When it is
sending MTU-sized packets (and the MTU has any reasonable value), most
of the other 32K are empty (an skb has ~150 bytes of overhead) and you could
fit a few such bogus acks with data into that.

 
> RSTs are already processed, ACKs should be processed, but what about
> URG?

Nobody cares about URG, it is broken anyway ;)


-Andi

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread Matthias Andree


On Thu, 25 Jan 2001, James Sutherland wrote:

> This isn't a violation - the section quoted does not REQUIRE the
> behaviour, it only RECOMMENDS it as being a good idea. Since implementing
> it apparently makes DoS attacks easier, NOT implementing it is now a
> better idea...

OK, now for interoperability: is there a problem when the
recommendation is not followed?

-- 
Matthias Andree

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread James Sutherland


On Thu, 25 Jan 2001, David S. Miller wrote:

> 
> Andi Kleen writes:
>  > It's mostly for security to make it more difficult to nuke connections
>  > without knowing the sequence number.
>  > 
>  > Remember the RFC is from a very different internet with far fewer DoS attacks.
> 
> Andi, one of the worst DoSs in the world is not being able to
> communicate with half of the systems out there.
> 
> BSD and Solaris both make these kinds of packets, therefore it is a must
> to handle them properly.  So we will fix Linux, there is no argument.

Hang on... From what was quoted of the RFC, this behaviour (accepting
these packets) isn't required of hosts? In which case, if BSD or Solaris
depend on it, THEY are violating the protocol, not Linux??


James.


Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread Studierende der Universitaet des Saarlandes

Andi wrote:
> Basically it would accept the acks with the data in most
> cases except when the application has totally stopped
> reading, and in that case it does no harm to ignore the
> acks.

But it seems that that's exactly what rsync does:
It performs bulk data writes without reading. There are 32 kB in the
receive buffers, and rsync continues to write. If the process read
some data, the TCP stack would immediately recover.

RSTs are already processed, ACKs should be processed, but what about
URG?

--
Manfred

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread James Sutherland


On Thu, 25 Jan 2001, Matthias Andree wrote:

> On Wed, 24 Jan 2001, Andi Kleen wrote:
> 
> > It's mostly for security to make it more difficult to nuke connections
> > without knowing the sequence number.
> > 
> > Remember the RFC is from a very different internet with far fewer DoS attacks.
> 
> If you're deliberately breaking compatibility by violating the specs,
> you're making your own DoS if your machines can't chat to each other. If
> you insist on breaking the RFC, make a sysctl for this behaviour that
> defaults to "off".

This isn't a violation - the section quoted does not REQUIRE the
behaviour, it only RECOMMENDS it as being a good idea. Since implementing
it apparently makes DoS attacks easier, NOT implementing it is now a
better idea...


James.


Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread Andi Kleen

On Thu, Jan 25, 2001 at 03:44:07AM -0800, David S. Miller wrote:
> 
> Andi Kleen writes:
>  > > BSD and Solaris both make these kinds of packets, therefore it is a must
>  > > to handle them properly.  So we will fix Linux, there is no argument.
>  > 
>  > How do you propose to handle them? Queue the data anyway or just process
>  > the ACK?
> 
> tcp_sequence will return two flag bits instead of its current binary
> state.  One bit says "accept data", the other says "accept control
> bits" (such as RST, ACK, etc.)

Sounds ugly @) tcp_sequence is already far too complicated.

> 
> tcp_sequence will also truncate the data length of the SKB if
> necessary, since BSD really puts total crap in the probe byte.
> 
> Callers of tcp_sequence will check the return value bits accordingly.
> This is all slow path code, so there are no performance issues.

How about simply queueing the data in that case if it already fits into the
receive buffer? The alternative would be to skb_trim() it to 0 in the slow path
of tcp_sequence and queue it, but that looks wasteful.  Basically it would accept
the acks with the data in most cases except when the application has totally
stopped reading, and in that case it does no harm to ignore the acks.

I have played with a different embedded stack that uses this
approach; it seems to work and makes for much simpler code.

-Andi

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread David S. Miller

Andi Kleen writes:
 > > BSD and Solaris both make these kinds of packets, therefore it is a must
 > > to handle them properly.  So we will fix Linux, there is no argument.
 > 
 > How do you propose to handle them? Queue the data anyway or just process
 > the ACK?

tcp_sequence will return two flag bits instead of its current binary
state.  One bit says "accept data", the other says "accept control
bits" (such as RST, ACK, etc.)

tcp_sequence will also truncate the data length of the SKB if
necessary, since BSD really puts total crap in the probe byte.

Callers of tcp_sequence will check the return value bits accordingly.
This is all slow path code, so there are no performance issues.

Later,
David S. Miller
[EMAIL PROTECTED]


Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread Andi Kleen


On Thu, Jan 25, 2001 at 03:32:44AM -0800, David S. Miller wrote:
> 
> Andi Kleen writes:
>  > It's mostly for security to make it more difficult to nuke connections
>  > without knowing the sequence number.
>  > 
>  > Remember the RFC is from a very different internet with far fewer DoS attacks.
> 
> Andi, one of the worst DoSs in the world is not being able to
> communicate with half of the systems out there.

If it were that serious, there would surely be more reports ;)

> 
> BSD and Solaris both make these kinds of packets, therefore it is a must
> to handle them properly.  So we will fix Linux, there is no argument.

How do you propose to handle them? Queue the data anyway or just process
the ACK?


-Andi

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread David S. Miller

Andi Kleen writes:
 > It's mostly for security to make it more difficult to nuke connections
 > without knowing the sequence number.
 > 
 > Remember the RFC is from a very different internet with far fewer DoS attacks.

Andi, one of the worst DoSs in the world is not being able to
communicate with half of the systems out there.

BSD and Solaris both make these kinds of packets, therefore it is a must
to handle them properly.  So we will fix Linux, there is no argument.

Later,
David S. Miller
[EMAIL PROTECTED]

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread Matthias Andree

On Wed, 24 Jan 2001, Andi Kleen wrote:

> It's mostly for security to make it more difficult to nuke connections
> without knowing the sequence number.
> 
> Remember the RFC is from a very different internet with far fewer DoS attacks.

If you're deliberately breaking compatibility by violating the specs,
you're making your own DoS if your machines can't chat to each other. If
you insist on breaking the RFC, make a sysctl for this behaviour that
defaults to "off".

-- 
Matthias Andree

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread Matthias Andree


On Wed, 24 Jan 2001, Andi Kleen wrote:

 It's mostly for security to make it more difficult to nuke connections
 without knowing the sequence number.
 
 Remember RFC is from a very different internet with much less DoS attacks.

If you're deliberately breaking compatibility by violating the specs,
you're making your own DoS if your machines can't chat to each other. If
you insist on breaking the RFC, make a sysctl for this behaviour that
defaults to "off".

-- 
Matthias Andree
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread David S. Miller



Andi Kleen writes:
  It's mostly for security to make it more difficult to nuke connections
  without knowing the sequence number.
  
  Remember RFC is from a very different internet with much less DoS attacks.

Andi, one of the worst DoSs in the world is not being able to
communicate with half of the systems out there.

BSD and Solaris both make these kinds of packets, therefore it is must
to handle them properly.  So we will fix Linux, there is no argument.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread James Sutherland


On Thu, 25 Jan 2001, Matthias Andree wrote:

 On Wed, 24 Jan 2001, Andi Kleen wrote:
 
  It's mostly for security to make it more difficult to nuke connections
  without knowing the sequence number.
  
  Remember RFC is from a very different internet with much less DoS attacks.
 
 If you're deliberately breaking compatibility by violating the specs,
 you're making your own DoS if your machines can't chat to each other. If
 you insist on breaking the RFC, make a sysctl for this behaviour that
 defaults to "off".

This isn't a violation - the section quoted does not REQUIRE the
behaviour, it only RECOMMENDS it as being a good idea. Since implementing
it apparently makes DoS attacks easier, NOT implementing it is now a
better idea...


James.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread Matthias Andree


On Thu, 25 Jan 2001, James Sutherland wrote:

 This isn't a violation - the section quoted does not REQUIRE the
 behaviour, it only RECOMMENDS it as being a good idea. Since implementing
 it apparently makes DoS attacks easier, NOT implementing it is now a
 better idea...

Ok, now for the interoperability. Is there a problem with this when the
recommendation is not followed?

-- 
Matthias Andree
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread Studierende der Universitaet des Saarlandes


Andi wrote:
 Basically it would accept the acks with the data in most
 cases except when the application has totally stopped
 reading and in that case it doesn't harm to ignore the
 acks. 

But it seems that that's exactly what rsync does:
It performs bulk data writes without reading. There are 32 kB in the
receive buffers, and rsync continues to write. If the process would read
some data the TCP stack would immediately recover.

RST are already processed, ACK's should be processed, but what about
URG? 

--
Manfred
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread James Sutherland


On Thu, 25 Jan 2001, David S. Miller wrote:

 
 Andi Kleen writes:
   It's mostly for security to make it more difficult to nuke connections
   without knowing the sequence number.
   
   Remember RFC is from a very different internet with much less DoS attacks.
 
 Andi, one of the worst DoSs in the world is not being able to
 communicate with half of the systems out there.
 
 BSD and Solaris both make these kinds of packets, therefore it is must
 to handle them properly.  So we will fix Linux, there is no argument.

Hang on... From what was quoted of the RFC, this behaviour (accepting
these packets) isn't required of hosts? In which case, if BSD or Solaris
depend on it, THEY are violating the protocol, not Linux??


James.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread Andi Kleen


On Thu, Jan 25, 2001 at 12:24:11PM +, Studierende der Universitaet des Saarlandes 
wrote:
 Andi wrote:
  Basically it would accept the acks with the data in most
  cases except when the application has totally stopped
  reading and in that case it doesn't harm to ignore the
  acks. 
 
 But it seems that that's exactly what rsync does:
 It performs bulk data writes without reading. There are 32 kB in the
 receive buffers, and rsync continues to write. If the process would read
 some data the TCP stack would immediately recover.

Yes, but it has 64K of buffer. The other 32K are for skb headers. When it is 
sending MTU sized packets (and the MTU has any reasonable value) then most
of the other 32K are empty (a skb has ~150 bytes overhead) and you could
fit a few such bogus acks with data into that.

 
 RST are already processed, ACK's should be processed, but what about
 URG? 

Nobody cares about URG, it is broken anyways ;) 


-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread Andi Kleen


On Thu, Jan 25, 2001 at 03:32:44AM -0800, David S. Miller wrote:
 
 Andi Kleen writes:
   It's mostly for security to make it more difficult to nuke connections
   without knowing the sequence number.
   
   Remember RFC is from a very different internet with much less DoS attacks.
 
 Andi, one of the worst DoSs in the world is not being able to
 communicate with half of the systems out there.

If it was that serious then there would be surely more reports ;)

 
 BSD and Solaris both make these kinds of packets, therefore it is must
 to handle them properly.  So we will fix Linux, there is no argument.

How do you propose to handle them? Queue the data anyways or just process
the ACK?


-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread David S. Miller



Andi Kleen writes:
 > > BSD and Solaris both make these kinds of packets, therefore it is a must
 > > to handle them properly.  So we will fix Linux, there is no argument.
 > 
 > How do you propose to handle them? Queue the data anyway, or just process
 > the ACK?

tcp_sequence returns two flag bits instead of its current binary
state.  One bit says "accept data", the other says "accept control
bits" (such as RST, ACK, etc.)

tcp_sequence also will truncate the data len of the SKB area if
necessary, BSD really puts total crap in the probe byte.

Callers of tcp_sequence check the return value bits accordingly.
This is all slow path code, so there are no performance issues.
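A minimal sketch of the two-bit return value described above, written as a standalone C function rather than the real tcp_sequence(); SEQ_ACCEPT_DATA, SEQ_ACCEPT_CTL and seq_check() are hypothetical names. For brevity it rejects segments that start before RCV.NXT; a real implementation would also have to trim retransmitted bytes on the left edge.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical return bits; not the kernel's actual interface. */
#define SEQ_ACCEPT_DATA 0x1
#define SEQ_ACCEPT_CTL  0x2

/* Modulo-2^32 sequence comparisons. */
static bool seq_lt(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static bool seq_le(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

/*
 * Classify a segment against the receive window [rcv_nxt, rcv_nxt + rcv_wnd).
 * Control bits are accepted if the segment starts inside the window or
 * exactly at RCV.NXT (so a zero window still admits them); the payload
 * length in *len is trimmed to the bytes that fit before the right edge.
 * Simplification: segments starting before RCV.NXT are rejected outright.
 */
static int seq_check(uint32_t seg_seq, uint32_t *len,
                     uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    uint32_t wnd_end = rcv_nxt + rcv_wnd;
    bool at_left_edge = (seg_seq == rcv_nxt);
    bool inside = seq_le(rcv_nxt, seg_seq) && seq_lt(seg_seq, wnd_end);
    int flags = SEQ_ACCEPT_CTL;

    if (!at_left_edge && !inside) {
        *len = 0;               /* nothing in this segment is usable */
        return 0;
    }

    uint32_t room = wnd_end - seg_seq;  /* 0 when the window is zero */
    if (*len > room)
        *len = room;            /* drop bytes beyond the right edge */
    if (*len > 0)
        flags |= SEQ_ACCEPT_DATA;
    return flags;
}
```

With RCV.NXT = 69041 and a zero window, segment 2067 (69041:69628) would have its ACK processed and its 587 bytes trimmed away, while 2137 (starting at 70501) would still be dropped entirely.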

Later,
David S. Miller
[EMAIL PROTECTED]


Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread Andi Kleen


On Thu, Jan 25, 2001 at 03:44:07AM -0800, David S. Miller wrote:
 
> Andi Kleen writes:
>  > > BSD and Solaris both make these kinds of packets, therefore it is a must
>  > > to handle them properly.  So we will fix Linux, there is no argument.
>  > 
>  > How do you propose to handle them? Queue the data anyway, or just process
>  > the ACK?
> 
> tcp_sequence returns two flag bits instead of its current binary
> state.  One bit says "accept data", the other says "accept control
> bits" (such as RST, ACK, etc.)

Sounds ugly @) tcp_sequence is already far too complicated.


 
> tcp_sequence also will truncate the data len of the SKB area if
> necessary, BSD really puts total crap in the probe byte.
> 
> Callers of tcp_sequence check the return value bits accordingly.
> This is all slow path code, so there are no performance issues.

How about simply queueing the data in that case if it already fits into the
receive buffer? The alternative would be to skb_trim() it to 0 in the slow path
of tcp_sequence and queue it, but that looks wasteful.  Basically it would accept
the acks with the data in most cases, except when the application has totally
stopped reading, and in that case it does no harm to ignore the acks.

I have played with a different embedded stack that uses this approach; it
seems to work and makes for much simpler code.
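Andi's alternative boils down to a single admission check in the slow path. A minimal sketch under assumed names: rcv_budget, queue_if_fits() and its truesize argument are hypothetical, standing for the memory already charged to the socket, its limit, and a segment's payload plus skb overhead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket receive accounting (illustrative field names). */
struct rcv_budget {
    uint32_t rmem_alloc;  /* bytes already charged to the socket */
    uint32_t rcvbuf;      /* receive buffer limit */
};

/*
 * Admission check for a segment that carries data into a closed window.
 * If its full cost (payload plus skb overhead) still fits, queue it and
 * let its ACK be processed; otherwise drop it, ACK included.
 */
static bool queue_if_fits(struct rcv_budget *b, uint32_t truesize)
{
    if (b->rmem_alloc > b->rcvbuf || truesize > b->rcvbuf - b->rmem_alloc)
        return false;           /* reader has stalled: ignore the ACK too */
    b->rmem_alloc += truesize;  /* charge the queued segment */
    return true;
}
```

The appeal is that the sequence check itself stays binary; the only new decision is whether the receive buffer can absorb the segment.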


-Andi

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-25 Thread kuznet


Hello!

I take my words back. Manfred is right, this requirement is not a MUST.

The real problem is much worse, and the shame is wholly Solaris's.
Tcpdump shows at least two different bugs there.


  2060  16:31:42.879337 eth0 < dynamic.ih.lucent.com.39406 > static.8664: . 67580:67580(0) ack 1582261 win 1460 (DF)
  2061  16:31:42.907940 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583721:1583721(0) ack 67580 win 1460 (DF)

All is OK until now. Solaris's state should be:

SND.NXT=SND.UNA=67580
SND.WND=1460
RCV.NXT=1582261

  2062  16:31:42.908620 eth0 < dynamic.ih.lucent.com.39406 > static.8664: . 67580:67581(1) ack 1583721 win 0 (DF)

Solaris sends one byte.

SND.NXT++
RCV.NXT=1583721


  2063  16:31:43.098761 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583721:1583721(0) ack 67581 win 1460 (DF)

We ACK it.

  2064  16:31:43.100993 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 67581:68456(875) ack 1583721 win 0 (DF)
  2065  16:31:43.101524 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 68456:69041(585) ack 1583721 win 0 (DF)

Solaris sends two segments, filling the whole window.

SND.NXT=69041


  2066  16:31:43.108759 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583720:1583720(0) ack 69041 win 0 (DF)

We send a zero-window probe. SEG.SEQ=1583720.

Solaris accepts the ACK from it!!! (bug #1) But it does not accept the window.

So, now it thinks that SND.UNA=SND.NXT=69041
   SND.WND=1460

State is corrupted.

This is a hard bug, but it is still not fatal. Actually, such corruptions
(though for different reasons) are common in stacks that borrowed code
from BSD. Look into tcp-impl, Subj: "Send window update algorithm ..."
They are recoverable, provided the stack is sane.


  2067  16:31:43.110623 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 69041:69628(587) ack 1583721 win 0 (DF)

Solaris sends some crap out of window because of its corrupted state.
No problem.


  2068  16:31:43.110679 eth0 > static.8664 > dynamic.ih.lucent.com.39406: . 1583721:1583721(0) ack 69041 win 0 (DF)

We tell "No pasaran", of course.

According to the rules, Solaris must shrink its window now.
This is the only way to recover from the corrupted state.


  2069  16:31:43.111641 eth0 < dynamic.ih.lucent.com.39406 > static.8664: P 69628:70501(873) ack 1583721 win 0 (DF)

It does not. And this is the point after which recovery is impossible.
Fatal bug #2.
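The walkthrough can be replayed against the RFC 793 send-side rules. A minimal sketch, assuming each segment has already been let through the acceptability check (as Solaris did with the probe in 2066) and using the RFC 1122 relaxation that lets SEG.ACK equal SND.UNA; snd_state, process_ack() and usable_window() are hypothetical names, not Solaris or Linux code. Under these rules the probe's ACK advances SND.UNA, but its window is ignored because its SEG.SEQ (1583720) is older than SND.WL1 (1583721), which is one plausible way to reach the corrupted state; segment 2068 then passes the SND.WL1/SND.WL2 check and shrinks the window to zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Modulo-2^32 sequence comparisons. */
static bool seq_lt(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static bool seq_le(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

/* Send-side variables, named after RFC 793 section 3.2. */
struct snd_state {
    uint32_t una;  /* SND.UNA: oldest unacknowledged sequence number */
    uint32_t nxt;  /* SND.NXT: next sequence number to send */
    uint32_t wnd;  /* SND.WND: window last accepted from the peer */
    uint32_t wl1;  /* SND.WL1: SEG.SEQ used for the last window update */
    uint32_t wl2;  /* SND.WL2: SEG.ACK used for the last window update */
};

/*
 * ACK and window processing for a segment that was already let through
 * the acceptability check. A new window is taken only from a segment that
 * is not older than the one that supplied the current window.
 */
static void process_ack(struct snd_state *s, uint32_t seg_seq,
                        uint32_t seg_ack, uint32_t seg_wnd)
{
    if (!seq_le(s->una, seg_ack) || !seq_le(seg_ack, s->nxt))
        return;                       /* old or not-yet-sent ACK: ignore */
    s->una = seg_ack;                 /* no change when SEG.ACK == SND.UNA */
    if (seq_lt(s->wl1, seg_seq) ||
        (s->wl1 == seg_seq && seq_le(s->wl2, seg_ack))) {
        s->wnd = seg_wnd;
        s->wl1 = seg_seq;
        s->wl2 = seg_ack;
    }
}

/* Bytes the sender may still send; negative once the window shrank. */
static int32_t usable_window(const struct snd_state *s)
{
    return (int32_t)(s->una + s->wnd - s->nxt);
}
```

Starting from SND.UNA = 67581, SND.NXT = 69041, SND.WND = 1460 (after 2063 to 2065), processing 2066 gives SND.UNA = SND.NXT = 69041 with SND.WND still 1460, enough for 2067 and 2069 (587 + 873 = 1460 bytes). Processing 2068 then sets SND.WND to 0 and leaves a usable window of -1460, meaning the sender must stop.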


To sum up: it is impossible to fix this from the Linux side.
We may accept ACKWIN from out-of-window segments, and this
will help in this case _occasionally_. But Solaris is still
doomed to lock up randomly with such sawdust in its head.

Alexey

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-24 Thread kuznet


Hello!

> must be). Is there another RFC?

It is exactly this place.

Since BSD uses this feature, it is a must for us.


> Could you check what happened in line 2066 of this tcpdump?

> 2066  16:31:43.108759 eth0 > static.8664 > dynamic.ih.lucent.com.39406:
> . 1583720:1583720(0) ack 69041 win 0 (DF)

This is the usual RFC-style zero-window probe: a null segment out of window.
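The two probe styles discussed in this thread differ only in the sequence number and length they use. A minimal sketch with hypothetical helper names: with SND.UNA = 1583721 (the last byte Solaris acknowledged), the RFC-style helper yields exactly segment 2066, 1583720:1583720(0). The BSD-style helper assumes the probe carries one byte of new data at SND.NXT, which a receiver with a zero window cannot accept as data.

```c
#include <stdint.h>

/* Sequence number and payload length of a probe segment. */
struct probe {
    uint32_t seq;
    uint32_t len;
};

/*
 * RFC-style probe: a zero-length segment one below SND.UNA. The receiver
 * finds it out of window and answers with an ACK carrying its window.
 */
static struct probe rfc_style_probe(uint32_t snd_una)
{
    struct probe p = { snd_una - 1u, 0u };  /* wraps correctly at 0 */
    return p;
}

/*
 * BSD-style probe (assumed shape): one byte of new data at SND.NXT,
 * pushed into the zero window.
 */
static struct probe bsd_style_probe(uint32_t snd_nxt)
{
    struct probe p = { snd_nxt, 1u };
    return p;
}
```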

Alexey


Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-24 Thread Andi Kleen


On Wed, Jan 24, 2001 at 11:03:34PM +0300, [EMAIL PROTECTED] wrote:
> Hello!
> 
> > I read through the tcpdump, and it seems that Linux completely ignores
> > packets with out-of-window sequence numbers:
> 
> Yes, Linux is __very__ wrong to do this. The RFC requires accepting
> ACK, URG and RST on any segment adjacent to the window, even if the window
> is zero.

It's mostly for security to make it more difficult to nuke connections
without knowing the sequence number.

Remember, the RFC is from a very different Internet, with far fewer DoS attacks.


-Andi

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-24 Thread Manfred Spraul


> Yes, Linux is __very__ wrong to do this. The RFC requires accepting
> ACK, URG and RST on any segment adjacent to the window, even if the window
> is zero.
> 
Interesting: I checked RFC 793 and came to the conclusion that Linux
is correct ("special allowance should be made to accept valid ACKs", not
must be). Is there another RFC?

Note that either segment length is > 0 or SEG.SEQ != RCV.NXT
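Manfred's note follows from the segment acceptability table in RFC 793 (page 69). A minimal C transcription of that table; the helper names are hypothetical, and sequence arithmetic is done modulo 2^32 by measuring each number's offset from RCV.NXT. With RCV.WND = 0, only a zero-length segment with SEG.SEQ = RCV.NXT passes.

```c
#include <stdbool.h>
#include <stdint.h>

/* true if x lies in [start, start + wnd) in modulo-2^32 sequence space */
static bool in_window(uint32_t start, uint32_t wnd, uint32_t x)
{
    return (uint32_t)(x - start) < wnd;
}

/* RFC 793 acceptability test for an incoming segment. */
static bool segment_acceptable(uint32_t seg_seq, uint32_t seg_len,
                               uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    if (seg_len == 0 && rcv_wnd == 0)
        return seg_seq == rcv_nxt;              /* only the exact edge */
    if (seg_len == 0)
        return in_window(rcv_nxt, rcv_wnd, seg_seq);
    if (rcv_wnd == 0)
        return false;                           /* data into a zero window */
    return in_window(rcv_nxt, rcv_wnd, seg_seq) ||        /* first byte */
           in_window(rcv_nxt, rcv_wnd, seg_seq + seg_len - 1u); /* last byte */
}
```

From the Linux side (RCV.NXT = 69041, window 0), segment 2067 fails because it carries data and 2137 fails because SEG.SEQ = 70501; seen from Solaris (RCV.NXT = 1583721, window 0), the probe in 2066 fails as well, as an RFC-style probe is meant to.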

But I spotted a second problem, which might be minor:

Could you check what happened in line 2066 of this tcpdump?

in line 2064 static (2.2.18-pre17) ack'ed 1583721.
in line 2066 dynamic (Solaris 7.0) sends data although the window is 0
in line 2067 static complains, but now ack is 1583720

2060  16:31:42.879337 eth0 < dynamic.ih.lucent.com.39406 > static.8664:
. 67580:67580(0) ack 1582261 win 1460 (DF)
2061  16:31:42.907940 eth0 > static.8664 > dynamic.ih.lucent.com.39406:
. 1583721:1583721(0) ack 67580 win 1460 (DF)
2062  16:31:42.908620 eth0 < dynamic.ih.lucent.com.39406 > static.8664:
. 67580:67581(1) ack 1583721 win 0 (DF)
2063  16:31:43.098761 eth0 > static.8664 > dynamic.ih.lucent.com.39406:
. 1583721:1583721(0) ack 67581 win 1460 (DF)
2064  16:31:43.100993 eth0 < dynamic.ih.lucent.com.39406 > static.8664:
P 67581:68456(875) ack 1583721 win 0 (DF)
2065  16:31:43.101524 eth0 < dynamic.ih.lucent.com.39406 > static.8664:
P 68456:69041(585) ack 1583721 win 0 (DF)
2066  16:31:43.108759 eth0 > static.8664 > dynamic.ih.lucent.com.39406:
. 1583720:1583720(0) ack 69041 win 0 (DF)
2067  16:31:43.110623 eth0 < dynamic.ih.lucent.com.39406 > static.8664:
P 69041:69628(587) ack 1583721 win 0 (DF)
2068  16:31:43.110679 eth0 > static.8664 > dynamic.ih.lucent.com.39406:
. 1583721:1583721(0) ack 69041 win 0 (DF)

An off-by-one error somewhere in tcp_send_ack()?
A second packet with the off-by-one ack number is sent a few packets
later; all other packets are correct.
--
Manfred

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-24 Thread kuznet


Hello!

> I read through the tcpdump, and it seems that Linux completely ignores
> packets with out-of-window sequence numbers:

Yes, Linux is __very__ wrong to do this. The RFC requires accepting
ACK, URG and RST on any segment adjacent to the window, even if the window
is zero.

Solaris also does something formally wrong, but it would work if Linux
were formally correct.

O-ho-ho... It is a difficult bug.

Alexey

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-23 Thread Manfred Spraul


I checked RFC793, and AFAICS Solaris is the culprit:
it sends out invalid packets, Linux ignores them and thus Linux doesn't
receive acks.

Which Solaris version do you use?

* The last valid ack from the Solaris computer is for byte 1583721, win
8760 (line 2078)

* No packet after line 2078 from the Solaris computer passed the
acceptability test from RFC793, page 69. Thus Linux ignores these
packets completely.

* Linux sends out packets up to 1591021:1592481(1460) without receiving
_valid_ acks, then it begins to retry 1583721:1585181(1460) every 2
seconds until the end of the tcpdump.
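The numbers in the last two points fit together: the distance from the last acknowledged byte (1583721) to the end of the last segment Linux sent (1592481) is exactly the 8760-byte window advertised in line 2078, i.e. six 1460-byte segments. A minimal sketch of that arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Bytes sent but not yet acknowledged, in modulo-2^32 sequence space. */
static uint32_t bytes_in_flight(uint32_t snd_una, uint32_t snd_nxt)
{
    return snd_nxt - snd_una;
}

/* Whole MSS-sized segments contained in a byte count. */
static uint32_t full_segments(uint32_t bytes, uint32_t mss)
{
    return bytes / mss;
}
```

Once that window is full and no acceptable ACK arrives, the sender can only retransmit from SND.UNA, which matches the repeated 1583721:1585181(1460) segments.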

--  
Manfred

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-23 Thread Manfred Spraul


I read through the tcpdump, and it seems that Linux completely ignores
packets with out-of-window sequence numbers:


* the Solaris computer (dynamic...) sends further data although the
Linux box (static) says 'win 0'.
See lines 2067, 2069, 2076, ...
2066  16:31:43.108759 eth0 > static.8664 > dynamic.ih.lucent.com.39406:
. 1583720:1583720(0) ack 69041 win 0 (DF)
2067  16:31:43.110623 eth0 < dynamic.ih.lucent.com.39406 > static.8664:
P 69041:69628(587) ack 1583721 win 0 (DF)


2078  16:31:43.896657 eth0 < dynamic.ih.lucent.com.39406 > static.8664:
. 69041:69041(0) ack 1583721 win 8760 (DF)
* this is the last ack with an in-window sequence number.
.
.
.
2136  16:35:08.488774 eth0 > static.8664 > dynamic.ih.lucent.com.39406:
. 1583721:1585181(1460) ack 69041 win 0 (DF)
* the Linux computer sends data
2137  16:35:08.492158 eth0 < dynamic.ih.lucent.com.39406 > static.8664:
. 70501:70501(0) ack 1592481 win 8760 (DF)
* but ignores the ack, probably because the sequence number is out of
window

Perhaps someone who understands TCP could check the code and compare it
with the RFCs?

--
Manfred

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-23 Thread Dave Dykstra


I'm sorry I didn't give you a more specific version number: the "X" in the
2.2.18preX kernel version we tried is 17.

- Dave
