So, someone sent me the text from TCP/IP Illustrated Vol. II that
describes how keepalives work. I was right about how they work. Here's what
it says.

"If TCP hasn't reached the keepalive limit, tcp_respond sends a keepalive
packet. The acknowledgement field of the keepalive packet (the fourth
argument to tcp_respond) contains rcv_nxt, the next sequence number
expected on the connection. The sequence number field of the keepalive
packet (the fifth argument) deliberately contains snd_una minus 1, which is
the sequence number of a byte of data that the other end has already
acknowledged (Figure 24.17). Since this sequence number is outside the
window, the other end must respond with an ACK, specifying the next sequence
number it expects."
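
To make that concrete, here is a minimal Python sketch of the fields
involved. It is not the BSD code. snd_una and rcv_nxt follow the BSD
variable names; KeepaliveProbe, build_probe and respond_to_probe are
hypothetical names I made up for illustration, and the receiver is assumed
to be a correct implementation. The numbers reuse the example from my
earlier message (bytes 100-200 sent, 201 ACKed); the 5000 is arbitrary.

```python
from dataclasses import dataclass

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class KeepaliveProbe:
    """Hypothetical container for the two header fields discussed above."""
    seq: int  # sequence number field of the probe
    ack: int  # acknowledgement field of the probe


def build_probe(snd_una: int, rcv_nxt: int) -> KeepaliveProbe:
    """Sender side: SEQ is snd_una minus 1, ACK is rcv_nxt, unchanged."""
    return KeepaliveProbe(seq=(snd_una - 1) % SEQ_MOD, ack=rcv_nxt)


def respond_to_probe(probe: KeepaliveProbe, peer_rcv_nxt: int) -> int:
    """Receiver side (assumed correct): return the ACK number sent back.

    The probe's SEQ names a byte the receiver has already acknowledged,
    so the receiver simply repeats the next sequence number it expects.
    Its ACK number does not move backwards.
    """
    return peer_rcv_nxt


# Host A has had everything up to byte 200 acknowledged, so snd_una is 201.
probe = build_probe(snd_una=201, rcv_nxt=5000)
print(probe.seq, probe.ack)                       # 200 5000
print(respond_to_probe(probe, peer_rcv_nxt=201))  # 201
```

Note that it is the SEQ field that drops by one. The ACK field in the probe
and the ACK in the reply both stay where they were.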

So, you have me concerned that you believed what the vendor told you! The
vendor's implementation of the keepalive process (with the ACK going
backwards) is broken, dude. You need to tell them that. Yeah, we all like to
blame the sys admin, but what the sys admin did should have worked. It
probably wouldn't have caused a reset if the keepalive process behaved
correctly.

Keepalives are bad news all around, but there are some uses for them. Not
using them may not be the solution. Of course getting a vendor to fix their
software is difficult too, so there may be no easy solution to this problem
(if not using keepalives turns out to introduce other problems).

Good luck. Keep us posted. Thanks.

_______________________________

Priscilla Oppenheimer
www.troubleshootingnetworks.com
www.priscilla.com

Priscilla Oppenheimer wrote:
> 
> Matthew Tayler wrote:
> > 
> > Thanks for the info, but at this time it looks as though MFC
> > was correct.
> 
> That's great that he helped you find the problem, but was he
> really correct? There's a difference (that matters to me at
> least. ;-)
> 
> I'm glad you found a workaround (to stop messing with the
> keepalive timer.) But I still question your and MFC's
> description of the keepalive process and wonder, if the systems
> implemented the keepalive as described in RFC 1122, would there
> be fewer problems? Of course, that RFC also says that
> keepalives are problematic no matter what. So your solution to
> not use them (or at least not use them when only 37 seconds
> have elapsed) sounds like a good plan.
> 
> It's frustrating to try to understand your problem when all you
> (and MFC) talk about is ACKs, though. ACKs are only 1/2 the
> story. Your comment about setting the ACK to ACK-1 causing the
> other side to send back the correct ACK can't be quite right.
> ACKs ack segments, not ACKs. TCP doesn't ACK an ACK.
> 
> I don't have TCP/IP Illustrated Vol. II, but I do have Volume
> I, which also discusses the keepalive process in Chapter 23. I
> have read other descriptions of the keepalive process also and
> seen it in action in analyzer traces. The sender of the probe
> sets the sequence number to SEQ-1. That's quite different from
> setting the ACK to ACK-1.
> 
> If you get a chance, can you tell me what TCP/IP Illustrated
> Vol. II says exactly? Thanks. Perhaps another way of doing
> keepalives is to set the ACK to ACK -1 on the probe, causing
> the other side to resend the previous segment or something of
> that sort. But that's problematic because then there's a
> resending of upper-layer data too, which the "real" keepalive
> process addresses. It's also not how the keepalive process is
> described in these references:
> 
> RFC 1122 (search for TCP keep-alive)
> http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1122.html
> 
> TCP/IP Illustrated Volume I, Chapter 23:
> http://www.crewcam.com/docs/tcpip/tcp_keep.htm
> (an aside: I wonder if it's legal to have the book at that
> site??)
> 
> _______________________________
> 
> Priscilla Oppenheimer
> www.troubleshootingnetworks.com
> www.priscilla.com
> 
> > 
> > Both hosts are running Tru64 (DEC Unix) and for whatever reasons the
> > system admin for Host B has 'enabled' modified keepalives and modified
> > the timers - amongst other things - at which probes are sent to 37
> > seconds, which fits with the problems we are seeing. Actually what they
> > have set is 75 half seconds to wait before issuing a probe.
> > 
> > What we now also know is that the behaviour for sending ACK minus 1 is
> > correct and that also the ACKs in response to the probe, at least
> > according to DEC/Compaq/HP, may "not be received", causing a reset. The
> > ACK minus 1 forces the other end to send back the correct ACK and in
> > fact say "I am still here". It's down to host implementations as to how
> > this is done but the standard practice is as described above or more
> > precisely in Vol II TCP/IP Illustrated.
> > 
> > Anyway having persuaded the system admin guy to stop playing around with
> > key systems as his personal toys the timers have been reset to default
> > and all is now well.
> > 
> > Thanks again
> > 
> > Matt T
> > 
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:nobody@groupstudy.com]
> > Sent: 24 October 2002 17:55
> > To: [EMAIL PROTECTED]
> > Subject: RE: TCP Ack numbers suddenly regress [7:56189]
> > 
> > 
> > The keepalive process shouldn't cause ACKs to go backwards. It should
> > cause them to stay the same. This doesn't sound like a keepalive
> > situation which should proceed smoothly. This situation involves a RESET
> > which usually indicates a problem of some sort, although possibly just a
> > minor problem. It sounds more like a bug in the TCP implementation to me.
> > We would have to see both sides of the conversation, including what both
> > sides send, not just what they ACK, to troubleshoot this.
> > 
> > The TCP RFC doesn't cover keepalives. They are mentioned in the Host
> > Requirements RFC 1122, which is pretty critical of them, but admits that
> > they MAY be included in a TCP implementation.
> > 
> > After 2 hours (by default) a UNIX system that is using keepalives sends
> > either an empty segment or a segment with one byte of garbage data. For
> > the sequence number, it uses the sequence number of the last byte already
> > sent. This should cause the other side to send the last ACK that it sent.
> > 
> > Example:
> > 
> > Host A sends bytes 100-200, SEQ number = 100
> > Host B ACKs, ACK number = 201
> > two hours
> > Host A sends segment with SEQ number = 200
> > Host B ACKS, ACK number = 201
> > 
> > To troubleshoot, you can't just look at ACKs anyway. You have to look at
> > both sides of the conversation. Also look at the timing. Did 2 hours go
> > by?
> > 
> > Also, what's the actual user complaint? Or is this just something you
> > happened to notice in a trace?
> > 
> > What is the network topology? Where are these hosts and what's in between
> > them? Is there some sort of "feature" running between them that messes
> > with TCP? For example a firewall or a router that does TCP Intercept??
> > 
> > _______________________________
> > 
> > Priscilla Oppenheimer
> > www.troubleshootingnetworks.com
> > www.priscilla.com
> > 
> > 
> > Matthew F. Crane wrote:
> > >
> > > Ok you don't say what the host systems are but I am going to guess Unix
> > > of some variety, in which case has anyone been playing around with the
> > > keepalive timers?
> > >
> > > If the session keepalive timer is reached a probe is sent with the ACK
> > > number set to ACK-1 i.e. telling the other end that the recipient lied
> > > previously when it said it had received all the data. This forces the
> > > origin to resend with the correct ACK number.
> > >
> > > TCP/IP Illustrated Vol 2 p830
> > >
> > > There are probably other instances where this is done but that's the
> > > one I've come across most often.
> > >
> > > MFC
> > >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED] [mailto:nobody@groupstudy.com] On Behalf Of
> > > Matthew Tayler
> > > Sent: 24 October 2002 09:04
> > > To: [EMAIL PROTECTED]
> > > Subject: TCP Ack numbers suddenly regress [7:56189]
> > >
> > >
> > > Anyone come across a situation where the ACK number suddenly steps back
> > > 1 and the link then resets?
> > >
> > > Host A to Host B is running fine with the app using port 2400 on A
> > > talking to an app on B using ports 3564 & 3565. We have several traces
> > > showing the steady increase of sequence numbers then all of a sudden the
> > > ACK number takes a step back by 1. There are no FIN segments in the
> > > preceding traffic, but the now regressed ACK number is repeated in 7
> > > segments sent and then a reset segment is issued and the two start
> > > exchanging data again.
> > >
> > > I am not allowed to post any of the data from the trace given the nature
> > > of the two systems involved, but here is an example of the way the ACK
> > > numbers run
> > >
> > > From A to B port 2400 to 3564
> > > 4567 is ACK'd
> > > 4785 .....
> > > 4948
> > > 4947
> > >
> > > From A to B port 2400 to 3565
> > > 466 is ACK'd
> > > 483 .....
> > > 500
> > > 499
> > >
> > > The link between the two is fine during this problem, utilisation drops
> > > but is never above 20% anyway. Both host applications are still running
> > > and there are no process issues. The Cisco kit at either end is happy,
> > > no error messages or the like, so we know it's host/app related.
> > >
> > > I can't find anything this specific in the archives and the nearest any
> > > of my textbooks come is to say a FIN has been issued - which the trace
> > > says is not the case.
> > >
> > > The reason for asking is that I didn't think it was possible to regress
> > > the sequence numbers, with the exception of the example from TCP/IP
> > > Illustrated Vol 2 noted above.
> > >
> > > Any ideas would be appreciated.
> > >
> > > Thanks
> > >
> > > Matt T

Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=56330&t=56189