Thanks for the info, but at this time it looks as though MFC was correct.

Both hosts are running Tru64 (DEC Unix), and for whatever reason the system
admin for Host B has enabled keepalives and, amongst other things, changed
the interval at which probes are sent to about 37 seconds, which fits with
the problems we are seeing. What they have actually set is a wait of 75
half-seconds (37.5 seconds) before issuing a probe.
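
As a quick sanity check on that figure, the conversion can be sketched in
Python. This assumes the timer is counted in half-second ticks, as described
above. The function name is made up for illustration, and the 14400-tick
value is simply the 2-hour default mentioned later in this thread expressed
in the same units.

```python
TICKS_PER_SECOND = 2  # assumption: the keepalive timer counts half-second ticks


def ticks_to_seconds(ticks: int) -> float:
    """Convert a timer value in half-second ticks to seconds."""
    return ticks / TICKS_PER_SECOND


print(ticks_to_seconds(75))     # value set on Host B -> 37.5
print(ticks_to_seconds(14400))  # 2 hours in the same units -> 7200.0
```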

What we now also know is that sending ACK minus 1 is correct behaviour, and
that the ACKs sent in response to the probe may, at least according to
DEC/Compaq/HP, "not be received", causing a reset. The ACK minus 1 forces
the other end to send back the correct ACK, in effect saying "I am still
here". It's down to host implementations as to how this is done, but the
standard practice is as described above, or more precisely in TCP/IP
Illustrated Vol 2.

Anyway, now that the system admin has been persuaded to stop treating key
systems as his personal toys, the timers have been reset to their defaults
and all is now well.

Thanks again

Matt T

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:nobody@;groupstudy.com]
Sent: 24 October 2002 17:55
To: [EMAIL PROTECTED]
Subject: RE: TCP Ack numbers suddenly regress [7:56189]


The keepalive process shouldn't cause ACKs to go backwards. It should cause
them to stay the same. This doesn't sound like a keepalive situation, which
should proceed smoothly. This situation involves a RESET, which usually
indicates a problem of some sort, although possibly just a minor problem. It
sounds more like a bug in the TCP implementation to me. We would have to see
both sides of the conversation, including what both sides send, not just
what they ACK, to troubleshoot this.

The TCP RFC doesn't cover keepalives. They are mentioned in the Host
Requirements RFC 1122, which is pretty critical of them, but admits that
they MAY be included in a TCP implementation.
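
For illustration, here is a minimal sketch of how an application on a
present-day system with BSD-style sockets might turn keepalives on for a
single connection. It is not taken from the Tru64 hosts in this thread. The
helper name enable_keepalive is made up, and TCP_KEEPIDLE is a
platform-specific option (available on Linux, for example), so the code
only uses it if Python exposes it.

```python
import socket


def enable_keepalive(sock, idle_seconds=None):
    """Turn keepalives on for one socket; optionally set the idle time."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE is platform-specific, so guard it rather than assume it.
    if idle_seconds is not None and hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
enable_keepalive(sock)
after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(before, after)  # "before" is 0: keepalives are off until requested
sock.close()
```

The option is off until the application asks for it, which matches the
RFC 1122 requirement that keepalives default to off.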

After 2 hours (by default) a UNIX system that is using keepalives sends
either an empty segment or a segment with one byte of garbage data. For the
sequence number, it uses the sequence number of the last byte already sent.
This should cause the other side to send the last ACK that it sent.

Example:

Host A sends bytes 100-200, SEQ number = 100
Host B ACKs, ACK number = 201
(two hours pass with no traffic)
Host A sends segment with SEQ number = 200
Host B ACKs, ACK number = 201
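
The same exchange can be sketched as a toy model in Python. This is a
deliberately simplified receiver, not a real TCP implementation: it only
accepts data that starts exactly at the next expected sequence number and
ignores windows, overlaps and sequence wraparound. The class and method
names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    rcv_nxt: int  # next sequence number this host expects to receive

    def on_segment(self, seq: int, length: int) -> int:
        """Accept in-order data, then return the ACK number to send back."""
        if seq == self.rcv_nxt:
            self.rcv_nxt += length
        # Anything else (such as a keepalive probe that reuses an old
        # sequence number) is not accepted; the current ACK is repeated.
        return self.rcv_nxt


host_b = Receiver(rcv_nxt=100)
data_ack = host_b.on_segment(seq=100, length=101)  # bytes 100-200
probe_ack = host_b.on_segment(seq=200, length=1)   # probe, one garbage byte
print(data_ack, probe_ack)  # 201 201
```

An empty probe (length=0) gives the same result in this model, since the
segment does not start at the expected sequence number either way.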

To troubleshoot, you can't just look at ACKs anyway. You have to look at
both sides of the conversation. Also look at the timing. Did 2 hours go by?
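
As a rough aid for that kind of check, the sketch below scans a list of
(time, ACK number) samples from one direction of a trace and reports each
point where the ACK number steps backwards, together with the idle time
before it. The ACK numbers are the ones quoted in the original post; the
timestamps are made up purely for illustration, and the function name is
hypothetical. It ignores sequence-number wraparound.

```python
def find_ack_regressions(samples):
    """Return (idle_seconds, previous_ack, ack) for each ACK that steps back."""
    regressions = []
    for (prev_time, prev_ack), (time, ack) in zip(samples, samples[1:]):
        if ack < prev_ack:
            regressions.append((time - prev_time, prev_ack, ack))
    return regressions


# ACK numbers from the original post; timestamps (in seconds) are invented.
samples = [(0, 4567), (1, 4785), (2, 4948), (39, 4947)]
print(find_ack_regressions(samples))  # [(37, 4948, 4947)]
```

This only covers the ACK side. The sequence numbers and data sent by both
hosts still need to be read alongside it.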

Also, what's the actual user complaint? Or is this just something you
happened to notice in a trace?

What is the network topology? Where are these hosts and what's in between
them? Is there some sort of "feature" running between them that messes with
TCP? For example, a firewall or a router that does TCP Intercept?

_______________________________

Priscilla Oppenheimer
www.troubleshootingnetworks.com
www.priscilla.com


Matthew F. Crane wrote:
>
> Ok, you don't say what the host systems are, but I am going to guess Unix
> of some variety, in which case has anyone been playing around with the
> keepalive timers?
>
> If the session keepalive timer is reached, a probe is sent with the ACK
> number set to ACK-1, i.e. telling the other end that the recipient lied
> previously when it said it had received all the data. This forces the
> origin to resend with the correct ACK number.
>
> TCP/IP Illustrated Vol 2 p830
>
> There are probably other instances where this is done but that's the one
> I've come across most often.
>
> MFC
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:nobody@;groupstudy.com] On Behalf Of
> Matthew Tayler
> Sent: 24 October 2002 09:04
> To: [EMAIL PROTECTED]
> Subject: TCP Ack numbers suddenly regress [7:56189]
>
>
> Anyone come across a situation where the ACK number suddenly steps back 1
> and the link then resets?
>
> Host A to Host B is running fine, with the app using port 2400 on A
> talking to an app on B using ports 3564 & 3565. We have several traces
> showing the steady increase of sequence numbers, then all of a sudden the
> ACK number takes a step back by 1. There are no FIN segments in the
> preceding traffic, but the now regressed ACK number is repeated in 7
> segments sent, and then a reset segment is issued and the two start
> exchanging data again.
>
> I am not allowed to post any of the data from the trace given the nature
> of the two systems involved, but here is an example of the way the ACK
> numbers run:
>
> From A to B, port 2400 to 3564
> 4567 is ACK'd
> 4785 .....
> 4948
> 4947
>
> From A to B, port 2400 to 3565
> 466 is ACK'd
> 483 .....
> 500
> 499
>
> The link between the two is fine during this problem; utilisation drops
> but is never above 20% anyway. Both host applications are still running
> and there are no process issues. The Cisco kit at either end is happy,
> with no error messages or the like, so we know it's host/app related.
>
> I can't find anything this specific in the archives, and the nearest any
> of my textbooks come is to say a FIN has been issued - which the trace
> says is not the case.
>
> The reason for asking is that I didn't think it was possible to regress
> the sequence numbers, with the exception of the example from TCP/IP
> Illustrated Vol 2 noted above.
>
> Any ideas would be appreciated.
>
> Thanks
>
> Matt T




Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=56264&t=56189