Hi Neal,

I've included a link to a small trace of 13 packets. It comes from a different section than the screenshot I attached to my last email, but it shows the same sequence of events. The 300ms timeout makes the tcptrace time-sequence plot hard to read, so I figured sharing the pcap itself was the best approach.

slice.pcap: https://drive.google.com/open?id=1hYXbUClHGbQv1hWG1HZWDO2WYf30N6G8

Thanks for the help!
-Steve

On Tue, Dec 5, 2017 at 7:23 AM, Neal Cardwell <ncardw...@google.com> wrote:
> On Tue, Dec 5, 2017 at 12:22 AM, Steve Ibanez <siba...@stanford.edu> wrote:
>> Hi Neal,
>>
>> Happy to help out :) And thanks for the tip!
>>
>> I was able to track down where the missing bytes that you pointed out
>> are being lost. It turns out the destination host seems to be
>> misbehaving. I performed a packet capture at the destination host
>> interface (a snapshot of the trace is attached). I see the following
>> sequence of events when a timeout occurs (note that I have NIC
>> offloading enabled, so Wireshark captures packets larger than the MTU):
>>
>> 1. The destination receives a data packet of length X with seqNo = Y
>>    from the src with the CWR bit set and does not send back a
>>    corresponding ACK.
>> 2. The source times out and sends a retransmission packet of length Z
>>    (where Z < X) with seqNo = Y.
>> 3. The destination sends back an ACK with AckNo = Y + X.
>>
>> So in other words, the packet which the destination host does not
>> initially ACK (causing the timeout) does not actually get lost, because
>> after receiving the retransmission the AckNo moves forward all the way
>> past the bytes in the initial unACKed CWR packet. In the attached
>> screenshot, I've marked the unACKed CWR packet with a red box.
>>
>> Have you seen this behavior before? And do you know what might be
>> causing the destination host not to ACK the CWR packet? In most cases
>> the CWR-marked packets are ACKed properly; it's just occasionally that
>> they are not.
>
> Thanks for the detailed report!
>
> I have not heard of an incoming CWR causing the receiver to fail to
> ACK. And in re-reading the code, I don't see an obvious way in which a
> CWR bit should cause the receiver to fail to ACK.
>
> That screenshot is a bit hard to parse. Would you be able to post a
> tcpdump .pcap of that particular section, or post a screenshot of a
> time-sequence plot of that section?
>
> To extract that segment and take a screenshot, you could use something like:
>
> editcap -A "2017-12-04 11:22:27" -B "2017-12-04 11:22:30" all.pcap slice.pcap
> tcptrace -S -xy -zy slice.pcap
> xplot.org a2b_tsg.xpl &
> # take screenshot
>
> Or, alternatively, would you be able to post the slice.pcap on a web
> server or public drive?
>
> thanks,
> neal
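The three-step pattern in the quoted report (a CWR-marked segment of length X at sequence Y gets no ACK, the RTO retransmission carries only Z < X bytes, yet the next ACK is Y + X) can be checked mechanically once the capture is reduced to a packet list. What follows is a rough sketch, not a tool used in this thread. The function name `find_unacked_cwr` and its input format (`D <seq> <len> <cwr>` for data segments and `A <ack>` for ACKs, one per line, in capture order, with relative sequence numbers) are hypothetical, and converting slice.pcap into that format (for instance from `tshark -r slice.pcap -T fields ...` output) is assumed rather than shown. It only tracks CWR-marked segments and treats a second data segment starting at the same sequence number, before any covering ACK, as the retransmission.

```sh
# Sketch only: flag CWR-marked data segments that were not ACKed until
# after a retransmission. Hypothetical input format, one packet per line,
# in capture order, relative sequence numbers:
#   D <seq> <len> <cwr>   data segment, source -> destination (cwr is 0 or 1)
#   A <ack>               ACK, destination -> source
find_unacked_cwr() {
  awk '
    $1 == "D" {
      seq = $2 + 0; len = $3 + 0; cwr = $4 + 0
      if ((seq in pend) && !(seq in retx)) {
        # Same starting sequence number again with no covering ACK yet:
        # treat it as the retransmission (length Z) of the pending segment.
        retx[seq] = len
      } else if (cwr == 1 && !(seq in pend)) {
        # New CWR-marked segment (length X): remember it until it is ACKed.
        pend[seq] = len
      }
      next
    }
    $1 == "A" {
      ack = $2 + 0
      # Collect pending segments fully covered by this cumulative ACK,
      # then delete them outside the for-in loop.
      n = 0
      for (s in pend)
        if (ack >= s + pend[s]) covered[++n] = s
      for (i = 1; i <= n; i++) {
        s = covered[i]
        if (s in retx) {
          # The ACK reaches seq + X although the retransmit only carried
          # Z bytes, which suggests the original bytes did arrive.
          printf "seq=%s: CWR len=%d un-ACKed; retransmit len=%d ends at %d, but ack=%d reaches %d\n", s, pend[s], retx[s], s + retx[s], ack, s + pend[s]
          delete retx[s]
        }
        delete pend[s]
      }
    }
  '
}

# Example (synthetic numbers, not taken from slice.pcap):
#   printf '%s\n' 'D 1 14480 1' 'A 1' 'D 1 1448 0' 'A 14481' | find_unacked_cwr
#   -> seq=1: CWR len=14480 un-ACKed; retransmit len=1448 ends at 1449, but ack=14481 reaches 14481
```

A CWR-marked segment that is ACKed normally produces no output, so on a real trace only the occasional un-ACKed cases described above would be listed.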