Hi Neal,

I've included a link to a small trace of 13 packets. It is different
from the screenshot I attached in my last email, but shows the same
sequence of events. The tcptrace plot is a bit hard to read due to the
300ms timeout, so I figured sharing the pcap was the best approach.

slice.pcap: https://drive.google.com/open?id=1hYXbUClHGbQv1hWG1HZWDO2WYf30N6G8
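
A rough sketch of how the three-step sequence described in the quoted
message below could be checked for mechanically. It is only an
illustration: Pkt and find_unacked_cwr are hypothetical names, the
packet records are assumed to have already been extracted from the
pcap by some other tool, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple

CWR = 0x80  # TCP CWR flag bit
ACK = 0x10  # TCP ACK flag bit


@dataclass
class Pkt:
    """Hypothetical, already-parsed packet record (not a real library type)."""
    t: float        # capture timestamp, seconds
    from_src: bool  # True for src -> dst, False for dst -> src
    seq: int        # TCP sequence number
    length: int     # TCP payload length in bytes
    ack: int        # TCP acknowledgment number
    flags: int      # TCP flags byte


def find_unacked_cwr(pkts: List[Pkt]) -> List[Tuple[Pkt, Pkt, Pkt]]:
    """Return (cwr_pkt, retransmission, ack) triples matching steps 1-3."""
    hits = []
    for i, p in enumerate(pkts):
        # Step 1: a data packet from the source with CWR set (length X, seqNo Y).
        if not (p.from_src and p.length > 0 and p.flags & CWR):
            continue
        end = p.seq + p.length  # Y + X
        retx = None
        for q in pkts[i + 1:]:
            if q.from_src:
                # Step 2: a shorter retransmission starting at the same seqNo.
                if retx is None and q.seq == p.seq and 0 < q.length < p.length:
                    retx = q
            elif q.flags & ACK and q.ack >= end:
                # Step 3: the first ACK covering Y + X. It only counts as a
                # hit if a retransmission was seen before it arrived.
                if retx is not None:
                    hits.append((p, retx, q))
                break
    return hits


# Made-up example values, not taken from slice.pcap.
example = [
    Pkt(0.0000, True, 1000, 3000, 1, ACK | CWR),  # CWR data, not ACKed
    Pkt(0.3000, True, 1000, 1448, 1, ACK),        # retransmission, Z < X
    Pkt(0.3001, False, 1, 0, 4000, ACK),          # ACK for Y + X
]
for cwr_pkt, retx, ack in find_unacked_cwr(example):
    print(f"seq={cwr_pkt.seq} len={cwr_pkt.length} "
          f"retx_len={retx.length} ack={ack.ack} "
          f"gap={retx.t - cwr_pkt.t:.3f}s")
```

A CWR packet that is ACKed before any retransmission produces no hit,
so only the stalled cases are reported.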

Thanks for the help!
-Steve

On Tue, Dec 5, 2017 at 7:23 AM, Neal Cardwell <ncardw...@google.com> wrote:
> On Tue, Dec 5, 2017 at 12:22 AM, Steve Ibanez <siba...@stanford.edu> wrote:
>> Hi Neal,
>>
>> Happy to help out :) And thanks for the tip!
>>
>> I was able to track down where the missing bytes that you pointed out
>> are being lost. It turns out the destination host seems to be
>> misbehaving. I performed a packet capture at the destination host
>> interface (a snapshot of the trace is attached). I see the following
>> sequence of events when a timeout occurs (note that I have NIC
>> offloading enabled so wireshark captures packets larger than the MTU):
>>
>> 1. The destination receives a data packet of length X with seqNo = Y
>> from the src with the CWR bit set and does not send back a
>> corresponding ACK.
>> 2. The source times out and sends a retransmission packet of length Z
>> (where Z < X) with seqNo = Y
>> 3. The destination sends back an ACK with AckNo = Y + X
>>
>> So in other words, the packet which the destination host does not
>> initially ACK (causing the timeout) does not actually get lost because
>> after receiving the retransmission the AckNo moves forward all the way
>> past the bytes in the initial unACKed CWR packet. In the attached
>> screenshot, I've marked the unACKed CWR packet with a red box.
>>
>> Have you seen this behavior before? And do you know what might be
>> causing the destination host not to ACK the CWR packet? In most cases
>> the CWR-marked packets are ACKed properly; only occasionally are they
>> not.
>
> Thanks for the detailed report!
>
> I have not heard of an incoming CWR causing the receiver to fail to
> ACK. And in re-reading the code, I don't see an obvious way in which a
> CWR bit should cause the receiver to fail to ACK.
>
> That screen shot is a bit hard to parse. Would you be able to post a
> tcpdump .pcap of that particular section, or post a screen shot of a
> time-sequence plot of that section?
>
> To extract that segment and take a screenshot, you could use something like:
>
>   editcap -A "2017-12-04 11:22:27" -B "2017-12-04 11:22:30" all.pcap slice.pcap
>   tcptrace -S -xy -zy slice.pcap
>   xplot.org a2b_tsg.xpl &
>   # take screenshot
>
> Or, alternatively, would you be able to post the slice.pcap on a web
> server or public drive?
>
> thanks,
> neal
