On Mon, Nov 20, 2017 at 7:01 AM, Neal Cardwell <ncardw...@google.com> wrote: > Going back to one of your Oct 19 trace snapshots (attached), AFAICT at the > time of the timeout there is actually almost 64KBytes (352553398 + 1448 - > 352489686 = 65160) of unacknowledged data. So there really does seem to be a > significant chunk of packets that were in-flight that were then declared > lost. > > So here is a possibility: perhaps the combination of CWR+PRR plus > tcp_tso_should_defer() means that PRR can make cwnd so gentle that > tcp_tso_should_defer() thinks we should wait for another ACK to send, and > that ACK doesn't come. Breaking it, down, the potential sequence would be: > > (1) tcp_write_xmit() does not send, because the CWR behavior, using PRR, > does not leave enough cwnd for tcp_tso_should_defer() to think we should > send (PRR was originally designed for recovery, which did not have TSO > deferral) > > (2) TLP does not fire, because we are in state CWR, not Open > > (3) The only remaining option is an RTO, which fires. > > In other words, the possibility is that, at the time of the stall, the cwnd > is reasonably high, but tcp_packets_in_flight() is also quite high, so > either there is (a) literally no unused cwnd left ( tcp_packets_in_flight() > == cwnd), or (b) some mechanism like tcp_tso_should_defer() is deciding that > there is not enough available cwnd for it to make sense to chop off a > fraction of a TSO skb to send now. > > One way to test that conjecture would be to disable tcp_tso_should_defer() > by adding a: > > goto send_now; > > at the top of tcp_tso_should_defer(). > > If that doesn't prevent the freezes then I would recommend adding printks or > other instrumentation to tcp_write_xmit() to log: > > - time > - ca_state > - cwnd > - ssthresh > - tcp_packets_in_flight() > - the reason for breaking out of the tcp_write_xmit() loop (tso deferral, no > packets left, tcp_snd_wnd_test, tcp_nagle_test, etc) > > cheers, > neal > > > > On Mon, Nov 20, 2017 at 2:31 AM, Steve Ibanez <siba...@stanford.edu> wrote: >> >> Hi Folks, >> >> I wanted to check back in on this for another update and to solicit >> some more suggestions. I did a bit more digging to try an isolate the >> problem. >> >> As I explained earlier, the log generated by tcp_probe indicates that >> the snd_cwnd is set to 1 just before the end host receives an ECN >> marked ACK and unexpectedly enters a timeout ( >> https://drive.google.com/open?id=1iyt8PvBxQga2jpRpBJ8KdQw3Q_mPTzZF ). >> I was trying to track down where this is happening, but the only place >> I could find that might be setting the snd_cwnd to 1 is in the >> tcp_enter_loss() function. I inserted a printk() call in this function >> to see when it is being invoked and it looks like it is only called by >> the tcp_retransmit_timer() function after the timer expires. >> >> I decided to try recording the snd_cwnd, ss-thresh, and icsk_ca_state >> inside the tcp_fastretrans_alert() function whenever it processes an >> ECN marked ACK ( >> https://drive.google.com/open?id=17GD77lb9lkCSu0_s9p40GZ5r4EU8B4VB ) >> This plot also shows when the tcp_retransmit_timer() and >> tcp_enter_loss() functions are invoked (red and purple dots >> respectively). And I see that the ACK state machine is always either >> in the TCP_CA_Open or TCP_CA_CWR state whenever the >> tcp_fastretrans_alert() function processes ECN marked ACKs ( >> https://drive.google.com/open?id=1xwuPxjgwriT9DSblFx2uILfQ95Fy-Eqq ). >> So I'm not sure where the snd_cwnd is being set to 1 (or possibly 0 as >> Neal suggested) just before entering a timeout. Any suggestions here? >> >> In order to do a bit of profiling of the tcp_dctcp code I added >> support into tcp_probe for recording the dctcp alpha parameter. I see >> that alpha oscillates around about 0.1 when the flow rates have >> converged, it goes to zero when the other host enters a timeout, and I >> don't see any unexpected behavior just before the timeout ( >> https://drive.google.com/open?id=1zPdyS57TrUYZIekbid9p1UNyraLYrdw7 ). >> >> So I haven't had much luck yet trying to track down where the problem >> is. If you have any suggestions that would help me to focus my search >> efforts, I would appreciate the comments. >> >> Thanks! >> -Steve
Steve, what HZ value your kernel is compiled with ?