On Fri, Jun 29, 2018 at 9:48 PM Lawrence Brakmo <bra...@fb.com> wrote:
>
> We observed high 99 and 99.9% latencies when doing RPCs with DCTCP. The
> problem is triggered when the last packet of a request arrives CE
> marked. The reply will carry the ECE mark causing TCP to shrink its cwnd
> to 1 (because there are no packets in flight). When the 1st packet of
> the next request arrives it was sometimes delayed adding up to 40ms to
> the latency.
>
> This patch insures that CWR makred data packets arriving will be acked
> immediately.
>
> Signed-off-by: Lawrence Brakmo <bra...@fb.com>
> ---

Thanks, Larry. Ensuring that CWR-marked data packets arriving will be
acked immediately sounds like a good goal to me.

I am wondering if we can have a simpler fix.

The dctcp_ce_state_0_to_1() code is setting the TCP_ECN_DEMAND_CWR
bit in ecn_flags, which disables the code in __tcp_ecn_check_ce() that
would have otherwise used the tcp_enter_quickack_mode() mechanism to
force an ACK:

__tcp_ecn_check_ce():
...
case INET_ECN_CE:
  if (tcp_ca_needs_ecn(sk))
    tcp_ca_event(sk, CA_EVENT_ECN_IS_CE);
       // -> dctcp_ce_state_0_to_1()
       //     ->  tp->ecn_flags |= TCP_ECN_DEMAND_CWR;

  if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
    /* Better not delay acks, sender can have a very low cwnd */
    tcp_enter_quickack_mode(sk, 1);
    tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
  }
  tp->ecn_flags |= TCP_ECN_SEEN;
  break;

So it seems like the bug here may be that dctcp_ce_state_0_to_1()  is
setting the TCP_ECN_DEMAND_CWR  bit in ecn_flags, when really it
should let its caller, __tcp_ecn_check_ce() set TCP_ECN_DEMAND_CWR, in
which case the code would already properly force an immediate ACK.

So, what if we use this fix instead (not compiled, not tested):

diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
index 5f5e5936760e..4fecd2824edb 100644
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -152,8 +152,6 @@ static void dctcp_ce_state_0_to_1(struct sock *sk)

        ca->prior_rcv_nxt = tp->rcv_nxt;
        ca->ce_state = 1;
-
-       tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
 }

 static void dctcp_ce_state_1_to_0(struct sock *sk)

What do you think?

neal

Reply via email to