On Thu, May 17, 2018 at 8:12 AM Eric Dumazet <eduma...@google.com> wrote:

> When TCP receives an out-of-order packet, it immediately sends
> a SACK packet, generating network load but also forcing the
> receiver to send 1-MSS pathological packets, increasing its
> RTX queue length/depth, and thus processing time.

> Wifi networks suffer from this aggressive behavior, but generally
> speaking, all these SACK packets add fuel to the fire when networks
> are under congestion.

> This patch adds a high resolution timer and tp->compressed_ack counter.

> Instead of sending a SACK, we program this timer with a small delay,
> based on SRTT and capped to 2.5 ms : delay = min ( 5 % of SRTT, 2.5 ms)
...

Very nice. Thanks for implementing this, Eric! I was wondering if the
constants here might be worth some discussion/elaboration.

> @@ -5074,6 +5076,7 @@ static inline void tcp_data_snd_check(struct sock
*sk)
>   static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
>   {
>          struct tcp_sock *tp = tcp_sk(sk);
> +       unsigned long delay;

>              /* More than one full frame received... */
>          if (((tp->rcv_nxt - tp->rcv_wup) > inet_csk(sk)->icsk_ack.rcv_mss
&&
> @@ -5085,15 +5088,31 @@ static void __tcp_ack_snd_check(struct sock *sk,
int ofo_possible)
>              (tp->rcv_nxt - tp->copied_seq < sk->sk_rcvlowat ||
>               __tcp_select_window(sk) >= tp->rcv_wnd)) ||
>              /* We ACK each frame or... */
> -           tcp_in_quickack_mode(sk) ||
> -           /* We have out of order data. */
> -           (ofo_possible && !RB_EMPTY_ROOT(&tp->out_of_order_queue))) {
> -               /* Then ack it now */
> +           tcp_in_quickack_mode(sk)) {
> +send_now:
>                  tcp_send_ack(sk);
> -       } else {
> -               /* Else, send delayed ack. */
> +               return;
> +       }
> +
> +       if (!ofo_possible || RB_EMPTY_ROOT(&tp->out_of_order_queue)) {
>                  tcp_send_delayed_ack(sk);
> +               return;
>          }
> +
> +       if (!tcp_is_sack(tp) || tp->compressed_ack >= 127)
> +               goto send_now;
> +       tp->compressed_ack++;

Is there a particular motivation for the cap of 127? IMHO 127 ACKs is quite
a few to compress. Experience seems to show that it works well to have one
GRO ACK for ~64KBytes that triggers a single TSO skb of ~64KBytes. It might
be nice to try to match those dynamics in this SACK compression case, so it
might be nice to cap the number of compressed ACKs at something like 44?
(0xffff / 1448 - 1).  That way for high-speed paths we could try to keep
the ACK clock going with ACKs for ~64KBytes that trigger a single TSO skb
of ~64KBytes, no matter whether we are sending SACKs or cumulative ACKs.

> +
> +       if (hrtimer_is_queued(&tp->compressed_ack_timer))
> +               return;
> +
> +       /* compress ack timer : 5 % of srtt, but no more than 2.5 ms */
> +
> +       delay = min_t(unsigned long, 2500 * NSEC_PER_USEC,
> +                     tp->rcv_rtt_est.rtt_us * (NSEC_PER_USEC >> 3)/20);

Any particular motivation for the 2.5ms here? It might be nice to match the
existing TSO autosizing dynamics and use 1ms here instead of having a
separate new constant of 2.5ms. Smaller time scales here should lead to
less burstiness and queue pressure from data packets in the network, and we
know from experience that the CPU overhead of 1ms chunks is acceptable.

thanks,
neal

Reply via email to