On Wed, Jun 3, 2020 at 9:55 AM Eric Dumazet <eduma...@google.com> wrote: > > On Wed, Jun 3, 2020 at 5:02 AM Neal Cardwell <ncardw...@google.com> wrote: > > > > On Wed, Jun 3, 2020 at 1:44 AM Eric Dumazet <eduma...@google.com> wrote: > > > > > > On Tue, Jun 2, 2020 at 10:05 PM Jason Xing <kerneljasonx...@gmail.com> > > > wrote: > > > > > > > > Hi Eric, > > > > > > > > I'm still trying to understand what you're saying before. Would this > > > > be better as following: > > > > 1) discard the tcp_internal_pacing() function. > > > > 2) remove where the tcp_internal_pacing() is called in the > > > > __tcp_transmit_skb() function. > > > > > > > > If we do so, we could avoid 'too late to give up pacing'. Meanwhile, > > > > should we introduce the tcp_wstamp_ns socket field as commit > > > > (864e5c090749) does? > > > > > > > > > > Please do not top-post on netdev mailing list. > > > > > > > > > I basically suggested double-checking which point in TCP could end up > > > calling tcp_internal_pacing() > > > while the timer was already armed. > > > > > > I guess this is mtu probing. > > > > Perhaps this could also happen from some of the retransmission code > > paths that don't use tcp_xmit_retransmit_queue()? Perhaps > > tcp_retransmit_timer() (RTO) and tcp_send_loss_probe() TLP? It seems > > they could indirectly cause a call to __tcp_transmit_skb() and thus > > tcp_internal_pacing() without first checking if the pacing timer was > > already armed? > > I feared this, (see recent commits about very low pacing rates) :/ > > I am not sure we need to properly fix all these points for old > kernels, since EDT model got rid of these problems.
Agreed. > Maybe we can try to extend the timer. Sounds good. > Something like : > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index > cc4ba42052c21b206850594db6751810d8fc72b4..626b9f4f500f7e5270d8d59e6eb16dbfa3efbc7c > 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -966,6 +966,8 @@ enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer) > > static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb) > { > + struct tcp_sock *tp = tcp_sk(sk); > + ktime_t expire, now; > u64 len_ns; > u32 rate; > > @@ -977,12 +979,29 @@ static void tcp_internal_pacing(struct sock *sk, > const struct sk_buff *skb) > > len_ns = (u64)skb->len * NSEC_PER_SEC; > do_div(len_ns, rate); > - hrtimer_start(&tcp_sk(sk)->pacing_timer, > - ktime_add_ns(ktime_get(), len_ns), > + > + now = ktime_get(); > + /* If hrtimer is already armed, then our caller has not > + * used tcp_pacing_check(). > + */ > + if (unlikely(hrtimer_is_queued(&tp->pacing_timer))) { > + expire = hrtimer_get_softexpires(&tp->pacing_timer); > + if (ktime_after(expire, now)) > + now = expire; > + if (hrtimer_try_to_cancel(&tp->pacing_timer) == 1) > + __sock_put(sk); > + } > + hrtimer_start(&tp->pacing_timer, ktime_add_ns(now, len_ns), > HRTIMER_MODE_ABS_PINNED_SOFT); > sock_hold(sk); > } > > +static bool tcp_pacing_check(const struct sock *sk) > +{ > + return tcp_needs_internal_pacing(sk) && > + hrtimer_is_queued(&tcp_sk(sk)->pacing_timer); > +} > + > static void tcp_update_skb_after_send(struct tcp_sock *tp, struct sk_buff > *skb) > { > skb->skb_mstamp = tp->tcp_mstamp; > @@ -2117,6 +2136,9 @@ static int tcp_mtu_probe(struct sock *sk) > if (!tcp_can_coalesce_send_queue_head(sk, probe_size)) > return -1; > > + if (tcp_pacing_check(sk)) > + return -1; > + > /* We're allowed to probe. Build it now. */ > nskb = sk_stream_alloc_skb(sk, probe_size, GFP_ATOMIC, false); > if (!nskb) > @@ -2190,11 +2212,6 @@ static int tcp_mtu_probe(struct sock *sk) > return -1; > } > > -static bool tcp_pacing_check(const struct sock *sk) > -{ > - return tcp_needs_internal_pacing(sk) && > - hrtimer_is_queued(&tcp_sk(sk)->pacing_timer); > -} > > /* TCP Small Queues : > * Control number of packets in qdisc/devices to two packets / or ~1 ms. Thanks for your fix, Eric. This fix looks good to me! I agree that this fix is good enough for older kernels. thanks, neal