Re: [Bloat] BBR implementations, knobs to turn?

2020-11-17 Thread Jesper Dangaard Brouer
On Tue, 17 Nov 2020 10:05:24 +  wrote:

> Thank you for the response Neal

Yes. And it is impressive how many highly qualified people are on the
bufferbloat list.

> old_hw # uname -r
> 5.3.0-64-generic
> (Ubuntu 19.10 on Xeon workstation, integrated network card, 1Gbit
> GPON access.  Used as proof of concept from the lab at work)
>  
> 
> new_hw # uname -r
> 4.18.0-193.19.1.el8_2.x86_64
> (CentOS 8.2 on Xeon rack server, discrete 10Gbit network card,
> 40Gbit server farm link (low utilization on link), intended as fully
> supported and run service.  Not possible to have newer kernel and
> still get service agreement in my organization)

Let me help out here.  The CentOS/RHEL8 kernels carry a huge number of
backports.  I've attached a patch/diff of the net/ipv4/tcp_bbr.c changes
that are missing in RHEL8.

It looks like these patches are missing in CentOS/RHEL8:
 [1] https://git.kernel.org/torvalds/c/78dc70ebaa38aa3
 [2] https://git.kernel.org/torvalds/c/a87c83d5ee25cf7

Could missing patch [1] result in the issue Erik is seeing?
(It explicitly mentions improvements for WiFi...)
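
For context on what [1] does with the new fields in the attached diff:
once BBR thinks STARTUP has filled the pipe, it grows the cwnd target by
the largest "extra ACKed" burst seen over the last few round trips,
capped at roughly 100 ms worth of the estimated bandwidth. A rough
sketch of the upstream helper, from my reading of net-next
(bbr_full_bw_reached() and BW_UNIT are existing tcp_bbr.c symbols not
shown in the diff; the exact upstream wording may differ):

/* Find the cwnd increment based on the ACK aggregation estimate; the
 * result is added on top of the BDP-based target cwnd in bbr_set_cwnd().
 */
static u32 bbr_ack_aggregation_cwnd(struct sock *sk)
{
	u32 max_aggr_cwnd, aggr_cwnd = 0;

	if (bbr_extra_acked_gain && bbr_full_bw_reached(sk)) {
		/* Cap the allowance at bbr_extra_acked_max_us (100 ms) of bw. */
		max_aggr_cwnd = ((u64)bbr_bw(sk) * bbr_extra_acked_max_us)
				/ BW_UNIT;
		aggr_cwnd = (bbr_extra_acked_gain * bbr_extra_acked(sk))
			     >> BBR_SCALE;
		aggr_cwnd = min(aggr_cwnd, max_aggr_cwnd);
	}
	return aggr_cwnd;
}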

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
--- /home/hawk/git/redhat/kernel-rhel8/net/ipv4/tcp_bbr.c	2020-01-30 17:38:20.832726582 +0100
+++ /home/hawk/git/kernel/net-next/net/ipv4/tcp_bbr.c	2020-11-17 15:38:22.665729797 +0100
@@ -115,6 +115,14 @@ struct bbr {
 		unused_b:5;
 	u32	prior_cwnd;	/* prior cwnd upon entering loss recovery */
 	u32	full_bw;	/* recent bw, to estimate if pipe is full */
+
+	/* For tracking ACK aggregation: */
+	u64	ack_epoch_mstamp;	/* start of ACK sampling epoch */
+	u16	extra_acked[2];		/* max excess data ACKed in epoch */
+	u32	ack_epoch_acked:20,	/* packets (S)ACKed in sampling epoch */
+		extra_acked_win_rtts:5,	/* age of extra_acked, in round trips */
+		extra_acked_win_idx:1,	/* current index in extra_acked array */
+		unused_c:6;
 };
 
 #define CYCLE_LEN	8	/* number of phases in a pacing gain cycle */
@@ -128,6 +136,14 @@ static const u32 bbr_probe_rtt_mode_ms =
 /* Skip TSO below the following bandwidth (bits/sec): */
 static const int bbr_min_tso_rate = 120;
 
+/* Pace at ~1% below estimated bw, on average, to reduce queue at bottleneck.
+ * In order to help drive the network toward lower queues and low latency while
+ * maintaining high utilization, the average pacing rate aims to be slightly
+ * lower than the estimated bandwidth. This is an important aspect of the
+ * design.
+ */
+static const int bbr_pacing_margin_percent = 1;
+
 /* We use a high_gain value of 2/ln(2) because it's the smallest pacing gain
  * that will allow a smoothly increasing pacing rate that will double each RTT
  * and send the same number of packets per RTT that an un-paced, slow-starting
@@ -174,6 +190,15 @@ static const u32 bbr_lt_bw_diff = 4000 /
 /* If we estimate we're policed, use lt_bw for this many round trips: */
 static const u32 bbr_lt_bw_max_rtts = 48;
 
+/* Gain factor for adding extra_acked to target cwnd: */
+static const int bbr_extra_acked_gain = BBR_UNIT;
+/* Window length of extra_acked window. */
+static const u32 bbr_extra_acked_win_rtts = 5;
+/* Max allowed val for ack_epoch_acked, after which sampling epoch is reset */
+static const u32 bbr_ack_epoch_acked_reset_thresh = 1U << 20;
+/* Time period for clamping cwnd increment due to ack aggregation */
+static const u32 bbr_extra_acked_max_us = 100 * 1000;
+
 static void bbr_check_probe_rtt_done(struct sock *sk);
 
 /* Do we estimate that STARTUP filled the pipe? */
@@ -200,21 +225,33 @@ static u32 bbr_bw(const struct sock *sk)
 	return bbr->lt_use_bw ? bbr->lt_bw : bbr_max_bw(sk);
 }
 
+/* Return maximum extra acked in past k-2k round trips,
+ * where k = bbr_extra_acked_win_rtts.
+ */
+static u16 bbr_extra_acked(const struct sock *sk)
+{
+	struct bbr *bbr = inet_csk_ca(sk);
+
+	return max(bbr->extra_acked[0], bbr->extra_acked[1]);
+}
+
 /* Return rate in bytes per second, optionally with a gain.
  * The order here is chosen carefully to avoid overflow of u64. This should
  * work for input rates of up to 2.9Tbit/sec and gain of 2.89x.
  */
 static u64 bbr_rate_bytes_per_sec(struct sock *sk, u64 rate, int gain)
 {
-	rate *= tcp_mss_to_mtu(sk, tcp_sk(sk)->mss_cache);
+	unsigned int mss = tcp_sk(sk)->mss_cache;
+
+	rate *= mss;
 	rate *= gain;
 	rate >>= BBR_SCALE;
-	rate *= USEC_PER_SEC;
+	rate *= USEC_PER_SEC / 100 * (100 - bbr_pacing_margin_percent);
 	return rate >> BW_SCALE;
 }
 
 /* Convert a BBR bw and gain factor to a pacing rate in bytes per second. */
-static u32 bbr_bw_to_pacing_rate(struct sock *sk, u32 bw, int gain)
+static unsigned long bbr_bw_to_pacing_rate(struct sock *sk, u32 bw, int gain)
 {
 	u64 rate = bw;
 
@@ -242,18 +279,12 @@ static void bbr_init_pacing_rate_from_rt
 	sk->sk_pacing_rate = bbr_bw_to_pacing_rate(sk, bw, bbr_high_gain);
 }
 
-/* Pace using current bw estimate and a gain factor. In order to help drive the
- * network toward
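
To make the pacing-margin change concrete, here is a small userspace
rendering of the arithmetic in bbr_rate_bytes_per_sec() above
(BW_SCALE = 24 and BBR_SCALE = 8 are the mainline values; the 1 Gbit/s
sample numbers are mine, not from the patch):

#include <stdio.h>
#include <stdint.h>

#define BW_SCALE	24		/* bw is pkts per usec << BW_SCALE */
#define BBR_SCALE	8		/* gains are scaled by 1 << BBR_SCALE */
#define BBR_UNIT	(1 << BBR_SCALE)
#define USEC_PER_SEC	1000000ULL

static const int bbr_pacing_margin_percent = 1;	/* from the diff above */

/* Same multiply-then-shift ordering as bbr_rate_bytes_per_sec() in the
 * diff, so precision is kept without overflowing u64. */
static uint64_t rate_bytes_per_sec(uint64_t bw, unsigned int mss, int gain)
{
	uint64_t rate = bw;

	rate *= mss;
	rate *= gain;
	rate >>= BBR_SCALE;
	rate *= USEC_PER_SEC / 100 * (100 - bbr_pacing_margin_percent);
	return rate >> BW_SCALE;
}

int main(void)
{
	unsigned int mss = 1448;	/* typical MSS with TCP timestamps */
	/* ~1 Gbit/s expressed as packets per usec, scaled by 2^BW_SCALE */
	uint64_t bw = (125000000ULL << BW_SCALE) / mss / USEC_PER_SEC;

	/* Prints roughly 123.7e6 bytes/sec, i.e. ~99% of 1 Gbit/s. */
	printf("pacing rate: %llu bytes/sec\n",
	       (unsigned long long)rate_bytes_per_sec(bw, mss, BBR_UNIT));
	return 0;
}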

[Bloat] openwrt e1000e (was: Re: cake + ipv6)

2020-11-17 Thread Daniel Sterling
> Daniel Sterling  writes:
> In my case: I am happy to report this is *not* a bug or an issue with
> cake, as I originally thought. I am able to reproduce the issue I was

Wanted to give an update on this. All my issues (odd latency, slow
throughput, etc.) went away when I switched from the in-kernel e1000e
driver to Intel's out-of-tree NAPI driver.

That is, I compiled the e1000e driver 3.8.7-NAPI from
https://sourceforge.net/projects/e1000/files/e1000e%20stable/ ,
instead of using the mainline e1000e driver from kernel 5.4.75 in the
OpenWrt dev build.

After switching to the Intel driver, my internet has been rock solid.

My NIC from lspci:
01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

This is the "Intel EXPI9301CT Desktop Adapter Gigabit" from newegg. I
bought it figuring it would have good linux support. *grin*

And it does, but not with the mainline driver, it seems. The in-kernel
driver doesn't (I assume) support NAPI -- so very possibly the difference
comes down to the NAPI support in the out-of-tree driver (rather than a
bug in the driver itself).
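
For anyone curious what that NAPI distinction looks like in driver terms:
instead of doing RX work in the hard interrupt, a NAPI driver masks the
NIC's IRQs and drains the ring from a polled softirq. A bare-bones sketch,
not the actual e1000e code -- the mynic_* helpers are made-up stand-ins,
while napi_schedule() / napi_complete_done() / netif_napi_add() are the
stock 5.4-era kernel API:

#include <linux/netdevice.h>
#include <linux/interrupt.h>

struct mynic_adapter {
	struct net_device *netdev;
	struct napi_struct napi;
	/* ring pointers, register mappings, ... */
};

/* Made-up device-specific helpers, stubbed so only the shape is shown. */
static void mynic_mask_irqs(struct mynic_adapter *ad) { }
static void mynic_unmask_irqs(struct mynic_adapter *ad) { }
static int mynic_clean_rx_ring(struct mynic_adapter *ad, int budget) { return 0; }

/* Hard interrupt: no packet work here, just hand off to the poller. */
static irqreturn_t mynic_intr(int irq, void *data)
{
	struct mynic_adapter *ad = data;

	mynic_mask_irqs(ad);
	napi_schedule(&ad->napi);	/* run mynic_poll() from softirq */
	return IRQ_HANDLED;
}

/* NAPI poll: drain up to 'budget' RX descriptors per invocation. */
static int mynic_poll(struct napi_struct *napi, int budget)
{
	struct mynic_adapter *ad = container_of(napi, struct mynic_adapter, napi);
	int done = mynic_clean_rx_ring(ad, budget);

	if (done < budget) {		/* ring drained: re-arm interrupts */
		napi_complete_done(napi, done);
		mynic_unmask_irqs(ad);
	}
	return done;
}

/* Registered once at probe time:
 *	netif_napi_add(netdev, &ad->napi, mynic_poll, NAPI_POLL_WEIGHT);
 */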

-- Dan