[Bug 271205] [ix] [carp]: Continuous input errors on Intel X553
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271205

Mark Linimon changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |IntelNetworking, regression
           Assignee|b...@freebsd.org            |n...@freebsd.org
Re: Cwnd grows slowly during slow-start due to LRO of the receiver side.
Second attempt, first one failed due to not being a member of the list :-(.

> Adding freebsd-transp...@freebsd.org to get that specific groups
> eyes on this issue.
>
> Rod
>
> > [...]

-- 
Rod Grimes rgri...@freebsd.org
Re: Cwnd grows slowly during slow-start due to LRO of the receiver side.
Adding freebsd-transp...@freebsd.org to get that specific group's eyes on this issue.

Rod

> [...]

-- 
Rod Grimes rgri...@freebsd.org
Re: Cwnd grows slowly during slow-start due to LRO of the receiver side.
On 5/2/23 11:14, Hans Petter Selasky wrote:
> Hi Chen!
>
> The FreeBSD mbufs carry the number of ACKs that have been joined
> together into the following field:
>
> m->m_pkthdr.lro_nsegs
>
> Can this value be of any use to cc_newreno?
>
> --HPS

Hi Chen,

Have you tested using FreeBSD main / 14?

The "nsegs" are passed along like this:

    nsegs = max(1, m->m_pkthdr.lro_nsegs);
    ...
    cc_ack_received(tp, th, nsegs, CC_ACK);
    ...

(NewReno - FreeBSD-14)

    incr = min(ccv->bytes_this_ack,
        ccv->nsegs * abc_val * CCV(ccv, t_maxseg));

And in FreeBSD-10, which is mentioned in your article:

(NewReno - FreeBSD-10)

    incr = min(ccv->bytes_this_ack, V_tcp_abc_l_var * CCV(ccv, t_maxseg));

There is no such nsegs factor there. This issue may already have been fixed!

--HPS

> On 5/2/23 09:46, Chen Shuo wrote:
> [...]
Re: Cwnd grows slowly during slow-start due to LRO of the receiver side.
Hi Chen!

The FreeBSD mbufs carry the number of ACKs that have been joined together
into the following field:

    m->m_pkthdr.lro_nsegs

Can this value be of any use to cc_newreno?

--HPS

On 5/2/23 09:46, Chen Shuo wrote:
> [...]
Cwnd grows slowly during slow-start due to LRO of the receiver side.
As per newreno_ack_received() in sys/netinet/cc/cc_newreno.c, the FreeBSD
TCP sender strictly follows RFC 5681 with the RFC 3465 extension. That is,
during slow-start, when receiving an ACK of 'bytes_acked' bytes:

    cwnd += min(bytes_acked, abc_l_var * SMSS); // abc_l_var = 2 dflt

As discussed in Sec. 3.2 of RFC 3465, L = 2*SMSS bytes exactly balances the
negative impact of the delayed ACK algorithm. RFC 5681 also says that a
receiver SHOULD generate an ACK for at least every second full-sized
segment, so bytes_acked per ACK is at most 2 * SMSS. If both sender and
receiver follow it, cwnd should grow exponentially during slow-start:

    cwnd *= 2 (per RTT)

However, LRO and TSO are widely used today, so a receiver may generate far
fewer ACKs than it used to. As I observed, both FreeBSD and Linux generate
at most one ACK per segment assembled by LRO/GRO. The worst case is one ACK
per 45 MSS, as 45 * 1448 = 65160 < 65535.

Sending 1MB over a link of 100ms delay from FreeBSD 13.2:

0.000 IP sender > sink: Flags [S], seq 205083268, win 65535, options [mss 1460,nop,wscale 10,sackOK,TS val 495212525 ecr 0], length 0
0.100 IP sink > sender: Flags [S.], seq 708257395, ack 205083269, win 65160, options [mss 1460,sackOK,TS val 563185696 ecr 495212525,nop,wscale 7], length 0
0.100 IP sender > sink: Flags [.], ack 1, win 65, options [nop,nop,TS val 495212626 ecr 563185696], length 0
// TSopt omitted below for brevity.

// cwnd = 10 * MSS, sent 10 * MSS
0.101 IP sender > sink: Flags [.], seq 1:14481, ack 1, win 65, length 14480

// got one ACK for 10 * MSS, cwnd += 2 * MSS, sent 12 * MSS
0.201 IP sink > sender: Flags [.], ack 14481, win 427, length 0
0.201 IP sender > sink: Flags [.], seq 14481:31857, ack 1, win 65, length 17376

// got ACK of 12 * MSS above, cwnd += 2 * MSS, sent 14 * MSS
0.301 IP sink > sender: Flags [.], ack 31857, win 411, length 0
0.301 IP sender > sink: Flags [.], seq 31857:52129, ack 1, win 65, length 20272

// got ACK of 14 * MSS above, cwnd += 2 * MSS, sent 16 * MSS
0.402 IP sink > sender: Flags [.], ack 52129, win 395, length 0
0.402 IP sender > sink: Flags [P.], seq 52129:73629, ack 1, win 65, length 21500
0.402 IP sender > sink: Flags [.], seq 73629:75077, ack 1, win 65, length 1448

As a consequence, instead of growing exponentially, cwnd grows
more-or-less quadratically during slow-start, unless abc_l_var is set to a
sufficiently large value.

NewReno took more than 20 seconds to ramp up throughput to 100Mbps over an
emulated 100ms delay link, while Linux took ~2 seconds. I can provide the
pcap file if anyone is interested.

Switching to CUBIC won't help, because it uses the logic in NewReno
ack_received() for slow start.

Is this a well-known issue, and is abc_l_var the only cure for it?
https://calomel.org/freebsd_network_tuning.html

Thank you!

Best,
Shuo Chen