Dear ECN experts,
I want to report some oddities I encountered while downloading data from Steam's
network. (My kids started playing Steam games recently, so my network started
to see Steam loads.) Some games apparently need routine updates every few weeks
in the multi-GB range, which on the one hand seems quite excessive to me, but on
the other hand presents a nice way to look closer at how modern CDN-backed
downloads operate. All of this is flowing through my OpenWrt-based router using
cake traffic-shaper/scheduler/AQM combos in both directions. (Cake by default
uses rfc3168 ECN signaling for packets presenting either ECT(0) or ECT(1), but
concurrently employs BLUE to increase the pure drop probability, so that it
eventually also reins in unresponsive flows.)
So far I have seen two things worth noting:
A) Excessive CE marking (this happened with a multi-GB download served from
Cloudflare CDN nodes), in the range of 70% of packets marked; Jonathan gave a
reasonable explanation that this might be BBRv1 in action. Side note: am I the
only one slightly miffed that BBR apparently neglected either to implement an
appropriate CE response or to make sure BBR-using TCPs refrain from negotiating
ECN? Unfortunately I did not take packet captures of this event (only tc -s
qdisc snapshots before and after). I posted this earlier on the list already.
B) Excessive ECT(1) marking (this happened with a multi-GB download).
Here is cctrace output for the respective flow (cctrace was developed as part
of SCE, and the reported SCE events are equivalent to ECT(1)):
55827-443:
Up: SCE=0, CE=0, ECE=79538, CWR=0, NS=0, total=104222
Down: SCE=200260, CE=0, ECE=0, CWR=2691, NS=0, total=200429
Here is what Wireshark reported for the same TCP flow:
{
"Address A": "2a01:c22:8c6c:8700:b84b:c89e:6424:8c74",
"Address B": "2a01:bc80:7:100::9b85:f813",
"Bits/s A → B": "289152",
"Bits/s B → A": "9205750",
"Bytes": "305873523",
"Bytes A → B": "9314897",
"Bytes B → A": "296558626",
"Duration": "257.715992",
"Packets": "304651",
"Packets A → B": "104222",
"Packets B → A": "200429",
"Percent Filtered": "0",
"Port A": "55827",
"Port B": "443",
"Rel Start": "314.512067",
"Stream ID": "259",
"Total Packets": "0"
},
So all in all roughly 5% of the total download volume, but essentially all
ECT(1).
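As a quick cross-check of the numbers above, here is a minimal Python sketch that recomputes the ECT(1) share and the average rates. The values are copied from the cctrace and Wireshark output; the variable names are my own.

```python
# Values copied from the cctrace and Wireshark output above.
down_total = 200429       # "Packets B -> A" (downstream packets)
down_ect1 = 200260        # cctrace "SCE" count on Down, i.e. ECT(1)
bytes_down = 296558626    # "Bytes B -> A"
bytes_up = 9314897        # "Bytes A -> B"
duration_s = 257.715992   # "Duration" in seconds

# Fraction of downstream packets that carried ECT(1).
ect1_share = down_ect1 / down_total

# Average rates over the lifetime of the flow, in bit/s.
rate_down_bps = bytes_down * 8 / duration_s
rate_up_bps = bytes_up * 8 / duration_s

print(f"ECT(1) share of downstream packets: {ect1_share:.2%}")
print(f"downstream rate: {rate_down_bps / 1e6:.2f} Mbit/s")
print(f"upstream rate:   {rate_up_bps / 1e6:.3f} Mbit/s")
```

The recomputed rates agree with the "Bits/s" fields Wireshark reported, and the ECT(1) share comes out at about 99.9% of downstream packets.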
Here I took a packet capture (albeit upstream of my ingress shaper, so I see no
CE markings or dropped packets), but I failed to take the tc -s qdisc snapshots
to get easy access to the number of CE marks and drops (these would not be per
flow anyway).
123-1234567:CAKE-autorate user$ sudo mtr -ezb6w -c 100 2a01:bc80:7:100::9b85:f813
Password:
Start: 2023-09-03T19:53:45+0200
HOST: 123-1234567.local
              Loss%    Snt   Last    Avg   Best   Wrst  StDev
 1. AS6805 dynamic-2a01-0c23-9012-7900-0000-0000-0000-0001.c23.pool.telefonica.de (2a01:c23:9012:7900::1)
               1.0%    100    1.1    1.2    0.8    2.7    0.3
 2. AS6805 2a02:3001::11e
               0.0%    100   15.9   46.2   12.9  149.6   26.4
 3. AS6805 2a02:3001::1b7
              56.0%    100   13.7   12.7   11.4   13.9    0.6
 4. AS6805 2a02:3040:0:10::1c
              46.0%    100   12.2   12.8   11.7   15.1    0.6
 5. AS??? ???
              100.0    100    0.0    0.0    0.0    0.0    0.0
 6. AS??? ???
              100.0    100    0.0    0.0    0.0    0.0    0.0
 7. AS??? amsix-v6.valve.net (2001:7f8:1::a503:2590:1)
               0.0%    100   28.6   31.7   26.0  116.4   18.1
 8. AS32590 2a01:bc80:7:ffff::9b85:f8fb
               0.0%    100   28.7   29.0   27.5   32.6    0.8
 9. AS32590 2a01:bc80:7:100::9b85:f813
               0.0%    100   26.9   26.5   24.7   31.8    1.0
At an average of 9.2 Mbps, I saw one CWR for every 79538/2691 = 29.6 ECEs. At
that rate this would be around 1000*((79538/2691)*1534*8)/(9.2 * 1000^2) =
39.4 ms between sending the first ECE and seeing a CWR. This seems within range
of the 24.7 ms unloaded RTT to that IPv6 address. (Note: when I first reported
this I ran mtr against the wrong server IPv6 address and reported ~10 ms RTT,
but looking at the actual TCP stream and mtr-ing that today I got the above.)
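For anyone who wants to redo the back-of-the-envelope estimate above, here is a minimal Python sketch of the same arithmetic. The counts come from the cctrace output, the 1534-byte packet size and the 9.2 Mbps rate are the figures used in the text, and the variable names are my own. It also carries over the simplifying assumption that each ECE corresponds to roughly one full-size packet time at the average rate.

```python
ece_up = 79538           # ECE-flagged packets sent upstream (cctrace, Up)
cwr_down = 2691          # CWR-flagged packets seen downstream (cctrace, Down)
packet_bytes = 1534      # per-packet size assumed in the estimate
rate_bps = 9.2e6         # average downstream rate in bit/s
unloaded_rtt_ms = 24.7   # best RTT to the server from the mtr run

# How many ECEs were sent per CWR received.
ece_per_cwr = ece_up / cwr_down

# Time to transfer that many full-size packets at the average rate, in ms.
# Assumption: one ECE per full-size packet time, as in the estimate above.
ms_per_cwr = 1000 * ece_per_cwr * packet_bytes * 8 / rate_bps

print(f"ECEs per CWR:        {ece_per_cwr:.1f}")
print(f"time per CWR:        {ms_per_cwr:.1f} ms")
print(f"ratio to unloaded RTT: {ms_per_cwr / unloaded_rtt_ms:.2f}")
```

This is only a rough plausibility check: the loaded RTT during the download is not known from the capture, so the comparison against the unloaded RTT gives an order of magnitude, not a precise measurement.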
The CWR response to the ECEs indicates to me that the flow was responsive to
ECN signaling (how well I cannot say, as I captured neither CE marks nor packet
drops), and the network stayed responsive during the download, so to me this
looks like ECT(1) was used for a genuine ECN-enabled and CE-responsive flow.
It would be interesting if others could check whether they see ECT(1) in actual
use on their home links. Initially I thought this might be my ISP doing some
mismarking, but it would be quite lucky to re-mark a genuine ECT(0) flow to TOS
0x01 and hence ECT(1) so that ECN signaling would still work, while leaving the
other ~8 flows from the same remote address at ECT(0) (also with working ECN).
Just in case it matters, my ISP is AS6805 Telefónica Germany GmbH & Co. OHG.
For quick monitoring on my OpenWrt router, I use the following tcpdump
invocation just to see what is happening (pppoe-wan is my wan interface's name):
tcpdump -i pppoe-wan -v -n '(ip6 and (ip6[0:2] & 0x30) >> 4 == 1)' or \
    '(ip and (ip[1] & 0x3) == 1)' # ECT(1)
(to exercise/test this I use (under macOS):
ping -z 0x01 -c 10 one.one.one.one
or
ping6 -z 0x01 -c 10 one.one.one.one)
and to see ECN activity in TCP I use:
tcpdump -i pppoe-wan -v -n '(tcp[tcpflags] & (tcp-ece|tcp-cwr) != 0)' or \
    '((ip6[6] = 6) and (ip6[53] & 0xC0 != 0))' # TCP ECN flags, ECN in action
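To make the bit masks in the two tcpdump filters above easier to follow, here is a small Python sketch of what they test. The function names and the sample header bytes are made up for illustration; the offsets and masks are the ones used in the filters (IPv4 byte 1, the first two bytes of the IPv6 header, and the TCP flags byte, which sits at offset 53 when a plain 40-byte IPv6 header is followed directly by TCP).

```python
# ECN codepoints as defined in RFC 3168 (two low bits of TOS / traffic class).
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def ecn_from_ipv4(header: bytes) -> str:
    """Same test as 'ip[1] & 0x3': low two bits of the second IPv4 header byte."""
    return ECN_NAMES[header[1] & 0x3]


def ecn_from_ipv6(header: bytes) -> str:
    """Same test as '(ip6[0:2] & 0x30) >> 4'.

    The first 16 bits hold version (4 bits), traffic class (8 bits) and the
    top 4 bits of the flow label, so the ECN bits end up under mask 0x0030.
    """
    first16 = int.from_bytes(header[0:2], "big")
    return ECN_NAMES[(first16 & 0x30) >> 4]


def tcp_ecn_flags(flags_byte: int) -> set:
    """Same test as 'ip6[53] & 0xC0': CWR is 0x80 and ECE is 0x40."""
    flags = set()
    if flags_byte & 0x80:
        flags.add("CWR")
    if flags_byte & 0x40:
        flags.add("ECE")
    return flags


# Hypothetical sample bytes, not taken from the capture:
print(ecn_from_ipv4(bytes([0x45, 0x01])))   # IPv4, TOS byte 0x01
print(ecn_from_ipv6(bytes([0x60, 0x10])))   # IPv6, traffic class 0x01
print(tcp_ecn_flags(0x50))                  # ACK + ECE
```

Note that the fixed offset 53 in the IPv6 part of the filter only matches packets without extension headers, which is why the filter first checks that the next-header field (ip6[6]) is 6, i.e. TCP.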
Regards
Sebastian
_______________________________________________
Bloat mailing list
https://lists.bufferbloat.net/listinfo/bloat