As an update on this, I now suspect a problem with either the Ethernet hardware or (more likely) the sky2 driver on ‘mbp’, my 2007 MBP that acts as the Flent server and where I’m often using a qdisc. I should have looked at dmesg earlier, as there are log entries like this:
-----
[ 221.478753] eth0: hw csum failure
[ 221.478756] CPU: 1 PID: 1890 Comm: netserver Tainted: G W 4.8.0-37-generic #39-Ubuntu
[ 221.478757] Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS MBP41.88Z.00C1.B03.0802271651 02/27/08
[ 221.478762] 0000000000000286 000000003844a735 ffff9c293fd03ba8 ffffffffb5c30e12
[ 221.478765] ffff9c293a505000 ffffffffb66fb5c0 ffff9c293fd03bc0 ffffffffb5f7c028
[ 221.478769] ffff9c29399ea800 ffff9c293fd03be0 ffffffffb5f71f26 af75267500000000
[ 221.478770] Call Trace:
[ 221.478775] <IRQ> [<ffffffffb5c30e12>] dump_stack+0x63/0x81
[ 221.478778] [<ffffffffb5f7c028>] netdev_rx_csum_fault+0x38/0x40
[ 221.478781] [<ffffffffb5f71f26>] __skb_checksum_complete+0xb6/0xc0
…
[ 226.478373] net_ratelimit: 386 callbacks suppressed
[ 226.478378] eth0: hw csum failure
[ 226.479523] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.8.0-37-generic #39-Ubuntu
[ 226.479527] Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS MBP41.88Z.00C1.B03.0802271651 02/27/08
[ 226.479533] 0000000000000286 f78e43dca42a09d0 ffff9c293fd03b88 ffffffffb5c30e12
[ 226.479542] ffff9c293a505000 ffffffffb66fb5c0 ffff9c293fd03ba0 ffffffffb5f7c028
[ 226.479549] ffff9c2932093b00 ffff9c293fd03bc0 ffffffffb5f71f26 46898f6100000000
[ 226.479557] Call Trace:
[ 226.479560] <IRQ> [<ffffffffb5c30e12>] dump_stack+0x63/0x81
[ 226.479581] [<ffffffffb5f7c028>] netdev_rx_csum_fault+0x38/0x40
-----

What’s interesting is that they only occur during testing, and only when QoS with rate limiting is applied (Cake, or also HTB+X). It’s also interesting that they occur on exact 5 second boundaries: not necessarily every 5 seconds, as sometimes the next one comes after 10 or 15 seconds, but always a multiple of 5 seconds apart. I went back and looked at my results, and realized that a very large number of the latency and throughput shifts I saw are also quantized to 5 second intervals. I don’t think that’s a coincidence.
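To check the 5 second spacing more systematically than by eye, something like the following could be run over saved dmesg output. This is only a rough sketch: the function names are made up for illustration, the regular expression assumes the 'hw csum failure' line format shown in the log above, and it assumes at most one logged failure per rate-limit window (closely spaced entries would need to be collapsed first).

```python
import re

# Matches dmesg lines such as "[ 221.478753] eth0: hw csum failure"
# (format assumed from the log excerpt; the interface name may differ).
CSUM_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+\S+: hw csum failure")


def csum_failure_times(dmesg_text):
    """Return the timestamp (seconds since boot) of each 'hw csum failure' line."""
    return [float(m.group(1)) for m in CSUM_RE.finditer(dmesg_text)]


def gaps_in_periods(times, period=5.0):
    """Return the gap between consecutive failures, in units of `period` seconds."""
    return [(b - a) / period for a, b in zip(times, times[1:])]


def is_quantized(times, period=5.0, tolerance=0.05):
    """True if every gap is within `tolerance` periods of a whole multiple (>= 1) of `period`."""
    return all(
        round(g) >= 1 and abs(g - round(g)) <= tolerance
        for g in gaps_in_periods(times, period)
    )


# The two failure lines from the log excerpt.
sample = """\
[ 221.478753] eth0: hw csum failure
[ 226.478378] eth0: hw csum failure
"""

times = csum_failure_times(sample)
print(gaps_in_periods(times))
print(is_quantized(times))
```

The same gap check could be pointed at the timestamps of the latency and throughput shifts in the test results, to see whether they line up on the same 5 second grid.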
I saw that Dave posted about a similar 'hw csum failure' on raspi earlier in 2016:

https://github.com/raspberrypi/linux/issues/1371

and since then I’ve also seen more reports of this over the years, with no clear solution. Why I saw it more with Cake than with other qdiscs I don’t know, but I think it’s safe to say there’s no point in you trying to reproduce this until I can get past this with my hardware. I’m also likely going to have to re-run all of my tests after this is sorted out.

Pete

> On Feb 10, 2017, at 1:21 PM, Pete Heist <petehe...@gmail.com> wrote:
>
>> On Feb 10, 2017, at 12:35 PM, Sebastian Moeller <moell...@gmx.de> wrote:
>>
>> Hi Pete,
>>
>>> On Feb 10, 2017, at 12:08, Pete Heist <petehe...@gmail.com> wrote:
>>>
>>> Not a problem. I’ll run a spread of Cake and fq_codel over Ethernet at
>>> various bandwidths. It will be through their Apple USB Ethernet adapters
>>> (used now for management), which are also connected through a switch, but
>>> I think that setup should be fine for this purpose. Should be done in an
>>> hour or so and we’ll see…
>>
>> I believe the Apple USB dongles are Fast Ethernet only, at least the
>> USB2 types I have available here, which for your tested bandwidth would
>> work, but it will not allow you to test at what shaper rate things go pear
>> shaped… Also, if wifi creates a bit more CPU load than wired ethernet, it
>> _might_ make sense to concurrently exercise the WIFI cards just to
>> re-create the SIRQ load (but probably not as the first experiment ;) ).
>>
>> Best Regards
>> Sebastian
>
> Hi Sebastian, yes, they’re only 100 Mbit, but that’s enough to cover the
> rates where I was seeing the problem with Wi-Fi. Also in my test setup there
> are four nodes connected as described under Configuration #1:
>
> http://www.drhleny.cz/bufferbloat/wifi_bufferbloat.html
>
> I’m running Cake on ‘mini’ and ‘mbp’, and the Wi-Fi radios are only on ‘om1’
> and ‘om2’, so the CPU load shouldn’t be different for mini and mbp when
> connected directly via Ethernet, instead of via Ethernet and a Wi-Fi link, I
> suppose.
>
> I think we just wanted to see if the throughput shifting would reproduce over
> Ethernet at the same rates, but so far it didn’t for me, although there are
> other anomalies that don’t look like the throughput shifts I sent before
> (there’s a throughput anomaly for Cake 20Mbit and latency anomalies for
> fq_codel 60Mbit and 90Mbit):
>
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_10mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_20mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_30mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_40mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_50mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_60mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_70mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_75mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_80mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_85mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_90mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_100mbit/index.html
>
> fq_codel:
>
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_10mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_20mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_30mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_40mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_50mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_60mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_70mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_75mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_80mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_85mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_90mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_100mbit/index.html
>
> So that suggests that the throughput shifting problem may also be somehow
> related to Wi-Fi. I’m still going to be testing Chaos Calmer, as well as two
> Ubiquiti NanoStation M5’s, though this will take some more time. We might
> learn some more from this, or if you can reproduce it with ath9k hardware
> that would be good too...
>
> Thanks,
> Pete
>
_______________________________________________
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake