As an update on this, I now suspect a problem with either the Ethernet hardware 
or (more likely) the sky2 driver on ‘mbp’, my 2007 MBP that acts as the Flent 
server and where I’m often using a qdisc. I should have looked at dmesg 
earlier, as there are log entries like this:

-----
[  221.478753] eth0: hw csum failure
[  221.478756] CPU: 1 PID: 1890 Comm: netserver Tainted: G        W       4.8.0-37-generic #39-Ubuntu
[  221.478757] Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS    MBP41.88Z.00C1.B03.0802271651 02/27/08
[  221.478762]  0000000000000286 000000003844a735 ffff9c293fd03ba8 ffffffffb5c30e12
[  221.478765]  ffff9c293a505000 ffffffffb66fb5c0 ffff9c293fd03bc0 ffffffffb5f7c028
[  221.478769]  ffff9c29399ea800 ffff9c293fd03be0 ffffffffb5f71f26 af75267500000000
[  221.478770] Call Trace:
[  221.478775]  <IRQ>  [<ffffffffb5c30e12>] dump_stack+0x63/0x81
[  221.478778]  [<ffffffffb5f7c028>] netdev_rx_csum_fault+0x38/0x40
[  221.478781]  [<ffffffffb5f71f26>] __skb_checksum_complete+0xb6/0xc0
…
[  226.478373] net_ratelimit: 386 callbacks suppressed
[  226.478378] eth0: hw csum failure
[  226.479523] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.8.0-37-generic #39-Ubuntu
[  226.479527] Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS    MBP41.88Z.00C1.B03.0802271651 02/27/08
[  226.479533]  0000000000000286 f78e43dca42a09d0 ffff9c293fd03b88 ffffffffb5c30e12
[  226.479542]  ffff9c293a505000 ffffffffb66fb5c0 ffff9c293fd03ba0 ffffffffb5f7c028
[  226.479549]  ffff9c2932093b00 ffff9c293fd03bc0 ffffffffb5f71f26 46898f6100000000
[  226.479557] Call Trace:
[  226.479560]  <IRQ>  [<ffffffffb5c30e12>] dump_stack+0x63/0x81
[  226.479581]  [<ffffffffb5f7c028>] netdev_rx_csum_fault+0x38/0x40
-----

What’s interesting is that they only occur during testing, and only when QoS 
with rate limiting is applied (with Cake, and also with HTB+X). It’s also 
interesting that they fall on exact 5 second intervals: not necessarily every 
5 seconds, as sometimes the next one comes after 10 or 15 seconds, but always 
on a multiple of 5 seconds. I went back and looked at my results, and realized 
that a very large number of the latency and throughput shifts I saw are also 
quantized to 5 second intervals. I don’t think that’s a coincidence.
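
To check the 5 second spacing without eyeballing timestamps, something like the following rough Python sketch could be run over saved dmesg output. The function names, the 5 second period and the tolerance are my own assumptions for illustration, and the regex only covers the 'hw csum failure' line format shown in the excerpt above. Keep in mind that the log itself reports net_ratelimit suppressing callbacks, so these are timestamps of the messages that got logged, not necessarily of every failure.

```python
import re

# Matches lines such as "[  221.478753] eth0: hw csum failure"
CSUM_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\S+: hw csum failure")


def csum_failure_times(dmesg_text):
    """Return the kernel timestamps (in seconds) of each 'hw csum failure' line."""
    times = []
    for line in dmesg_text.splitlines():
        match = CSUM_RE.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


def gaps_in_periods(times, period=5.0):
    """Express each gap between consecutive failures as a multiple of `period`."""
    return [(later - earlier) / period for earlier, later in zip(times, times[1:])]


def is_quantized(times, period=5.0, tolerance=0.05):
    """True if every gap lies within `tolerance` periods of a whole multiple.

    Near-zero gaps (several messages logged in one burst) count as multiple 0.
    """
    return all(abs(gap - round(gap)) <= tolerance
               for gap in gaps_in_periods(times, period))


# Sample lines taken from the dmesg excerpt above (trace lines omitted).
sample = """\
[  221.478753] eth0: hw csum failure
[  221.478756] CPU: 1 PID: 1890 Comm: netserver Tainted: G        W
[  226.478373] net_ratelimit: 386 callbacks suppressed
[  226.478378] eth0: hw csum failure
"""

times = csum_failure_times(sample)
print("failure timestamps:", times)
print("gaps in 5 s periods:", gaps_in_periods(times))
print("quantized to 5 s:", is_quantized(times))
```

The same gap check could be applied to the times of the latency and throughput shifts in the Flent results to see whether they line up with the logged failures.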

I saw that Dave posted about seeing a similar 'hw csum failure' on a raspi 
earlier in 2016:

https://github.com/raspberrypi/linux/issues/1371

I’ve since also seen more reports of this over the years, with no clear 
solution.

Why I saw it more with Cake than other qdiscs I don’t know, but I think it’s 
safe to say there’s no point in you trying to reproduce this until I can get 
past this with my hardware, and also I’m likely going to have to do a re-run of 
all of my tests after this is sorted out.

Pete

> On Feb 10, 2017, at 1:21 PM, Pete Heist <petehe...@gmail.com> wrote:
> 
> 
>> On Feb 10, 2017, at 12:35 PM, Sebastian Moeller <moell...@gmx.de> wrote:
>> 
>> Hi Pete,
>> 
>>> On Feb 10, 2017, at 12:08, Pete Heist <petehe...@gmail.com> wrote:
>>> 
>>> Not a problem. I’ll run a spread of Cake and fq_codel over Ethernet at 
>>> various bandwidths. It will be through their Apple USB Ethernet adapters 
>>> (used now for management), which are also connected through a switch, but I 
>>> think that setup should be fine for this purpose. Should be done in an hour 
>>> or so and we’ll see…
>> 
>>      I believe the Apple USB dongles are Fast Ethernet only, at least the 
>> USB2 types I have available here, which would work for your tested 
>> bandwidth, but it will not allow you to test at what shaper rate things go 
>> pear shaped… Also, if wifi creates a bit more CPU load than wired ethernet, 
>> it _might_ make sense to concurrently exercise the wifi cards just to 
>> re-create the SIRQ load (but probably not as the first experiment ;) ).
>> 
>> Best Regards
>>      Sebastian 
> 
> Hi Sebastian, yes, they’re only 100 Mbit, but that’s enough to cover the 
> rates where I was seeing the problem with Wi-Fi. Also in my test setup there 
> are four nodes connected as described under Configuration #1:
> 
> http://www.drhleny.cz/bufferbloat/wifi_bufferbloat.html
> 
> I’m running Cake on ‘mini’ and ‘mbp’, and the Wi-Fi radios are only on ‘om1’ 
> and ‘om2’, so the CPU load shouldn’t be different for mini and mbp when 
> connected directly via Ethernet, instead of via Ethernet and a Wi-Fi link, I 
> suppose.
> 
> I think we just wanted to see if the throughput shifting would reproduce over 
> Ethernet at the same rates, but so far it didn’t for me, although there are 
> other anomalies that don’t look like the throughput shifts I sent before 
> (there’s a throughput anomaly for Cake 20Mbit and latency anomalies for 
> fq_codel 60Mbit and 90Mbit):
> 
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_10mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_20mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_30mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_40mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_50mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_60mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_70mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_75mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_80mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_85mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_90mbit/index.html
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_100mbit/index.html
> 
> fq_codel:
> 
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_10mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_20mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_30mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_40mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_50mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_60mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_70mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_75mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_80mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_85mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_90mbit/index.html
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_100mbit/index.html
> 
> So that suggests that the throughput shifting problem may also be somehow 
> related to Wi-Fi. I’m still going to be testing Chaos Calmer, as well as two 
> Ubiquiti NanoStation M5’s, though this will take some more time. We might 
> learn some more from this, or if you can reproduce it with ath9k hardware 
> that would be good too...
> 
> Thanks,
> Pete
> 

_______________________________________________
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake
