On 6 Nov 2020, at 21:19, Jesper Dangaard Brouer wrote:

On Fri, 06 Nov 2020 18:04:49 +0100
"Thomas Rosenstein" <thomas.rosenst...@creamfinance.com> wrote:

On 6 Nov 2020, at 15:13, Jesper Dangaard Brouer wrote:


I'm using ping on IPv4, but I'll try to see if IPv6 makes any
difference!

I think you misunderstand me.  I'm not asking you to use ping6. The
gobgpd daemon updates will update both IPv4 and IPv6 routes, right?
Updating IPv6 routes is more problematic than updating IPv4 routes.
The IPv6 route table updates can potentially stall softirq from
running, which is what the latency tool was measuring... and it did
show some outliers.

Yes, I did; I assumed the latency would be introduced into the traffic path by the lock.
Nonetheless, I tested it and there is no difference :)



Have you tried to use 'perf record' to observe what is happening on
the system while these latency incidents happen? (let me know if you
want some cmdline hints)

Haven't tried this yet. If you have some hints on what events to monitor,
I'll take them!

Okay, to record everything (-a) on the system and save the call-graph (-g),
run for 5 seconds (by profiling the sleep command):

 # perf record -g -a  sleep 5
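If the latency incidents are short, it can also help to raise the sampling
frequency with perf record's -F option (the 999 Hz value below is just a
common choice for illustration, not something specific to this thread):

 # perf record -g -a -F 999 sleep 5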

To view the result, simply use 'perf report', but you likely want to use
the option --no-children as you are profiling the kernel (and not a
userspace program whose 'children' you want grouped).  I also include
the CPU column via '--sort cpu,comm,dso,symbol' and you can
select/zoom in on a specific CPU via '-C zero-indexed-cpu-num'.

 # perf report --sort cpu,comm,dso,symbol --no-children
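For example, to zoom in on a single CPU (the CPU number 2 here is only an
illustration):

 # perf report --sort cpu,comm,dso,symbol --no-children -C 2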

When we ask you to provide the output, you can use the --stdio option
and share the text output via a pastebin link, as it is very long.
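
Putting that together, a command along these lines will produce a text
report you can paste (the output filename is just an example):

 # perf report --sort cpu,comm,dso,symbol --no-children --stdio > perf-report.txt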

Here is the output from kernel 3.10_1127 (I updated to the very latest in that branch): https://pastebin.com/5mxirXPw
Here is the output from kernel 5.9.4: https://pastebin.com/KDZ2Ei2F

I have noticed that the delays are directly related to the traffic flows, see below.

These tests are WITHOUT gobgpd running, so there are no updates to the route table, but the route tables are fully populated. Also, it's ONLY outgoing traffic; the return packets are coming in on another router.

I have then cleared the routing tables, and the issue persists; the table has only 78 entries.

40 threads -> sometimes higher rtt times: https://pastebin.com/Y9nd0h4h
60 threads -> always high rtt times: https://pastebin.com/JFvhtLrH

So it definitely gets worse the more connections there are.

I have also tried to reproduce the issue with the same kernel on a virtual Hyper-V machine; there I don't see any adverse effects. But it's not 100% the same setup, since MASQ happens on it .. I will restructure a bit to get a similar representation.

I also suspected that -j NOTRACK might be an issue; I removed that too, no change. (It's asymmetric routing anyway.)
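
For context, a NOTRACK rule typically sits in the raw table; the rule I removed was along these lines (match criteria omitted, this is only an illustration, not the exact rule from my config):

 # iptables -t raw -A PREROUTING -j NOTRACK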

Additionally, I have quit all applications except sshd; no change!




--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
