Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes
On Fri, 2019-02-15 at 09:05 +0100, Thomas Haller via networkmanager- list wrote: > > Hi, Hi Thomas, > Best way is: > > - first disable ratelimiting in journald. For that, set > RateLimitIntervalSec=0 in /etc/systemd/journald.conf and > `systemctl restart systemd-journald.service` > > - enable level=TRACE in /etc/NetworkManager/NetworkManager.conf > (level=DEBUG would be enough for these particular loglines, > but as you go throgh the effort of collecting logs, there is no > need > to restrict what to collect). > > - restart NM, reproduce, and provide logs. I sent this to you offline a few weeks ago and then a followup message a week or two after that but didn't get any response. I'm wondering if you perhaps didn't get those messages and if there is another way I should send those to you, given that they are a bit largish for the list. Cheers, b. signature.asc Description: This is a digitally signed message part ___ networkmanager-list mailing list networkmanager-list@gnome.org https://mail.gnome.org/mailman/listinfo/networkmanager-list
Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes
On Thu, 2019-02-14 at 10:29 -0500, Brian J. Murrell wrote: > On Thu, 2019-02-14 at 15:16 +0100, Thomas Haller via networkmanager- > list wrote: > > Hi, > > Hi. > > > From the logs it looks like to me the IPv6 address of the sender is > > indeed the instance managed by NetworkManager. > > Oh yes, I am sure it's the NM machine that is sending the RSes. > > > Is the issue hard to reproduce? > > Not hard to reproduce but I don't have a "lab" to reproduce it in and > so have to do it on my production network and doing so brings the > router/switch that everything is connected to to it's knees so it's > pretty catastrophic to reproduce. > > I'm also concerned with the disk load that disabling the rate > limiting > of journald is going to cause on the NM machine given that it's > suppressing thousands of messages at a time. > > But I will give it another go. > > Just so that I only have to do this once and as briefly as possible, > what level of debugging do I need to set to get the messages logged > by: > > https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/ndisc/nm-ndisc.c?id=2c881b8064e1cb5b227e5a3c61abfe95c6ddd05a#n711 Hi, Best way is: - first disable ratelimiting in journald. For that, set RateLimitIntervalSec=0 in /etc/systemd/journald.conf and `systemctl restart systemd-journald.service` - enable level=TRACE in /etc/NetworkManager/NetworkManager.conf (level=DEBUG would be enough for these particular loglines, but as you go throgh the effort of collecting logs, there is no need to restrict what to collect). - restart NM, reproduce, and provide logs. In particular, loglines with the word "solicitation" seem relevant. Thanks, Thomas signature.asc Description: This is a digitally signed message part ___ networkmanager-list mailing list networkmanager-list@gnome.org https://mail.gnome.org/mailman/listinfo/networkmanager-list
Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes
On Thu, 2019-02-14 at 08:25 -0500, Brian J. Murrell wrote: > On Thu, 2019-02-14 at 13:46 +0100, Thomas Haller wrote: > > Hi, > > Hi, > > > In 1.10.2-16.el7_5, that code looks pretty similar. NetworkManager > > should either log > > > > "router solicitation sent" > > > > or > > > > "failure sending router solicitation: ... > > > > for every RS that gets send. > > > > > > Possibly the logging message was suppressed by journald's > > ratelimiting > > Most likely. > > > (as the journal would tell you, if that's the case). Make sure to > > disable rate-limiting, see [1]. > > > > > > [1] > > https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28 > > Is it really necessary to actually see the messages in the journal at > this point? Doesn't the packet trace and CPU load of NM from my > previous message sufficiently prove that NM is storming RSes? > Hi, well, NetworkManager uses libndp for IPv6 neighbor discovery. From the logs it looks like to me the IPv6 address of the sender is indeed the instance managed by NetworkManager. However, 1) libndp never sends RS without being told to do so (I think). 2) NetworkManager sends RS (via libndp) only at one place. It's not clear to me how that code could be wrong so that it might result in a flood of RS. The logfile should show whether this code is culprit or not. Is the issue hard to reproduce? Your efforts are appreciated!! :) best, Thomas signature.asc Description: This is a digitally signed message part ___ networkmanager-list mailing list networkmanager-list@gnome.org https://mail.gnome.org/mailman/listinfo/networkmanager-list
Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes
On Thu, 2019-02-14 at 13:46 +0100, Thomas Haller wrote: > Hi, Hi, > In 1.10.2-16.el7_5, that code looks pretty similar. NetworkManager > should either log > > "router solicitation sent" > > or > > "failure sending router solicitation: ... > > for every RS that gets send. > > > Possibly the logging message was suppressed by journald's > ratelimiting Most likely. > (as the journal would tell you, if that's the case). Make sure to > disable rate-limiting, see [1]. > > > [1] > https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28 Is it really necessary to actually see the messages in the journal at this point? Doesn't the packet trace and CPU load of NM from my previous message sufficiently prove that NM is storming RSes? Cheers, b. signature.asc Description: This is a digitally signed message part ___ networkmanager-list mailing list networkmanager-list@gnome.org https://mail.gnome.org/mailman/listinfo/networkmanager-list
Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes
Hi, On Thu, 2019-02-14 at 07:01 -0500, Brian J. Murrell wrote: > On Thu, 2019-02-14 at 07:49 +0100, Thomas Haller via networkmanager- > > > It's not clear to me what you mean. > > > > Do you mean, that the many RA received at NetworkManager cause > > NetworkManager to send too frequent RS (in turn, repeating a > > vicious > > cycle)? > > That's exactly how I understand the description from the author of > the > software on the router that is sending the RAs, yes: > > https://github.com/openwrt/odhcpd/issues/122#issuecomment-459263681 OK, makes sense. > > Also, NM only sends RS based on timeouts, not based on RAs that it > > receives [1]. > > Assuming no bugs, of course. Such as could the receiving of the > multiple RAs perhaps be causing the timeout to get reset, etc.? > > How/when does the logging in send_rs_timeout: > > https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/ndisc/nm-ndisc.c?id=2c881b8064e1cb5b227e5a3c61abfe95c6ddd05a#n711 > > supposed to get logged? In 1.10.2-16.el7_5, that code looks pretty similar. NetworkManager should either log "router solicitation sent" or "failure sending router solicitation: ... for every RS that gets send. Possibly the logging message was suppressed by journald's ratelimiting (as the journal would tell you, if that's the case). Make sure to disable rate-limiting, see [1]. [1] https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28 signature.asc Description: This is a digitally signed message part ___ networkmanager-list mailing list networkmanager-list@gnome.org https://mail.gnome.org/mailman/listinfo/networkmanager-list
Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes
On Thu, 2019-02-14 at 07:49 +0100, Thomas Haller via networkmanager- list wrote: > > Hi Brian, Hi Thomas, > It's not clear to me what you mean. > > Do you mean, that the many RA received at NetworkManager cause > NetworkManager to send too frequent RS (in turn, repeating a vicious > cycle)? That's exactly how I understand the description from the author of the software on the router that is sending the RAs, yes: https://github.com/openwrt/odhcpd/issues/122#issuecomment-459263681 > Also, NM only sends RS based on timeouts, not based on RAs that it > receives [1]. Assuming no bugs, of course. Such as could the receiving of the multiple RAs perhaps be causing the timeout to get reset, etc.? How/when does the logging in send_rs_timeout: https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/ndisc/nm-ndisc.c?id=2c881b8064e1cb5b227e5a3c61abfe95c6ddd05a#n711 supposed to get logged? > can you clarify what you think is the harmful behavior? It causes an RA/RS storm on the network segment which drives the load up on the router as it tries to process all of the RSes. I also see other devices in my network go flakey when the storm starts. > Also, does > tcpdump/wireshark show that NM sends too many RS? Well, the log from the router: https://github.com/openwrt/odhcpd/files/2802914/logread.txt shows it according to the author of that software, but I can grab a packet trace if you like: 2266 8.103505030 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2267 8.110988364 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2268 8.111476269 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2269 8.116601719 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2270 8.116998692 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2271 8.122950355 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2272 8.125336593 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2273 8.133207122 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2274 8.133783515 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2275 8.141006274 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2276 8.144105304 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2277 8.148667072 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2278 8.154763444 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2279 8.154935042 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2280 8.157726075 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2281 8.160963459 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2282 8.163637161 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2283 8.183327416 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2284 8.183769855 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2285 8.189963026 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2286 8.190426487 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2287 8.195982154 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2288 8.196411883 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2289 8.201634132 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2290 8.202029638 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2291 8.207328153 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2292 8.207783792 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2293 8.213471947 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from 6c:b0:ce:f5:1e:4a 2294 8.213864031 fe80::21f:c6ff:fec4:926a -> ff02::2 ICMPv6 70 Router Solicitation from 00:1f:c6:c4:92:6a 2295 8.219000726 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 174 Router Advertisement from
Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes
On Thu, 2019-01-31 at 06:46 -0500, Brian J. Murrell wrote: > On Thu, 2019-01-24 at 17:15 -0500, Brian J. Murrell wrote: > > On Tue, 2019-01-22 at 16:25 -0500, Brian J. Murrell wrote: > > > Seems I already had level=TRACE set. > > > > Was that debug useful? > > So, it seems this behaviour is more harmful to the network than just > spamming a log on the NM host. > > It's causing a message storm with the router that is sending the > multiple RAs in that NM is sending RS messages in response to the RAs > crushing the router with RS messages. > > Given that, can any more importance be put on this issue? > Hi Brian, It's not clear to me what you mean. Do you mean, that the many RA received at NetworkManager cause NetworkManager to send too frequent RS (in turn, repeating a vicious cycle)? I wouldn't see that from the log. There are no log lines about additional router solicitations sent by NetworkManager. Also, NM only sends RS based on timeouts, not based on RAs that it receives [1]. can you clarify what you think is the harmful behavior? Also, does tcpdump/wireshark show that NM sends too many RS? [1] https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/ndisc/nm-ndisc.c?id=2c881b8064e1cb5b227e5a3c61abfe95c6ddd05a#n698 best, Thomas signature.asc Description: This is a digitally signed message part ___ networkmanager-list mailing list networkmanager-list@gnome.org https://mail.gnome.org/mailman/listinfo/networkmanager-list