Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes

2019-03-25 Thread Brian J. Murrell
On Fri, 2019-02-15 at 09:05 +0100, Thomas Haller via networkmanager-
list wrote:
> 
> Hi,

Hi Thomas,

> Best way is:
> 
> - first disable ratelimiting in journald. For that, set 
>   RateLimitIntervalSec=0 in /etc/systemd/journald.conf and
>   `systemctl restart systemd-journald.service`
> 
> - enable level=TRACE in /etc/NetworkManager/NetworkManager.conf
>   (level=DEBUG would be enough for these particular loglines,
>   but as you go throgh the effort of collecting logs, there is no
> need 
>   to restrict what to collect).
> 
> - restart NM, reproduce, and provide logs.

I sent this to you offline a few weeks ago and then a followup message
a week or two after that but didn't get any response.

I'm wondering if you perhaps didn't get those messages and if there is
another way I should send those to you, given that they are a bit
largish for the list.

Cheers,
b.



signature.asc
Description: This is a digitally signed message part
___
networkmanager-list mailing list
networkmanager-list@gnome.org
https://mail.gnome.org/mailman/listinfo/networkmanager-list


Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes

2019-02-15 Thread Thomas Haller via networkmanager-list
On Thu, 2019-02-14 at 10:29 -0500, Brian J. Murrell wrote:
> On Thu, 2019-02-14 at 15:16 +0100, Thomas Haller via networkmanager-
> list wrote:
> > Hi,
> 
> Hi.
> 
> > From the logs it looks like to me the IPv6 address of the sender is
> > indeed the instance managed by NetworkManager.
> 
> Oh yes, I am sure it's the NM machine that is sending the RSes.
> 
> > Is the issue hard to reproduce?
> 
> Not hard to reproduce but I don't have a "lab" to reproduce it in and
> so have to do it on my production network and doing so brings the
> router/switch that everything is connected to to it's knees so it's
> pretty catastrophic to reproduce.
> 
> I'm also concerned with the disk load that disabling the rate
> limiting
> of journald is going to cause on the NM machine given that it's
> suppressing thousands of messages at a time.
> 
> But I will give it another go.
> 
> Just so that I only have to do this once and as briefly as possible,
> what level of debugging do I need to set to get the messages logged
> by:
> 
> https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/ndisc/nm-ndisc.c?id=2c881b8064e1cb5b227e5a3c61abfe95c6ddd05a#n711


Hi,

Best way is:

- first disable ratelimiting in journald. For that, set 
  RateLimitIntervalSec=0 in /etc/systemd/journald.conf and
  `systemctl restart systemd-journald.service`

- enable level=TRACE in /etc/NetworkManager/NetworkManager.conf
  (level=DEBUG would be enough for these particular loglines,
  but as you go throgh the effort of collecting logs, there is no need 
  to restrict what to collect).

- restart NM, reproduce, and provide logs.


In particular, loglines with the word "solicitation" seem relevant.


Thanks,
Thomas


signature.asc
Description: This is a digitally signed message part
___
networkmanager-list mailing list
networkmanager-list@gnome.org
https://mail.gnome.org/mailman/listinfo/networkmanager-list


Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes

2019-02-14 Thread Thomas Haller via networkmanager-list
On Thu, 2019-02-14 at 08:25 -0500, Brian J. Murrell wrote:
> On Thu, 2019-02-14 at 13:46 +0100, Thomas Haller wrote:
> > Hi,
> 
> Hi,
> 
> > In 1.10.2-16.el7_5, that code looks pretty similar. NetworkManager
> > should either log
> > 
> > "router solicitation sent"
> > 
> > or
> > 
> > "failure sending router solicitation: ...
> > 
> > for every RS that gets send.
> > 
> > 
> > Possibly the logging message was suppressed by journald's
> > ratelimiting
> 
> Most likely.
> 
> > (as the journal would tell you, if that's the case). Make sure to
> > disable rate-limiting, see [1].
> > 
> > 
> > [1] 
> > https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28
> 
> Is it really necessary to actually see the messages in the journal at
> this point?  Doesn't the packet trace and CPU load of NM from my
> previous message sufficiently prove that NM is storming RSes?
> 

Hi,


well, NetworkManager uses libndp for IPv6 neighbor discovery.


From the logs it looks like to me the IPv6 address of the sender is
indeed the instance managed by NetworkManager.


However,

1) libndp never sends RS without being told to do so (I think).

2) NetworkManager sends RS (via libndp) only at one place. It's not
clear to me how that code could be wrong so that it might result in a
flood of RS. The logfile should show whether this code is culprit or
not.


Is the issue hard to reproduce? Your efforts are appreciated!! :)


best,
Thomas


signature.asc
Description: This is a digitally signed message part
___
networkmanager-list mailing list
networkmanager-list@gnome.org
https://mail.gnome.org/mailman/listinfo/networkmanager-list


Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes

2019-02-14 Thread Brian J. Murrell
On Thu, 2019-02-14 at 13:46 +0100, Thomas Haller wrote:
> Hi,

Hi,

> In 1.10.2-16.el7_5, that code looks pretty similar. NetworkManager
> should either log
> 
> "router solicitation sent"
> 
> or
> 
> "failure sending router solicitation: ...
> 
> for every RS that gets send.
> 
> 
> Possibly the logging message was suppressed by journald's
> ratelimiting

Most likely.

> (as the journal would tell you, if that's the case). Make sure to
> disable rate-limiting, see [1].
> 
> 
> [1] 
> https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28

Is it really necessary to actually see the messages in the journal at
this point?  Doesn't the packet trace and CPU load of NM from my
previous message sufficiently prove that NM is storming RSes?

Cheers,
b.



signature.asc
Description: This is a digitally signed message part
___
networkmanager-list mailing list
networkmanager-list@gnome.org
https://mail.gnome.org/mailman/listinfo/networkmanager-list


Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes

2019-02-14 Thread Thomas Haller via networkmanager-list
Hi,


On Thu, 2019-02-14 at 07:01 -0500, Brian J. Murrell wrote:
> On Thu, 2019-02-14 at 07:49 +0100, Thomas Haller via networkmanager-
> 
> > It's not clear to me what you mean.
> > 
> > Do you mean, that the many RA received at NetworkManager cause
> > NetworkManager to send too frequent RS (in turn, repeating a
> > vicious
> > cycle)?
> 
> That's exactly how I understand the description from the author of
> the
> software on the router that is sending the RAs, yes:
> 
> https://github.com/openwrt/odhcpd/issues/122#issuecomment-459263681

OK, makes sense.


> > Also, NM only sends RS based on timeouts, not based on RAs that it
> > receives [1].
> 
> Assuming no bugs, of course.  Such as could the receiving of the
> multiple RAs perhaps be causing the timeout to get reset, etc.?
> 
> How/when does the logging in send_rs_timeout:
> 
> https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/ndisc/nm-ndisc.c?id=2c881b8064e1cb5b227e5a3c61abfe95c6ddd05a#n711
> 
> supposed to get logged?

In 1.10.2-16.el7_5, that code looks pretty similar. NetworkManager
should either log

"router solicitation sent"

or

"failure sending router solicitation: ...

for every RS that gets send.


Possibly the logging message was suppressed by journald's ratelimiting
(as the journal would tell you, if that's the case). Make sure to
disable rate-limiting, see [1].


[1] 
https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28




signature.asc
Description: This is a digitally signed message part
___
networkmanager-list mailing list
networkmanager-list@gnome.org
https://mail.gnome.org/mailman/listinfo/networkmanager-list


Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes

2019-02-14 Thread Brian J. Murrell
On Thu, 2019-02-14 at 07:49 +0100, Thomas Haller via networkmanager-
list wrote:
> 
> Hi Brian,

Hi Thomas,

> It's not clear to me what you mean.
> 
> Do you mean, that the many RA received at NetworkManager cause
> NetworkManager to send too frequent RS (in turn, repeating a vicious
> cycle)?

That's exactly how I understand the description from the author of the
software on the router that is sending the RAs, yes:

https://github.com/openwrt/odhcpd/issues/122#issuecomment-459263681

> Also, NM only sends RS based on timeouts, not based on RAs that it
> receives [1].

Assuming no bugs, of course.  Such as could the receiving of the
multiple RAs perhaps be causing the timeout to get reset, etc.?

How/when does the logging in send_rs_timeout:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/ndisc/nm-ndisc.c?id=2c881b8064e1cb5b227e5a3c61abfe95c6ddd05a#n711

supposed to get logged?

> can you clarify what you think is the harmful behavior?

It causes an RA/RS storm on the network segment which drives the load
up on the router as it tries to process all of the RSes.  I also see
other devices in my network go flakey when the storm starts.

> Also, does
> tcpdump/wireshark show that NM sends too many RS?

Well, the log from the router:

https://github.com/openwrt/odhcpd/files/2802914/logread.txt

shows it according to the author of that software, but I can grab a
packet trace if you like:

2266 8.103505030 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2267 8.110988364 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2268 8.111476269 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2269 8.116601719 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2270 8.116998692 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2271 8.122950355 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2272 8.125336593 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2273 8.133207122 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2274 8.133783515 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2275 8.141006274 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2276 8.144105304 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2277 8.148667072 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2278 8.154763444 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2279 8.154935042 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2280 8.157726075 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2281 8.160963459 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2282 8.163637161 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2283 8.183327416 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2284 8.183769855 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2285 8.189963026 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2286 8.190426487 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2287 8.195982154 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2288 8.196411883 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2289 8.201634132 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2290 8.202029638 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2291 8.207328153 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2292 8.207783792 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2293 8.213471947 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 6c:b0:ce:f5:1e:4a
2294 8.213864031 fe80::21f:c6ff:fec4:926a -> ff02::2  ICMPv6 70 Router 
Solicitation from 00:1f:c6:c4:92:6a
2295 8.219000726 fe80::6eb0:ceff:fef5:1e4a -> fe80::21f:c6ff:fec4:926a ICMPv6 
174 Router Advertisement from 

Re: Reciving multiple RAs causes NetworkManager to storm the network with RSes

2019-02-13 Thread Thomas Haller via networkmanager-list
On Thu, 2019-01-31 at 06:46 -0500, Brian J. Murrell wrote:
> On Thu, 2019-01-24 at 17:15 -0500, Brian J. Murrell wrote:
> > On Tue, 2019-01-22 at 16:25 -0500, Brian J. Murrell wrote:
> > > Seems I already had level=TRACE set.
> > 
> > Was that debug useful?
> 
> So, it seems this behaviour is more harmful to the network than just
> spamming a log on the NM host.
> 
> It's causing a message storm with the router that is sending the
> multiple RAs in that NM is sending RS messages in response to the RAs
> crushing the router with RS messages.
> 
> Given that, can any more importance be put on this issue?
> 

Hi Brian,


It's not clear to me what you mean.

Do you mean, that the many RA received at NetworkManager cause
NetworkManager to send too frequent RS (in turn, repeating a vicious
cycle)?

I wouldn't see that from the log. There are no log lines about
additional router solicitations sent by NetworkManager.

Also, NM only sends RS based on timeouts, not based on RAs that it
receives [1].

can you clarify what you think is the harmful behavior? Also, does
tcpdump/wireshark show that NM sends too many RS?


[1] 
https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/ndisc/nm-ndisc.c?id=2c881b8064e1cb5b227e5a3c61abfe95c6ddd05a#n698


best,
Thomas


signature.asc
Description: This is a digitally signed message part
___
networkmanager-list mailing list
networkmanager-list@gnome.org
https://mail.gnome.org/mailman/listinfo/networkmanager-list