On Sat, 2019-06-01 at 07:00 -0400, Brian J. Murrell wrote:
> On Thu, 2019-05-30 at 12:23 +0200, Thomas Haller via networkmanager-list wrote:
> > On Wed, 2019-05-29 at 12:23 -0400, Brian J. Murrell wrote:
> > > On Wed, 2019-05-29 at 09:28 +0200, Thomas Haller via networkmanager-list wrote:
> > > > If this is happening, killing NetworkManager with SIGKILL would
> > > > not give NetworkManager a chance to clean up...
> > > > 
> > > >   sudo killall -SIGKILL NetworkManager
> > > > 
> > > > (and verify that NetworkManager is indeed not running. Maybe
> > > > first run `systemctl mask NetworkManager`, so that systemd won't
> > > > restart it).
> > > 
> > > Are you suggesting that I kill NM with SIGKILL as a debugging step
> > > to see if the RSes still stop when stopping NM?
> > 
> > Yes.
> > 
> > > If so, that was an interesting experiment:
> > > 
> > > # killall -SIGKILL NetworkManager
> > > # ps -ef | grep Network
> > > root     15001     1  0 12:02 ?        00:00:00 /usr/sbin/NetworkManager --no-daemon
> > > 
> > > So clearly systemd restarted it, but it's not flooding RSes after
> > > the restart.
> > 
> > Ah ok. That's what I meant by first masking NetworkManager.
> 
> # systemctl mask NetworkManager
> Created symlink from /etc/systemd/system/NetworkManager.service to /dev/null.
> # killall -SIGKILL NetworkManager
> # systemctl status NetworkManager
> ● NetworkManager.service
>    Loaded: masked (/dev/null; bad)
>    Active: failed (Result: signal) since Sat 2019-06-01 06:53:23 EDT; 39s ago
>  Main PID: 4286 (code=killed, signal=KILL)
> 
> May 31 22:04:18 server.example.com NetworkManager[4286]: <info>  [1559354658.5223] manager: NetworkManager state is now CONNECTED_SITE
> May 31 22:04:18 server.example.com NetworkManager[4286]: <info>  [1559354658.5274] policy: set 'enp2s0' (enp2s0) as default for IPv6 routing and DNS
> May 31 22:04:18 server.example.com NetworkManager[4286]: <info>  [1559354658.5670] device (enp2s0): Activation: successful, device activated.
> May 31 22:04:18 server.example.com NetworkManager[4286]: <info>  [1559354658.5807] manager: NetworkManager state is now CONNECTED_GLOBAL
> May 31 22:04:18 server.example.com NetworkManager[4286]: <info>  [1559354658.5934] manager: startup complete
> May 31 22:04:33 server.example.com NetworkManager[4286]: <info>  [1559354673.1941] policy: set 'enp2s0' (enp2s0) as default for IPv4 routing and DNS
> Jun 01 06:53:09 server.example.com systemd[1]: Current command vanished from the unit file, execution of the command list won't be resumed.
> Jun 01 06:53:23 server.example.com systemd[1]: NetworkManager.service: main process exited, code=killed, status=9/KILL
> Jun 01 06:53:23 server.example.com systemd[1]: Unit NetworkManager.service entered failed state.
> Jun 01 06:53:23 server.example.com systemd[1]: NetworkManager.service failed.
> 
> Before I killed NM, it was flooding out RSes, and after I killed it,
> the flooding stopped.

Thanks, that's of course a strong indication that NetworkManager is the
culprit.

But I don't see how.

NetworkManager uses libndp for NDP. There is only one place where it
sends an RS, and there it should also log a debug message:

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/blob/8964dbe8bc9cbe7300a48bffe86faee6b149fbf2/src/ndisc/nm-ndisc.c#L555

I did not see these messages...

Just to check: did you enable level=TRACE logging and make sure that a
possible burst of log messages is not rate-limited by systemd-journald?
See the comments about that at

[2] https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf#n28

If so, and the logfile still shows no messages about "router
solicitation", then it is unclear how these RSes get sent.


Maybe the issue is in libndp, but it should also send just one
message:

[3] https://github.com/jpirko/libndp/blob/98bdaa1cb94faff0ccf992abc40a352ea16640fa/libndp/libndp.c#L195

... unless the kernel wrongly fails with errno EINTR, which seems
unlikely. It would be interesting to see if this goes away by setting
"errno = 0;" right before sendto() in [3].
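To make the EINTR reasoning above concrete, here is a minimal C sketch of the usual retry-on-EINTR send pattern. It is not libndp's code: send_retry_eintr(), count_datagrams_after_one_send() and MAX_EINTR_RETRIES are hypothetical names, the retry cap is an addition of this sketch, and an AF_UNIX datagram socketpair stands in for the raw ICMPv6 socket so it runs without privileges. The point it illustrates: a successful sendto() returns at once, and per POSIX a sendto() that fails with EINTR has transmitted nothing, so such a loop can only emit repeated packets if a call that did transmit still reports EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical retry cap; an assumption of this sketch, not taken from libndp. */
#define MAX_EINTR_RETRIES 8

/*
 * Send one datagram, calling sendto() again only when it fails with EINTR.
 * errno is read only after sendto() has returned -1, since its value is
 * not meaningful after a successful call.
 * Returns the number of sendto() attempts on success, or -errno on failure.
 */
static int send_retry_eintr(int fd, const void *buf, size_t len)
{
	int attempts = 0;

	for (;;) {
		ssize_t ret;

		attempts++;
		ret = sendto(fd, buf, len, 0, NULL, 0);
		if (ret >= 0)
			return attempts; /* success: the datagram went out once */
		if (errno != EINTR || attempts >= MAX_EINTR_RETRIES)
			return -errno;   /* real error, or too many interruptions */
	}
}

/*
 * Demo helper (hypothetical): send one message over a connected AF_UNIX
 * datagram socketpair, then count how many datagrams the peer receives.
 * Stores the number of sendto() attempts in *attempts_out.
 * Returns the datagram count, or -1 if the socketpair cannot be created.
 */
static int count_datagrams_after_one_send(int *attempts_out)
{
	int sv[2];
	int received = 0;
	char buf[16];
	const char msg[] = "RS";

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
		return -1;

	*attempts_out = send_retry_eintr(sv[0], msg, sizeof(msg));

	/* Drain the peer without blocking; recv() fails with EAGAIN once empty. */
	while (recv(sv[1], buf, sizeof(buf), MSG_DONTWAIT) > 0)
		received++;

	close(sv[0]);
	close(sv[1]);
	return received;
}
```

On a healthy socket a single call makes one sendto() attempt and the peer receives exactly one datagram.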


No ideas so far.


What is NetworkManager's CPU usage when this happens?


best,
Thomas



> 
> > If you kill NetworkManager with SIGKILL (without letting it
> > restart), it would not give NetworkManager time to do anything.
> > If that stops the flooding, then the messages were sent by
> > NetworkManager, otherwise not.
> 
> Indeed.
> 
> > Thanks. It would be most interesting to see them at the moment when
> > the flooding happens.
> 
> Damn. I should have gathered these before killing NM, because now
> that I have killed and restarted it, it's not flooding again.
> 
> I will post the sysctl content here when I see it flooding again.
> 
> Cheers,
> b.
> 
> _______________________________________________
> networkmanager-list mailing list
> networkmanager-list@gnome.org
> https://mail.gnome.org/mailman/listinfo/networkmanager-list
