Hi David, sorry for getting back to you so late. Thanks to your valuable contribution I managed to find a working solution.
On Fri, Aug 06, 2021 at 11:01:48AM -0300, David Bremner wrote:
> I think (one) underlying problem is that the systemd unit file for
> slurmctld is incorrect. The details are in [1], but it seems like
> network.target is not correct (I think it very rarely is a useful
> target). I added the following
>
> # /etc/systemd/system/slurmctld.service.d/override.conf
> [Unit]
> After=network-online.target munge.service
> Wants=network-online.target

Yes, this change is now part of the service file.

> I've switched to systemd-networkd on the hosts in question, so I can't
> easily test how this works with ifupdown, but I notice ifupdown provides
>
> /lib/systemd/system/ifupdown-wait-online.service
>
> which (guessing based on the name) should provide similar functionality
> to those documented in [1] for NetworkManager and systemd-networkd.
>
> [1]: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

Unfortunately, ifupdown-wait-online didn't help when I use ifupdown with
allow-hotplug interfaces, but I did not test it thoroughly since I want
a solution that works out of the box.

Therefore I decided to patch the slurm code that is failing so that it
retries getaddrinfo before giving up on starting the daemons.

Best regards,
-- 
Gennaro Oliva
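
The retry idea mentioned above could look roughly like the sketch below. It is not the actual Slurm patch: the function name getaddrinfo_retry and the max_attempts and delay_sec parameters are made up for illustration, and it only shows the general shape of retrying getaddrinfo a few times before reporting failure.

```c
#include <netdb.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the Slurm sources): call
 * getaddrinfo() up to max_attempts times, sleeping delay_sec seconds
 * between failed attempts. Returns 0 on success, otherwise the last
 * getaddrinfo() error code.
 *
 * Assumption: every error is retried for simplicity. A real patch
 * might retry only on errors that are likely transient while the
 * network is still coming up, such as EAI_AGAIN.
 */
int getaddrinfo_retry(const char *host, const char *service,
                      const struct addrinfo *hints,
                      struct addrinfo **res,
                      int max_attempts, unsigned int delay_sec)
{
    int rc = EAI_FAIL;

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        rc = getaddrinfo(host, service, hints, res);
        if (rc == 0)
            return 0;               /* resolved, *res is valid */

        fprintf(stderr, "getaddrinfo(%s) attempt %d/%d failed: %s\n",
                host ? host : "(null)", attempt, max_attempts,
                gai_strerror(rc));

        /* Wait before the next attempt, but not after the last one. */
        if (attempt < max_attempts && delay_sec > 0)
            sleep(delay_sec);
    }

    return rc;                      /* give up, report the last error */
}
```

On success the caller owns the result list and must release it with freeaddrinfo(). The attempt count and delay would have to be chosen so that a daemon started before an allow-hotplug interface is configured still gets a chance to resolve its address.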