> On Jan 12, 2018, at 13:17, Mehdi Dogguy <me...@dogguy.org> wrote:
> 
> On Fri, Dec 29, 2017 at 05:58:10PM +0000, "Hattne, Johan" 
> <hatt...@janelia.hhmi.org> wrote:
>> Package: slurmd
>> Version: 16.05.9-1+deb9u1
>> 
>> By default, slurmd writes its PID file to /var/run/slurmd.pid, but
>> the systemd service expects it to be at
>> /var/run/slurm-llnl/slurmd.pid.  As a result, "systemctl start
>> slurmd" hangs unless slurm.conf defines
>> SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid.  This could be
>> synchronized by either patching the slurmd code or modifying the
>> service file; I’m guessing the latter is more appropriate.
>> --- /lib/systemd/system/slurmd.service       2017-11-05 05:26:27.000000000 -0500
>> +++ /etc/systemd/system/slurmd.service       2017-12-28 17:24:40.708918382 -0500
>> @@ -8,7 +8,7 @@
>> Type=forking
>> EnvironmentFile=/etc/default/slurmd
>> ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS
>> -PIDFile=/var/run/slurm-llnl/slurmd.pid
>> +PIDFile=/var/run/slurmd.pid
>> 
>> [Install]
>> WantedBy=multi-user.target
>> 
>> I assume the same applies to the other slurm daemons.  This is on
>> Debian 9.3.
> 
> Thank you for the bugreport and sorry for not getting back to you sooner.
> The fact that slurm's defaults do not match the values from its debian package
> is indeed an issue and may lead to situations like the one you are
> describing. Though it is not necessary to change the .service file. You
> can specify SlurmctldPidFile, SlurmdPidFile or PidFile in the appropriate
> configuration files for (respectively) slurmctld, slurmd and slurmdbd.
> 
> I am not going to revert changes on the run directory in the debian
> packaging for now (as I guess my co-maintainer had good reasons to override
> them), but I'll change the slurm defaults provided by debian to be consistent
> with the rest of the package.

Thanks for looking into this, Mehdi!

I’m not sure providing a consistent value in the default configuration file
would be appropriate.  If one sets SlurmdPidFile in slurm.conf, it will apply to
all the members of the cluster, because slurm.conf (by default) has to be
identical on all the nodes.  This is fine if all the nodes are identically
configured.

However, we have a somewhat heterogeneous cluster, where some nodes are not
running Debian, and their startup scripts consequently look for the PID file in
/var/run (the default in the slurm code).  If we set SlurmdPidFile to
/var/run/slurm-llnl/slurmd.pid we run into the same issue there.
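
To make the mismatch concrete, here is a minimal Python sketch that compares
the PID file slurmd would write (SlurmdPidFile from slurm.conf, falling back to
the /var/run/slurmd.pid default mentioned above) with the PIDFile= a node's
systemd unit expects.  This is not part of slurm or the Debian package; the
function names are made up for illustration, and the parsing is deliberately
simplistic (no Include handling, no systemd drop-ins).

```python
import re

# Default cited in this thread for slurmd when slurm.conf sets nothing.
UPSTREAM_DEFAULT = "/var/run/slurmd.pid"


def slurmd_pidfile(slurm_conf_text, default=UPSTREAM_DEFAULT):
    """Return the SlurmdPidFile value found in slurm.conf text, else the default."""
    for line in slurm_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop '#' comments
        m = re.match(r"SlurmdPidFile\s*=\s*(\S+)", line, re.IGNORECASE)
        if m:
            return m.group(1)
    return default


def unit_pidfile(unit_text):
    """Return the last PIDFile= value in a systemd unit file, or None if absent."""
    value = None
    for line in unit_text.splitlines():
        line = line.strip()
        if line.startswith("PIDFile="):
            value = line[len("PIDFile="):].strip()
    return value


def pidfiles_agree(slurm_conf_text, unit_text):
    """True if slurmd writes its PID file where the unit expects to find it."""
    return slurmd_pidfile(slurm_conf_text) == unit_pidfile(unit_text)


# Hypothetical inputs: the Debian unit from the report, and a shared
# slurm.conf that either leaves SlurmdPidFile unset or sets the Debian path.
debian_unit = "[Service]\nType=forking\nPIDFile=/var/run/slurm-llnl/slurmd.pid\n"
other_unit = "[Service]\nType=forking\nPIDFile=/var/run/slurmd.pid\n"
shared_conf = "SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid\n"

print(pidfiles_agree("", debian_unit))           # unset: Debian node disagrees
print(pidfiles_agree(shared_conf, debian_unit))  # set: Debian node agrees
print(pidfiles_agree(shared_conf, other_unit))   # set: non-Debian node disagrees
```

With one slurm.conf shared by every node, at most one of the two unit files
can agree with it, which is why I would rather see the fix on the service
file side.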

I suppose patching slurm to write its PID file to
/var/run/slurm-llnl/slurmd.pid in conjunction with a consistent default
configuration file would work as well, but that’s a bit more work (and the
Debian slurm will diverge a tad from upstream).

// Best wishes; Johan
