Bug#796763: slurmd cannot be started under systemd

2015-08-23 Thread Andre Florath
Package: slurmd
Version: 14.03.9-5
Justification: renders package unusable
Severity: grave

Dear Maintainer,

it is not possible to start slurmd under systemd:

root@slurmclient1:~# systemctl stop slurmd
root@slurmclient1:~# time systemctl start slurmd
Job for slurmd.service failed. See 'systemctl status slurmd.service' and 'journalctl -xn' for details.

real    1m30.034s
user    0m0.000s
sys     0m0.000s

journal:
Aug 12 17:53:45 slurmclient1 slurmd[1266]: CPU frequency setting not configured for this node
Aug 12 17:53:45 slurmclient1 slurmd[1268]: slurmd version 14.03.9 started
Aug 12 17:53:45 slurmclient1 slurmd[1268]: slurmd started on Wed, 12 Aug 2015 17:53:45 +0200
Aug 12 17:53:45 slurmclient1 slurmd[1268]: CPUs=4 Boards=1 Sockets=4 Cores=1 Threads=1 Memory=1000 TmpDisk=7321 Uptime=1320
Aug 12 17:55:15 slurmclient1 systemd[1]: slurmd.service start operation timed out. Terminating.
Aug 12 17:55:15 slurmclient1 slurmd[1268]: Slurmd shutdown completing
Aug 12 17:55:15 slurmclient1 systemd[1]: Failed to start Slurm node daemon.
Aug 12 17:55:15 slurmclient1 systemd[1]: Unit slurmd.service entered failed state.

Nevertheless, when I start slurmd manually, either with
$ slurmd -Dv
or
$ /usr/sbin/slurmd
everything works fine: the process starts, connects to the control machine,
and it is possible to execute commands on the node.

Workaround:
The process starts when the config (in /etc/default/slurmd) is set to:
SLURMD_OPTIONS="-D"
and in /lib/systemd/system/slurmd.service the type is changed to 'simple'.
(With this workaround there is no log output any more, but at least the daemon runs.)

Kind regards

Andre

P.S.: Please note that this is also true for the slurmctld package.
  I'm not sure if this is the same root cause.  If you want,
  I can file the same bug also for slurmctld.


-- System Information:
Debian Release: 8.1
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages slurmd depends on:
ii  adduser                  3.113+nmu3
ii  init-system-helpers      1.22
ii  libc6                    2.19-18
ii  libhwloc5                1.10.0-3
ii  libpam0g                 1.1.8-3.1
ii  lsb-base                 4.1+Debian13+nmu1
ii  munge                    0.5.11-1.1+b1
ii  openssl                  1.0.1k-3+deb8u1
ii  openssl-blacklist        0.5-3
ii  slurm-wlm-basic-plugins  14.03.9-5

slurmd recommends no packages.

slurmd suggests no packages.

-- no debconf information



Bug#796763: slurmd cannot be started under systemd

2015-08-24 Thread Rémi Palancher

Hi Andre,

The slurm{d,ctld} service files expect the PID files to be located in the
/var/run/slurm-llnl directory. Your slurm.conf file must point there too
(SlurmctldPidFile and SlurmdPidFile parameters) for the systemd service
files to work properly.


If the Slurm daemons create their PID files in another location, systemd
keeps waiting for the files to appear in the expected location and then
times out (after 90 seconds in your log).


Can you check the values of these parameters in your slurm.conf file?
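
One way to run this check is sketched below. This is only an illustration, not something shipped with the package: the function names (slurm_pidfile, unit_pidfile, check_pidfiles) are made up for this example, and it only handles the slurmd side. It reads SlurmdPidFile from a slurm.conf file and PIDFile from a unit file, then reports whether the two paths agree.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Slurm or Debian) that compare the PID
# file path in slurm.conf with the one the systemd unit waits for.

# Print the SlurmdPidFile value from the slurm.conf given as $1.
# The key is matched case-insensitively and trailing comments are dropped.
# Prints nothing if the parameter is not set.
slurm_pidfile() {
    awk -F= 'tolower($1) ~ /^[ \t]*slurmdpidfile[ \t]*$/ {
        v = $2; sub(/#.*/, "", v); gsub(/[ \t]/, "", v); print v
    }' "$1" | tail -n 1
}

# Print the PIDFile= value from the systemd unit file given as $1.
unit_pidfile() {
    awk -F= '$1 ~ /^[ \t]*PIDFile[ \t]*$/ {
        v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); print v
    }' "$1" | tail -n 1
}

# Compare both values; return non-zero when they differ or when
# slurm.conf does not set SlurmdPidFile at all.
check_pidfiles() {
    conf_pid=$(slurm_pidfile "$1")
    unit_pid=$(unit_pidfile "$2")
    if [ -n "$conf_pid" ] && [ "$conf_pid" = "$unit_pid" ]; then
        echo "match: $conf_pid"
    else
        echo "mismatch: slurm.conf='$conf_pid' unit='$unit_pid'"
        return 1
    fi
}
```

It could be called as `check_pidfiles /etc/slurm-llnl/slurm.conf /lib/systemd/system/slurmd.service`. The slurm.conf path here is an assumption about where the file lives on your system; adjust it as needed. The same comparison applies to SlurmctldPidFile and slurmctld.service.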

Best,
Rémi



Bug#796763: slurmd cannot be started under systemd

2015-08-25 Thread Andre Florath
Hello Rémi,

Hm.

You are completely right.
And I have no real excuse.

Except, maybe, that IMHO having to keep two different
configuration files (slurm.conf and slurm(ctl)d.service)
in sync might not be a good idea.

Thanks for your reply. From my point of view the bug can be closed.

Sorry for filing the report.

Kind regards

Andre