Did you build Slurm yourself from source? If so, the munge development package needs to be installed on that node before you build (munge-devel on EL systems, libmunge-dev on Debian).
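A quick way to confirm the headers are in place before running `./configure` is to check for munge.h; the paths below are the usual defaults, not guaranteed locations:

```shell
# Sketch: check for the munge development header before (re)building Slurm.
# Without it, ./configure silently skips munge support, so the cred/munge
# plugin is never built. Header paths are common defaults (an assumption).
if [ -e /usr/include/munge.h ] || [ -e /usr/local/include/munge.h ]; then
    echo "munge.h found - configure should enable munge support"
else
    echo "munge.h missing - install munge-devel (EL) or libmunge-dev (Debian), then rebuild"
fi
```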
You then need to set up munge with a shared munge key between the nodes, and have the munge daemon running. This is all detailed on Ole's wiki, which was linked previously: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation

Sean

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Nousheen <nousheenparv...@gmail.com>
Sent: Tuesday, 1 February 2022 15:56
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [EXT] Re: [slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory

External email: Please exercise caution
________________________________

Dear Ole and Hermann,

I have now reinstalled Slurm from scratch following this link:
The error remains the same. Kindly guide me: where will I find this cred/munge plugin? Please help me resolve this issue.

[root@exxact slurm]# slurmd -C
NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
UpTime=0-22:06:45
[root@exxact slurm]# systemctl enable slurmctld.service
[root@exxact slurm]# systemctl start slurmctld.service
[root@exxact slurm]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2022-02-01 09:46:20 PKT; 8s ago
  Process: 27530 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 27530 (code=exited, status=1/FAILURE)

Feb 01 09:46:20 exxact systemd[1]: Started Slurm controller daemon.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service: main process exited, ...RE
Feb 01 09:46:20 exxact systemd[1]: Unit slurmctld.service entered failed state.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service failed.
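For reference, the shared-key setup Sean describes at the top of his reply can be sketched as follows. A scratch directory stands in for /etc/munge so the commands can be dry-run without root; in production the key is /etc/munge/munge.key, mode 0400, owned by the munge user (per Ole's wiki):

```shell
# Sketch of creating a shared munge key. /etc/munge/munge.key is the real
# destination; a temp dir is used here so this can run without root.
keydir=$(mktemp -d)
dd if=/dev/urandom of="$keydir/munge.key" bs=1024 count=1 2>/dev/null
chmod 400 "$keydir/munge.key"
wc -c "$keydir/munge.key"   # expect 1024 bytes
# Distribute the SAME key to every node, then on each node:
#   systemctl enable --now munge
```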
[root@exxact slurm]# /usr/local/sbin/slurmctld -D
slurmctld: slurmctld version 21.08.5 started on cluster cluster194
slurmctld: error: Couldn't find the specified plugin name for cred/munge looking at all files
slurmctld: error: cannot find cred plugin for cred/munge
slurmctld: error: cannot create cred context for cred/munge
slurmctld: fatal: slurm_cred_creator_ctx_create((null)): Operation not permitted

Best Regards,
Nousheen Parvaiz

On Tue, Feb 1, 2022 at 9:06 AM Nousheen <nousheenparv...@gmail.com> wrote:

Dear Ole,

Thank you for your response. I am doing it again using your suggested link.

Best Regards,
Nousheen Parvaiz

On Mon, Jan 31, 2022 at 2:07 PM Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:

Hi Nousheen,

I recommend again that you follow the steps for installing Slurm on a CentOS 7 cluster:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation

You may need to start the installation from scratch, but the steps are guaranteed to work if followed correctly.
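As an aside, the "cannot find cred plugin for cred/munge" failure earlier in the thread usually means Slurm was compiled on a machine without the munge headers, so the plugin shared object was never built or installed. A quick check; the plugin path below assumes a from-source build with the default /usr/local prefix:

```shell
# Sketch: look for the munge credential plugin of a from-source install.
# /usr/local/lib/slurm is the default plugin directory for a /usr/local
# prefix build (an assumption; check PluginDir in slurm.conf if it is set).
plugin=/usr/local/lib/slurm/cred_munge.so
if [ -e "$plugin" ]; then
    echo "cred plugin present: $plugin"
else
    echo "cred plugin missing - install munge-devel, re-run ./configure, make, make install"
fi
```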
IHTH,
Ole

On 1/31/22 06:23, Nousheen wrote:
> The same error shows up on the compute node, as follows:
>
> [root@c103008 ~]# systemctl enable slurmd.service
> [root@c103008 ~]# systemctl start slurmd.service
> [root@c103008 ~]# systemctl status slurmd.service
> ● slurmd.service - Slurm node daemon
>    Loaded: loaded (/etc/systemd/system/slurmd.service; enabled; vendor preset: disabled)
>    Active: failed (Result: exit-code) since Mon 2022-01-31 00:22:42 EST; 2s ago
>   Process: 11505 ExecStart=/usr/local/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=203/EXEC)
>  Main PID: 11505 (code=exited, status=203/EXEC)
>
> Jan 31 00:22:42 c103008 systemd[1]: Started Slurm node daemon.
> Jan 31 00:22:42 c103008 systemd[1]: slurmd.service: main process exited, code=exited, status=203/EXEC
> Jan 31 00:22:42 c103008 systemd[1]: Unit slurmd.service entered failed state.
> Jan 31 00:22:42 c103008 systemd[1]: slurmd.service failed.
>
> Best Regards,
> Nousheen Parvaiz
>
> On Mon, Jan 31, 2022 at 10:08 AM Nousheen <nousheenparv...@gmail.com> wrote:
>
> Dear Jeffrey,
>
> Thank you for your response. I have followed the steps as instructed.
> After copying the files to their respective locations, the command "systemctl status slurmctld.service" gives me an error as follows:
>
> (base) [nousheen@exxact system]$ systemctl daemon-reload
> (base) [nousheen@exxact system]$ systemctl enable slurmctld.service
> (base) [nousheen@exxact system]$ systemctl start slurmctld.service
> (base) [nousheen@exxact system]$ systemctl status slurmctld.service
> ● slurmctld.service - Slurm controller daemon
>    Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
>    Active: failed (Result: exit-code) since Mon 2022-01-31 10:04:31 PKT; 3s ago
>   Process: 18114 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
>  Main PID: 18114 (code=exited, status=1/FAILURE)
>
> Jan 31 10:04:31 exxact systemd[1]: Started Slurm controller daemon.
> Jan 31 10:04:31 exxact systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
> Jan 31 10:04:31 exxact systemd[1]: Unit slurmctld.service entered failed state.
> Jan 31 10:04:31 exxact systemd[1]: slurmctld.service failed.
>
> Kindly guide me. Thank you so much for your time.
>
> Best Regards,
> Nousheen Parvaiz
>
> On Thu, Jan 27, 2022 at 8:25 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
>
> The missing file error has nothing to do with Slurm. The systemctl command is part of systemd's service management.
>
> The error message indicates that you haven't copied the slurmd.service file on your compute node to /etc/systemd/system or /usr/lib/systemd/system.
> /etc/systemd/system is usually used when a user adds a new service to a machine.
>
> Depending on your version of Linux, you may also need to run "systemctl daemon-reload" to activate slurmd.service within systemd.
>
> Once slurmd.service is copied over, the systemctl command should work just fine.
>
> Remember:
> slurmd.service    - only on compute nodes
> slurmctld.service - only on your cluster management node
> slurmdbd.service  - only on your cluster management node
>
> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Nousheen
> Sent: Thursday, January 27, 2022 3:54 AM
> To: Slurm User Community List <slurm-users@lists.schedmd.com>
> Subject: [slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory
>
> ◆ This message was sent from a non-UWYO address.
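The service-file steps Jeffrey describes can be sketched as follows. A scratch directory stands in for /etc/systemd/system so the sequence can be dry-run without root; the real commands are shown as comments (the assumption being that slurmd.service sits in the Slurm source tree's etc/ directory after ./configure):

```shell
# Sketch of installing slurmd.service on a compute node; a temp dir stands
# in for /etc/systemd/system so this can run without root or systemd.
unitdir=$(mktemp -d)
printf '[Unit]\nDescription=Slurm node daemon\n' > "$unitdir/slurmd.service"
cp "$unitdir/slurmd.service" "$unitdir/installed-slurmd.service"
grep -q 'Slurm node daemon' "$unitdir/installed-slurmd.service" && echo "unit file in place"
# The real sequence, as root on the compute node:
#   cp etc/slurmd.service /etc/systemd/system/
#   systemctl daemon-reload
#   systemctl enable --now slurmd.service
```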
> Please exercise caution when clicking links or opening attachments from external sources.
>
> Hello everyone,
>
> I am installing Slurm on CentOS 7 following this tutorial:
> https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/
>
> I am at the step where we start Slurm, but it gives me the following error:
>
> [root@exxact slurm-21.08.5]# systemctl enable slurmd.service
> Failed to execute operation: No such file or directory
>
> I have run the command to check if Slurm is configured properly:
>
> [root@exxact slurm-21.08.5]# slurmd -C
> NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889 UpTime=19-16:06:00
>
> I am new to this and unable to understand the problem. Kindly help me resolve this.
>
> My slurm.conf file is as follows:
>
> # slurm.conf file generated by configurator.html.
> # Put this file on all nodes of your cluster.
> # See the slurm.conf man page for more information.
> #
> ClusterName=cluster194
> SlurmctldHost=192.168.60.194
> #SlurmctldHost=
> #
> #DisableRootJobs=NO
> #EnforcePartLimits=NO
> #Epilog=
> #EpilogSlurmctld=
> #FirstJobId=1
> #MaxJobId=67043328
> #GresTypes=
> #GroupUpdateForce=0
> #GroupUpdateTime=600
> #JobFileAppend=0
> #JobRequeue=1
> #JobSubmitPlugins=lua
> #KillOnBadExit=0
> #LaunchType=launch/slurm
> #Licenses=foo*4,bar
> #MailProg=/bin/mail
> #MaxJobCount=10000
> #MaxStepCount=40000
> #MaxTasksPerNode=512
> MpiDefault=none
> #MpiParams=ports=#-#
> #PluginDir=
> #PlugStackConfig=
> #PrivateData=jobs
> ProctrackType=proctrack/cgroup
> #Prolog=
> #PrologFlags=
> #PrologSlurmctld=
> #PropagatePrioProcess=0
> #PropagateResourceLimits=
> #PropagateResourceLimitsExcept=
> #RebootProgram=
> ReturnToService=1
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/spool/slurmd
> SlurmUser=nousheen
> #SlurmdUser=root
> #SrunEpilog=
> #SrunProlog=
> StateSaveLocation=/home/nousheen/Documents/SILICS/slurm-21.08.5/slurmctld
> SwitchType=switch/none
> #TaskEpilog=
> TaskPlugin=task/affinity
> #TaskProlog=
> #TopologyPlugin=topology/tree
> #TmpFS=/tmp
> #TrackWCKey=no
> #TreeWidth=
> #UnkillableStepProgram=
> #UsePAM=0
> #
> #
> # TIMERS
> #BatchStartTimeout=10
> #CompleteWait=0
> #EpilogMsgTime=2000
> #GetEnvTimeout=2
> #HealthCheckInterval=0
> #HealthCheckProgram=
> InactiveLimit=0
> KillWait=30
> #MessageTimeout=10
> #ResvOverRun=0
> MinJobAge=300
> #OverTimeLimit=0
> SlurmctldTimeout=120
> SlurmdTimeout=300
> #UnkillableStepTimeout=60
> #VSizeFactor=0
> Waittime=0
> #
> #
> # SCHEDULING
> #DefMemPerCPU=0
> #MaxMemPerCPU=0
> #SchedulerTimeSlice=30
> SchedulerType=sched/backfill
> SelectType=select/cons_tres
> SelectTypeParameters=CR_Core
> #
> #
> # JOB PRIORITY
> #PriorityFlags=
> #PriorityType=priority/basic
> #PriorityDecayHalfLife=
> #PriorityCalcPeriod=
> #PriorityFavorSmall=
> #PriorityMaxAge=
> #PriorityUsageResetPeriod=
> #PriorityWeightAge=
> #PriorityWeightFairshare=
> #PriorityWeightJobSize=
> #PriorityWeightPartition=
> #PriorityWeightQOS=
> #
> #
> # LOGGING AND ACCOUNTING
> #AccountingStorageEnforce=0
> #AccountingStorageHost=
> #AccountingStoragePass=
> #AccountingStoragePort=
> AccountingStorageType=accounting_storage/none
> #AccountingStorageUser=
> #AccountingStoreFlags=
> #JobCompHost=
> #JobCompLoc=
> #JobCompPass=
> #JobCompPort=
> JobCompType=jobcomp/none
> #JobCompUser=
> #JobContainerType=job_container/none
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/none
> SlurmctldDebug=info
> SlurmctldLogFile=/var/log/slurmctld.log
> SlurmdDebug=info
> SlurmdLogFile=/var/log/slurmd.log
> #SlurmSchedLogFile=
> #SlurmSchedLogLevel=
> #DebugFlags=
> #
> #
> # POWER SAVE SUPPORT FOR IDLE NODES (optional)
> #SuspendProgram=
> #ResumeProgram=
> #SuspendTimeout=
> #ResumeTimeout=
> #ResumeRate=
> #SuspendExcNodes=
> #SuspendExcParts=
> #SuspendRate=
> #SuspendTime=
> #
> #
> # COMPUTE NODES
> NodeName=linux[1-32] CPUs=11 State=UNKNOWN
> PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>
> Best Regards,
> Nousheen Parvaiz