Did you build Slurm yourself from source? If so, the munge development package needs to be installed on that node before you build (munge-devel on EL systems, libmunge-dev on Debian).
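A quick way to confirm the headers are in place before running `./configure` is to check for munge.h; the paths below are the usual defaults, not guaranteed locations:

```shell
# Sketch: check for the munge development header before (re)building Slurm.
# Without it, ./configure silently skips munge support, so the cred/munge
# plugin is never built. Header paths are common defaults (an assumption).
if [ -e /usr/include/munge.h ] || [ -e /usr/local/include/munge.h ]; then
    echo "munge.h found - configure should enable munge support"
else
    echo "munge.h missing - install munge-devel (EL) or libmunge-dev (Debian), then rebuild"
fi
```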
You then need to set up munge with a shared munge key between the nodes, and have the munge daemon running. This is all detailed on Ole's wiki, which was linked previously: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation

Sean

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Nousheen <nousheenparv...@gmail.com>
Sent: Tuesday, 1 February 2022 15:56
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [EXT] Re: [slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory

External email: Please exercise caution
________________________________

Dear Ole and Hermann,

I have now reinstalled Slurm from scratch following this link:
The error remains the same. Kindly guide me: where will I find this cred/munge plugin? Please help me resolve this issue.

[root@exxact slurm]# slurmd -C
NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
UpTime=0-22:06:45
[root@exxact slurm]# systemctl enable slurmctld.service
[root@exxact slurm]# systemctl start slurmctld.service
[root@exxact slurm]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2022-02-01 09:46:20 PKT; 8s ago
  Process: 27530 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 27530 (code=exited, status=1/FAILURE)

Feb 01 09:46:20 exxact systemd[1]: Started Slurm controller daemon.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service: main process exited, ...RE
Feb 01 09:46:20 exxact systemd[1]: Unit slurmctld.service entered failed state.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service failed.
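For reference, the shared-key setup Sean describes at the top of his reply can be sketched as follows. A scratch directory stands in for /etc/munge so the commands can be dry-run without root; in production the key is /etc/munge/munge.key, mode 0400, owned by the munge user (per Ole's wiki):

```shell
# Sketch of creating a shared munge key. /etc/munge/munge.key is the real
# destination; a temp dir is used here so this can run without root.
keydir=$(mktemp -d)
dd if=/dev/urandom of="$keydir/munge.key" bs=1024 count=1 2>/dev/null
chmod 400 "$keydir/munge.key"
wc -c "$keydir/munge.key"   # expect 1024 bytes
# Distribute the SAME key to every node, then on each node:
#   systemctl enable --now munge
```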
[root@exxact slurm]# /usr/local/sbin/slurmctld -D
slurmctld: slurmctld version 21.08.5 started on cluster cluster194
slurmctld: error: Couldn't find the specified plugin name for cred/munge looking at all files
slurmctld: error: cannot find cred plugin for cred/munge
slurmctld: error: cannot create cred context for cred/munge
slurmctld: fatal: slurm_cred_creator_ctx_create((null)): Operation not permitted

Best Regards,
Nousheen Parvaiz

On Tue, Feb 1, 2022 at 9:06 AM Nousheen <nousheenparv...@gmail.com> wrote:

Dear Ole,

Thank you for your response. I am doing it again using your suggested link.

Best Regards,
Nousheen Parvaiz

On Mon, Jan 31, 2022 at 2:07 PM Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:

Hi Nousheen,

I recommend again that you follow the steps for installing Slurm on a CentOS 7 cluster:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation

You may need to start the installation from scratch, but the steps are guaranteed to work if followed correctly.
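As an aside, the "cannot find cred plugin for cred/munge" failure earlier in the thread usually means Slurm was compiled on a machine without the munge headers, so the plugin shared object was never built or installed. A quick check; the plugin path below assumes a from-source build with the default /usr/local prefix:

```shell
# Sketch: look for the munge credential plugin of a from-source install.
# /usr/local/lib/slurm is the default plugin directory for a /usr/local
# prefix build (an assumption; check PluginDir in slurm.conf if it is set).
plugin=/usr/local/lib/slurm/cred_munge.so
if [ -e "$plugin" ]; then
    echo "cred plugin present: $plugin"
else
    echo "cred plugin missing - install munge-devel, re-run ./configure, make, make install"
fi
```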
IHTH,
Ole

On 1/31/22 06:23, Nousheen wrote:
> The same error shows up on the compute node, as follows:
>
> [root@c103008 ~]# systemctl enable slurmd.service
> [root@c103008 ~]# systemctl start slurmd.service
> [root@c103008 ~]# systemctl status slurmd.service
> ● slurmd.service - Slurm node daemon
>    Loaded: loaded (/etc/systemd/system/slurmd.service; enabled; vendor preset: disabled)
>    Active: failed (Result: exit-code) since Mon 2022-01-31 00:22:42 EST; 2s ago
>   Process: 11505 ExecStart=/usr/local/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=203/EXEC)
>  Main PID: 11505 (code=exited, status=203/EXEC)
>
> Jan 31 00:22:42 c103008 systemd[1]: Started Slurm node daemon.
> Jan 31 00:22:42 c103008 systemd[1]: slurmd.service: main process exited, code=exited, status=203/EXEC
> Jan 31 00:22:42 c103008 systemd[1]: Unit slurmd.service entered failed state.
> Jan 31 00:22:42 c103008 systemd[1]: slurmd.service failed.
>
> Best Regards,
> Nousheen Parvaiz
>
> On Mon, Jan 31, 2022 at 10:08 AM Nousheen <nousheenparv...@gmail.com> wrote:
>
> Dear Jeffrey,
>
> Thank you for your response. I have followed the steps as instructed.
> After copying the files to their respective locations, the command "systemctl status slurmctld.service" gives me an error as follows:
>
> (base) [nousheen@exxact system]$ systemctl daemon-reload
> (base) [nousheen@exxact system]$ systemctl enable slurmctld.service
> (base) [nousheen@exxact system]$ systemctl start slurmctld.service
> (base) [nousheen@exxact system]$ systemctl status slurmctld.service
> ● slurmctld.service - Slurm controller daemon
>    Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
>    Active: failed (Result: exit-code) since Mon 2022-01-31 10:04:31 PKT; 3s ago
>   Process: 18114 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
>  Main PID: 18114 (code=exited, status=1/FAILURE)
>
> Jan 31 10:04:31 exxact systemd[1]: Started Slurm controller daemon.
> Jan 31 10:04:31 exxact systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
> Jan 31 10:04:31 exxact systemd[1]: Unit slurmctld.service entered failed state.
> Jan 31 10:04:31 exxact systemd[1]: slurmctld.service failed.
>
> Kindly guide me. Thank you so much for your time.
>
> Best Regards,
> Nousheen Parvaiz
>
> On Thu, Jan 27, 2022 at 8:25 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
>
> The missing file error has nothing to do with Slurm. The systemctl command is part of systemd's service management.
>
> The error message indicates that you haven't copied the slurmd.service file on your compute node to /etc/systemd/system or /usr/lib/systemd/system.
> /etc/systemd/system is usually used when a user adds a new service to a machine.
>
> Depending on your version of Linux, you may also need to run "systemctl daemon-reload" to activate slurmd.service within systemd.
>
> Once slurmd.service is copied over, the systemctl command should work just fine.
>
> Remember:
> slurmd.service    - only on compute nodes
> slurmctld.service - only on your cluster management node
> slurmdbd.service  - only on your cluster management node
>
> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Nousheen
> Sent: Thursday, January 27, 2022 3:54 AM
> To: Slurm User Community List <slurm-users@lists.schedmd.com>
> Subject: [slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory
>
> ◆ This message was sent from a non-UWYO address.
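The service-file steps Jeffrey describes can be sketched as follows. A scratch directory stands in for /etc/systemd/system so the sequence can be dry-run without root; the real commands are shown as comments (the assumption being that slurmd.service sits in the Slurm source tree's etc/ directory after ./configure):

```shell
# Sketch of installing slurmd.service on a compute node; a temp dir stands
# in for /etc/systemd/system so this can run without root or systemd.
unitdir=$(mktemp -d)
printf '[Unit]\nDescription=Slurm node daemon\n' > "$unitdir/slurmd.service"
cp "$unitdir/slurmd.service" "$unitdir/installed-slurmd.service"
grep -q 'Slurm node daemon' "$unitdir/installed-slurmd.service" && echo "unit file in place"
# The real sequence, as root on the compute node:
#   cp etc/slurmd.service /etc/systemd/system/
#   systemctl daemon-reload
#   systemctl enable --now slurmd.service
```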
> Please exercise caution when clicking links or opening attachments from external sources.
>
> Hello everyone,
>
> I am installing Slurm on CentOS 7 following this tutorial:
> https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/
>
> I am at the step where we start Slurm, but it gives me the following error:
>
> [root@exxact slurm-21.08.5]# systemctl enable slurmd.service
> Failed to execute operation: No such file or directory
>
> I have run the command to check if Slurm is configured properly:
>
> [root@exxact slurm-21.08.5]# slurmd -C
> NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889 UpTime=19-16:06:00
>
> I am new to this and unable to understand the problem. Kindly help me resolve this.
>
> My slurm.conf file is as follows:
>
> # slurm.conf file generated by configurator.html.
> # Put this file on all nodes of your cluster.
> # See the slurm.conf man page for more information.
> #
> ClusterName=cluster194
> SlurmctldHost=192.168.60.194
> #SlurmctldHost=
> #
> #DisableRootJobs=NO
> #EnforcePartLimits=NO
> #Epilog=
> #EpilogSlurmctld=
> #FirstJobId=1
> #MaxJobId=67043328
> #GresTypes=
> #GroupUpdateForce=0
> #GroupUpdateTime=600
> #JobFileAppend=0
> #JobRequeue=1
> #JobSubmitPlugins=lua
> #KillOnBadExit=0
> #LaunchType=launch/slurm
> #Licenses=foo*4,bar
> #MailProg=/bin/mail
> #MaxJobCount=10000
> #MaxStepCount=40000
> #MaxTasksPerNode=512
> MpiDefault=none
> #MpiParams=ports=#-#
> #PluginDir=
> #PlugStackConfig=
> #PrivateData=jobs
> ProctrackType=proctrack/cgroup
> #Prolog=
> #PrologFlags=
> #PrologSlurmctld=
> #PropagatePrioProcess=0
> #PropagateResourceLimits=
> #PropagateResourceLimitsExcept=
> #RebootProgram=
> ReturnToService=1
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/spool/slurmd
> SlurmUser=nousheen
> #SlurmdUser=root
> #SrunEpilog=
> #SrunProlog=
> StateSaveLocation=/home/nousheen/Documents/SILICS/slurm-21.08.5/slurmctld
> SwitchType=switch/none
> #TaskEpilog=
> TaskPlugin=task/affinity
> #TaskProlog=
> #TopologyPlugin=topology/tree
> #TmpFS=/tmp
> #TrackWCKey=no
> #TreeWidth=
> #UnkillableStepProgram=
> #UsePAM=0
> #
> #
> # TIMERS
> #BatchStartTimeout=10
> #CompleteWait=0
> #EpilogMsgTime=2000
> #GetEnvTimeout=2
> #HealthCheckInterval=0
> #HealthCheckProgram=
> InactiveLimit=0
> KillWait=30
> #MessageTimeout=10
> #ResvOverRun=0
> MinJobAge=300
> #OverTimeLimit=0
> SlurmctldTimeout=120
> SlurmdTimeout=300
> #UnkillableStepTimeout=60
> #VSizeFactor=0
> Waittime=0
> #
> #
> # SCHEDULING
> #DefMemPerCPU=0
> #MaxMemPerCPU=0
> #SchedulerTimeSlice=30
> SchedulerType=sched/backfill
> SelectType=select/cons_tres
> SelectTypeParameters=CR_Core
> #
> #
> # JOB PRIORITY
> #PriorityFlags=
> #PriorityType=priority/basic
> #PriorityDecayHalfLife=
> #PriorityCalcPeriod=
> #PriorityFavorSmall=
> #PriorityMaxAge=
> #PriorityUsageResetPeriod=
> #PriorityWeightAge=
> #PriorityWeightFairshare=
> #PriorityWeightJobSize=
> #PriorityWeightPartition=
> #PriorityWeightQOS=
> #
> #
> # LOGGING AND ACCOUNTING
> #AccountingStorageEnforce=0
> #AccountingStorageHost=
> #AccountingStoragePass=
> #AccountingStoragePort=
> AccountingStorageType=accounting_storage/none
> #AccountingStorageUser=
> #AccountingStoreFlags=
> #JobCompHost=
> #JobCompLoc=
> #JobCompPass=
> #JobCompPort=
> JobCompType=jobcomp/none
> #JobCompUser=
> #JobContainerType=job_container/none
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/none
> SlurmctldDebug=info
> SlurmctldLogFile=/var/log/slurmctld.log
> SlurmdDebug=info
> SlurmdLogFile=/var/log/slurmd.log
> #SlurmSchedLogFile=
> #SlurmSchedLogLevel=
> #DebugFlags=
> #
> #
> # POWER SAVE SUPPORT FOR IDLE NODES (optional)
> #SuspendProgram=
> #ResumeProgram=
> #SuspendTimeout=
> #ResumeTimeout=
> #ResumeRate=
> #SuspendExcNodes=
> #SuspendExcParts=
> #SuspendRate=
> #SuspendTime=
> #
> #
> # COMPUTE NODES
> NodeName=linux[1-32] CPUs=11 State=UNKNOWN
> PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>
> Best Regards,
> Nousheen Parvaiz