[slurm-dev] Re: Required node not available (down or drained)
And the log file is not informative:

    tail -f /var/log/slurm-llnl/slurmd.log
    ...
    [2013-08-26T11:52:16] Slurmd shutdown completing
    [2013-08-26T11:52:56] slurmd version 2.3.4 started
    [2013-08-26T11:52:56] slurmd started on Mon 26 Aug 2013 11:52:56 +0200
    [2013-08-26T11:52:56] Procs=1 Sockets=1 Cores=1 Threads=1 Memory=2012 TmpDisk=9069 Uptime=1122626

----- Original message -----
From: Sivasangari Nandy sivasangari.na...@irisa.fr
To: slurm-dev slurm-dev@schedmd.com
Sent: Monday 26 August 2013 14:28:28
Subject: Re: [slurm-dev] Re: Required node not available (down or drained)

Hi,

I have checked a few things. My slurmctld and slurmd now run on a single machine (using just one node), so the test is easier. For that I modified the conf file:

    vi /etc/slurm-llnl/slurm.conf

slurmctld and slurmd are both running; here is my ps output:

    root@VM-667:/etc/slurm-llnl# ps -ef | grep slurm
    root  31712 31706 0 11:44 pts/1 00:00:00 tail -f /var/log/slurm-llnl/slurmd.log
    slurm 31990     1 0 11:52 ?     00:00:00 /usr/sbin/slurmctld
    root  32103     1 0 11:52 ?     00:00:00 /usr/sbin/slurmd -c
    root  32125 30346 0 11:53 pts/0 00:00:00 grep slurm

So I tried srun again but still got this error:

    !srun
    srun /omaha-beach/test.sh
    srun: Required node not available (down or drained)
    srun: job 64 queued and waiting for resources

Do you have any idea what the problem is?

Thanks,
Siva
[slurm-dev] Re: Required node not available (down or drained)
https://computing.llnl.gov/linux/slurm/troubleshoot.html#nodes

--
Nikita Burtsev

On Monday, August 26, 2013 at 7:43 PM, Sivasangari Nandy wrote:

    And the log file is not informative
    tail -f /var/log/slurm-llnl/slurmd.log
    [...]
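The troubleshooting page linked above deals with nodes that stay in the DOWN state. As a rough sketch that is not taken from the thread, the helper below picks the down node lists out of `sinfo`-style output read on stdin. `down_nodes` is a hypothetical name, and the column layout is assumed to match the default `sinfo` output quoted in this thread (STATE in column 5, NODELIST in column 6).

```shell
# down_nodes: hypothetical helper. Reads default-format `sinfo` output on stdin
# and prints the NODELIST field of every row whose STATE starts with "down".
# Assumed column order: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST.
down_nodes() {
    awk 'NR > 1 && $5 ~ /^down/ { print $6 }'
}

# Possible use on the controller (commented out so the sketch has no side effects):
#   sinfo | down_nodes     # which nodes are down
#   sinfo -R               # the reason recorded for each down node
#   scontrol update NodeName=VM-[669-671] State=RESUME   # return them to service
```

A node that was marked DOWN does not necessarily come back on its own once slurmd is started again; whether a manual `State=RESUME` is needed depends on the `ReturnToService` setting in slurm.conf, so treat the last command as something to try after confirming slurmd is running on the node.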
[slurm-dev] Re: Required node not available (down or drained)
You need to have slurmd running on all nodes that will execute jobs, so you should start it with the init script.

--
Nikita Burtsev

On Thursday, August 22, 2013 at 11:55 AM, Sivasangari Nandy wrote:

    check if the slurmd daemon is running with the command ps -el | grep slurmd.

    Nothing comes back from ps -el:

    root@VM-667:~# ps -el | grep slurmd
[slurm-dev] Re: Required node not available (down or drained)
That is what I did yesterday, actually:

    /etc/init.d/slurm-llnl start
    [ ok ] Starting slurm central management daemon: slurmctld.
    /usr/sbin/slurmctld already running.

----- Original message -----
From: Nikita Burtsev nikita.burt...@gmail.com
To: slurm-dev slurm-dev@schedmd.com
Sent: Thursday 22 August 2013 09:59:52
Subject: [slurm-dev] Re: Required node not available (down or drained)

You need to have slurmd running on all nodes that will execute jobs, so you should start it with init script.
[slurm-dev] Re: Required node not available (down or drained)
VM-667, where you have slurmctld running, is your master; you don't need the agent part on it. As I understand your setup, VM-[669-671] are your actual nodes, so you need to check whether slurmd is running on those three and start it if needed.

--
Nikita Burtsev

On Thursday, August 22, 2013 at 12:02 PM, Sivasangari Nandy wrote:

    that's what i have done yesterday actually :
    /etc/init.d/slurm-llnl start
    [...]
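The check described above (is slurmd running on each compute node?) can be sketched as a small loop. This is only an illustration under stated assumptions: `check_slurmd` and the `RUNNER` variable are hypothetical names, and the default of `ssh` assumes non-interactive (key-based) login from the master to the nodes, which the thread does not confirm.

```shell
# check_slurmd: hypothetical helper. For each node name given as an argument,
# reports whether a process named exactly "slurmd" is running on that node.
# RUNNER is the command used to reach a node; it defaults to ssh, which assumes
# key-based login to the nodes is already set up.
check_slurmd() {
    for node in "$@"; do
        # stdin is redirected so ssh cannot swallow input meant for the caller
        if ${RUNNER:-ssh} "$node" pgrep -x slurmd </dev/null >/dev/null 2>&1; then
            echo "$node: slurmd running"
        else
            echo "$node: slurmd NOT running"
        fi
    done
}

# Example with the node names from the thread:
#   check_slurmd VM-669 VM-670 VM-671
# On a node where it is not running, the init script used in the thread starts it:
#   /etc/init.d/slurm-llnl start
```

Running the check from the master rather than logging in to each node by hand makes it harder to test the wrong machine, which is what happened earlier in the thread when `ps` was run on VM-667.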
[slurm-dev] Re: Required node not available (down or drained)
So I ran

    /etc/init.d/slurm-llnl start

on each node and tried again, but I get:

    JOBID PARTITION NAME    USER ST TIME NODES NODELIST(REASON)
       50 SLURM-deb test.sh root PD 0:00     1 (Resources)
       53 SLURM-deb test.sh root PD 0:00     1 (Resources)

and this is what I see when I check for slurmd:

    root@VM-671:~# ps -el | grep slurmd
    5 S 0 8223 1 0 80 0 - 22032 - ? 00:00:01 slurmd

----- Original message -----
From: Nikita Burtsev nikita.burt...@gmail.com
To: slurm-dev slurm-dev@schedmd.com
Sent: Thursday 22 August 2013 09:59:52
Subject: [slurm-dev] Re: Required node not available (down or drained)

You need to have slurmd running on all nodes that will execute jobs, so you should start it with init script.
[slurm-dev] Re: Required node not available (down or drained)
I have tried:

    /etc/init.d/slurm-llnl start
    [ ok ] Starting slurm central management daemon: slurmctld.
    /usr/sbin/slurmctld already running.

And:

    scontrol show slurmd
    scontrol: error: slurm_slurmd_info: Connection refused
    slurm_load_slurmd_status: Connection refused

How should I proceed to fix this problem?

----- Original message -----
From: Danny Auble d...@schedmd.com
To: slurm-dev slurm-dev@schedmd.com
Sent: Wednesday 21 August 2013 15:36:53
Subject: [slurm-dev] Re: Required node not available (down or drained)

Check your slurmd log. It doesn't appear the slurmd is running.

Sivasangari Nandy sivasangari.na...@irisa.fr wrote:

Hello,

I'm trying to use Slurm for the first time, and I think I have a problem with nodes. I get this message when I use squeue:

    root@VM-667:~# squeue
    JOBID PARTITION NAME    USER ST TIME NODES NODELIST(REASON)
       50 SLURM-deb test.sh root PD 0:00     1 (ReqNodeNotAvail)

or this one from another squeue:

    root@VM-671:~# squeue
    JOBID PARTITION NAME    USER ST TIME NODES NODELIST(REASON)
       50 SLURM-deb test.sh root PD 0:00     1 (Resources)

sinfo gives me:

    PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
    SLURM-de* up    infinite      3 down  VM-[669-671]

I have already used Slurm once with the same configuration and I was able to run my job. But now, the second time, I always get:

    srun: Required node not available (down or drained)
    srun: job 51 queued and waiting for resources

Thanks in advance for your help,
Siva

--
Sivasangari NANDY - Plate-forme GenOuest
IRISA-INRIA, Campus de Beaulieu
263 Avenue du Général Leclerc
35042 Rennes cedex, France
Tel: +33 (0) 2 99 84 25 69
Office: D152
[slurm-dev] Re: Required node not available (down or drained)
slurmctld is the management process, and since you have access to squeue/sinfo information it is running just fine. You need to check whether slurmd (which is the agent part) is running on your nodes, i.e. VM-[669-671].

--
Nikita Burtsev

On Wednesday, August 21, 2013 at 8:13 PM, Sivasangari Nandy wrote:

    I have tried :
    /etc/init.d/slurm-llnl start
    [...]
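The node list in this thread is written in Slurm's bracketed range form, VM-[669-671]. To check each node in turn, the range first has to be turned into individual hostnames. Below is a minimal sketch for the simple PREFIX[LO-HI] case only; `expand_nodes` is a hypothetical name, and on a machine with Slurm installed `scontrol show hostnames` should handle the full hostlist syntax.

```shell
# expand_nodes: hypothetical helper. Expands one simple bracketed range such as
# "VM-[669-671]" into one hostname per line. Only the PREFIX[LO-HI] form is
# handled; zero-padded ranges and comma-separated lists are not.
expand_nodes() {
    spec=$1
    case $spec in
        *\[*-*\])
            prefix=${spec%%\[*}      # text before "[", e.g. "VM-"
            range=${spec#*\[}        # e.g. "669-671]"
            range=${range%\]}        # e.g. "669-671"
            lo=${range%-*}
            hi=${range#*-}
            i=$lo
            while [ "$i" -le "$hi" ]; do
                echo "$prefix$i"
                i=$((i + 1))
            done
            ;;
        *)
            echo "$spec"             # plain hostname, nothing to expand
            ;;
    esac
}

# Example:
#   expand_nodes 'VM-[669-671]'
```

The single quotes around the argument matter: unquoted brackets are a glob pattern to the shell and may be expanded against file names in the current directory.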