[slurm-dev] Required node not available (down or drained)
Hello,

I'm trying to use Slurm for the first time, and I think I have a problem with the nodes. This is the message I get when I use squeue:

  root@VM-667:~# squeue
  JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
     50 SLURM-deb  test.sh  root PD  0:00     1 (ReqNodeNotAvail)

or this one with another squeue:

  root@VM-671:~# squeue
  JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
     50 SLURM-deb  test.sh  root PD  0:00     1 (Resources)

sinfo gives me:

  PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
  SLURM-de*    up  infinite     3  down VM-[669-671]

I have already used Slurm once with the same configuration and I was able to run my job. But now, the second time, I always get:

  srun: Required node not available (down or drained)
  srun: job 51 queued and waiting for resources

Thanks in advance for your help,
Siva
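For anyone hitting the same symptom: sinfo already shows the cause (all three nodes are "down"), and once slurmd is confirmed running on them (see the replies below) they can be inspected and returned to service with scontrol. A minimal sketch, using the node names from this thread:

  # show why a node is marked down (see the Reason= field)
  scontrol show node VM-669

  # once slurmd is running on the nodes again, return them to service
  scontrol update NodeName=VM-[669-671] State=RESUME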
[slurm-dev] Re: Required node not available (down or drained)
I have tried:

  /etc/init.d/slurm-llnl start
  [ ok ] Starting slurm central management daemon: slurmctld.
  /usr/sbin/slurmctld already running.

And:

  scontrol show slurmd
  scontrol: error: slurm_slurmd_info: Connection refused
  slurm_load_slurmd_status: Connection refused

Hmm, how do I proceed to fix that problem?

----- Original message -----
From: Danny Auble d...@schedmd.com
To: slurm-dev slurm-dev@schedmd.com
Sent: Wednesday, 21 August 2013 15:36:53
Subject: [slurm-dev] Re: Required node not available (down or drained)

Check your slurmd log. It doesn't appear the slurmd is running.

Sivasangari Nandy sivasangari.na...@irisa.fr wrote:
[original message quoted in full; see above]

--
Sivasangari NANDY - Plate-forme GenOuest
IRISA-INRIA, Campus de Beaulieu
263 Avenue du Général Leclerc
35042 Rennes cedex, France
Tél: +33 (0) 2 99 84 25 69
Bureau: D152
[slurm-dev] Re: Required node not available (down or drained)
slurmctld is the management process, and since you have access to squeue/sinfo information it is running just fine. You need to check whether slurmd (which is the agent part) is running on your nodes, i.e. VM-[669-671].

--
Nikita Burtsev

On Wednesday, August 21, 2013 at 8:13 PM, Sivasangari Nandy wrote:
[quoted message trimmed; see above]
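To act on that suggestion: slurmd has to be checked on each compute node, not on the controller. A minimal sketch, assuming the same Debian slurm-llnl packaging shown earlier in the thread (the log path follows SlurmdLogFile in slurm.conf; /var/log/slurm-llnl/slurmd.log is only the Debian default):

  # on each compute node (VM-669, VM-670, VM-671):
  /etc/init.d/slurm-llnl start            # start slurmd on the node
  ps aux | grep slurmd                    # confirm the daemon is actually up
  tail /var/log/slurm-llnl/slurmd.log     # look for errors if it is not
  slurmd -D -vvv                          # or run it in the foreground with debug output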
[slurm-dev] Fwd: Slurm Question
Hi,

I upgraded Slurm on my Bright 5.2 cluster from 2.2.7 to 2.4.2. Recently I've been having issues running Slurm jobs, though. I've read some postings (like this one: https://groups.google.com/forum/#!searchin/slurm-devel/execve$20permission$20denied/slurm-devel/Bl0F9TBDPbw/-YzSm_nfo5MJ) but I still couldn't get it working.

My Slurm job is:

  gm1@dena:~$ cat slurm_batch_test.sh
  #!/home/gm1/
  #SBATCH -D /home/gm1
  #SBATCH --export=NONE
  #SBATCH -o /home/gm1/test_001.10470.out.o
  #SBATCH -e /home/gm1/test_001.10470.out.e
  #SBATCH -J test_001.10470
  #SBATCH --time=3600
  #SBATCH --partition=matt
  #SBATCH
  #SBATCH -c 4
  #SBATCH -t 4
  #SBATCH
  #SBATCH

  module load slurm
  mpirun hello.sh

I run it with the following, and get:

  gm1@dena:~$ rm test_001.10470.out.*; sbatch slurm_batch_test.sh; sleep 1; cat test*
  Submitted batch job 1577
  slurmd[dena1]: execve(): /cm/local/apps/slurm/2.4.2/spool/job01577/slurm_script: Permission denied

In my log file, I get:

  execve(): /cm/local/apps/slurm/2.4.2/spool/job01576/slurm_script: Permission denied

My script is executable:

  gm1@dena:~$ ls -l hello.sh
  -rwxr-xr-x 1 gm1 gm 37 Aug 21 11:55 hello.sh

slurmd is being run by root:

  root 18462 0.0 0.0 159384 1868 ? S 14:24 0:00 /cm/shared/apps/slurm/current/sbin/slurmd

I think it's running in /var/run/slurm:

  [root@dena1 2.4.2]# cat etc/slurm.conf | grep run
  SlurmctldPidFile=/var/run/slurm/slurmctld.pid
  SlurmdPidFile=/var/run/slurm/slurmd.pid

which is owned by slurm:

  [root@dena1 2.4.2]# ls -lh /var/run/ | grep slurm
  drwxr-xr-x 2 slurm slurm 4.0K Aug 21 14:24 slurm

And the .pid file is owned by root:

  [root@dena1 2.4.2]# ls -lh /var/run/slurm/
  total 4.0K
  -rw-r--r-- 1 root root 6 Aug 21 14:24 slurmd.pid

I'm not sure how to continue. Can anyone help? Thanks.
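One thing that stands out in the script above: the interpreter line, #!/home/gm1/, points at a directory rather than an executable, and execve() returns EACCES ("Permission denied") in exactly that case, which would match the error even though hello.sh itself is executable. A hedged sketch of the same script with a conventional shebang; this is my reading of the symptom, not a fix confirmed in the thread:

  #!/bin/bash
  #SBATCH -D /home/gm1
  #SBATCH --export=NONE
  #SBATCH -o /home/gm1/test_001.10470.out.o
  #SBATCH -e /home/gm1/test_001.10470.out.e
  #SBATCH -J test_001.10470
  #SBATCH --partition=matt
  #SBATCH -c 4
  # note: -t is shorthand for --time, so the original's "--time=3600"
  # and "-t 4" were two conflicting copies of the same option;
  # only one is kept here

  module load slurm
  mpirun hello.sh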
[slurm-dev] How does sacct honor the -S and -E options?
Hi,

This has been puzzling me for a while, so I'm hoping somebody can clarify it for me. In short, when I use "sacct -S $T1 -E $T2" I often get lots of jobs that are completely outside the range ($T1, $T2). For example, from

  $ sacct -a -S 2013-05-11T00:00:00 -E 2013-05-12T00:00:00 -o jobid,start,end

I got this job in the output:

  4173  2013-05-12T23:03:59  2013-05-13T11:53:42

This doesn't make sense to me. If I use the -T option it is even worse, because it modifies the end time to be earlier than the start time. For example:

  4173  2013-05-12T23:03:59  2013-05-12T00:00:00

Can anybody shed some light here? We are running 2.5.7.

Thanks,
Yong Qin
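If I read the sacct man page correctly, -S/-E select jobs that were in *any* state, including pending, at some point inside the window, so a job that was merely eligible during the window but only started afterwards is still reported; -T/--truncate then clips the reported times to the window boundaries, which is how an "end" earlier than the "start" can appear. A hedged sketch of narrowing the selection to jobs that actually ran inside the window, using the documented -s/--state filter:

  # select only jobs that were RUNNING at some point in the window,
  # and clip the reported times to the window boundaries
  sacct -a -s RUNNING -T \
        -S 2013-05-11T00:00:00 -E 2013-05-12T00:00:00 \
        -o jobid,start,end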
[slurm-dev] Slurm 2.6.0 MIC GRES support does not set OFFLOAD_DEVICES=-1 if no GRES requested
Hi folks,

I'm trying to set up SLURM 2.6.0 GRES support for MIC cards, and I've found that whilst the documentation (http://slurm.schedmd.com/gres.html) says:

  # If no MICs are reserved via GRES, the OFFLOAD_DEVICES
  # variable is set to -1.

that doesn't appear to happen (OFFLOAD_DEVICES is not set), and I don't see any evidence of code to do that in the current slurm-2.6 branch. Is it an oversight, or am I missing something?

Currently I'm using a task prolog to set it to -1 if it's absent.

All the best,
Chris

--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/  http://twitter.com/vlsci
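For readers wondering what that workaround looks like: slurmd treats lines of the form "export NAME=value" printed to stdout by the TaskProlog script as environment variables to inject into the task. A minimal sketch of such a prolog; the script itself is a reconstruction, not Chris's actual file, and it assumes TaskProlog= in slurm.conf points at it:

  #!/bin/sh
  # TaskProlog: anything printed as "export NAME=value" is added to
  # the task's environment by slurmd.
  # If no MIC was allocated via GRES, OFFLOAD_DEVICES is unset, so
  # emulate the documented behaviour by exporting -1.
  if [ -z "$OFFLOAD_DEVICES" ]; then
      echo "export OFFLOAD_DEVICES=-1"
  fi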