[slurm-dev] Required node not available (down or drained)

2013-08-21 Thread Sivasangari Nandy
  Hello,

  I'm trying to use Slurm for the first time, and I think I have a problem
  with the nodes.

  I get this message when I use squeue:

  root@VM-667:~# squeue
  JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
  50 SLURM-deb test.sh root PD 0:00 1 (ReqNodeNotAvail)

  or this one with another squeue:

  root@VM-671:~# squeue
  JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
  50 SLURM-deb test.sh root PD 0:00 1 (Resources)

  sinfo gives me:

  PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
  SLURM-de* up infinite 3 down VM-[669-671]

  I have already used Slurm once with the same configuration and I was
  able to run my job.

  But now, the second time, I always get:

  srun: Required node not available (down or drained)

  srun: job 51 queued and waiting for resources

  Thanks in advance for your help,

  Siva
 

[slurm-dev] Re: Required node not available (down or drained)

2013-08-21 Thread Sivasangari Nandy
I have tried:

/etc/init.d/slurm-llnl start

[ ok ] Starting slurm central management daemon: slurmctld.
/usr/sbin/slurmctld already running.

And:

scontrol show slurmd

scontrol: error: slurm_slurmd_info: Connection refused
slurm_load_slurmd_status: Connection refused

Hmm, how should I proceed to fix that problem?

- Mail original -

 De: Danny Auble d...@schedmd.com
 À: slurm-dev slurm-dev@schedmd.com
 Envoyé: Mercredi 21 Août 2013 15:36:53
 Objet: [slurm-dev] Re: Required node not available (down or drained)

 Check your slurmd log. It doesn't appear the slurmd is running.
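
 Something like this should show what is going on (the paths are guesses for
 a Debian slurm-llnl install; use whatever SlurmdLogFile in your slurm.conf
 actually points to):

 grep -i SlurmdLogFile /etc/slurm-llnl/slurm.conf
 tail -n 50 /var/log/slurm-llnl/slurmd.log
 pgrep -l slurmd || /etc/init.d/slurm-llnl start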

 Sivasangari Nandy  sivasangari.na...@irisa.fr  wrote:
Hello,

I'm trying to use Slurm for the first time, and I think I have a problem
with the nodes.

I get this message when I use squeue:

root@VM-667:~# squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
50 SLURM-deb test.sh root PD 0:00 1 (ReqNodeNotAvail)

or this one with another squeue:

root@VM-671:~# squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
50 SLURM-deb test.sh root PD 0:00 1 (Resources)

sinfo gives me:

PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
SLURM-de* up infinite 3 down VM-[669-671]

I have already used Slurm once with the same configuration and I was
able to run my job.

But now, the second time, I always get:

srun: Required node not available (down or drained)
srun: job 51 queued and waiting for resources

Thanks in advance for your help,

Siva

-- 

Sivasangari NANDY - Plate-forme GenOuest 
IRISA-INRIA, Campus de Beaulieu 
263 Avenue du Général Leclerc 

35042 Rennes cedex, France 
Tél: +33 (0) 2 99 84 25 69 

Bureau : D152 


[slurm-dev] Re: Required node not available (down or drained)

2013-08-21 Thread Nikita Burtsev
slurmctld is the management process, and since you have access to squeue/sinfo 
information it is running just fine. You need to check whether slurmd (which is 
the agent part) is running on your nodes, i.e. VM-[669-671].
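
Something along these lines should show it and bring the nodes back (this is
only a sketch; it assumes the same Debian slurm-llnl packaging as on the
controller and that root can ssh to the VMs):

for n in VM-669 VM-670 VM-671; do
    ssh "$n" 'pgrep -l slurmd || /etc/init.d/slurm-llnl start'
done
# once slurmd answers on every node, clear the down state from the controller
scontrol update NodeName=VM-[669-671] State=RESUME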

--  
Nikita Burtsev


On Wednesday, August 21, 2013 at 8:13 PM, Sivasangari Nandy wrote:

 I have tried:  
  
 /etc/init.d/slurm-llnl start
  
 [ ok ] Starting slurm central management daemon: slurmctld.
 /usr/sbin/slurmctld already running.
  
 And:  
  
 scontrol show slurmd
  
 scontrol: error: slurm_slurmd_info: Connection refused
 slurm_load_slurmd_status: Connection refused
  
  
 Hmm, how should I proceed to fix that problem?
  
  
  De: Danny Auble d...@schedmd.com (mailto:d...@schedmd.com)
  À: slurm-dev slurm-dev@schedmd.com (mailto:slurm-dev@schedmd.com)
  Envoyé: Mercredi 21 Août 2013 15:36:53
  Objet: [slurm-dev] Re: Required node not available (down or drained)
   
  Check your slurmd log. It doesn't appear the slurmd is running.
   
  Sivasangari Nandy sivasangari.na...@irisa.fr 
  (mailto:sivasangari.na...@irisa.fr) wrote:
 Hello,

 I'm trying to use Slurm for the first time, and I think I have a problem
 with the nodes.

 I get this message when I use squeue:

 root@VM-667:~# squeue
 JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
 50 SLURM-deb test.sh root PD 0:00 1 (ReqNodeNotAvail)

 or this one with another squeue:

 root@VM-671:~# squeue
 JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
 50 SLURM-deb test.sh root PD 0:00 1 (Resources)

 sinfo gives me:

 PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
 SLURM-de* up infinite 3 down VM-[669-671]

 I have already used Slurm once with the same configuration and I was
 able to run my job.
 But now, the second time, I always get:

 srun: Required node not available (down or drained)
 srun: job 51 queued and waiting for resources

 Thanks in advance for your help,
 Siva

 --  
 Sivasangari NANDY -  Plate-forme GenOuest
 IRISA-INRIA, Campus de Beaulieu
 263 Avenue du Général Leclerc
 35042 Rennes cedex, France
 Tél: +33 (0) 2 99 84 25 69
 Bureau :  D152
  



[slurm-dev] Fwd: Slurm Question

2013-08-21 Thread Matthew Russell
Hi,

I upgraded slurm on my Bright 5.2 Cluster from 2.2.7 to 2.4.2.  Recently
I've been having issues running slurm jobs though.  I've read some
postings (like this one:
https://groups.google.com/forum/#!searchin/slurm-devel/execve$20permission$20denied/slurm-devel/Bl0F9TBDPbw/-YzSm_nfo5MJ)
but I still couldn't get it working.

My slurm job is:
gm1@dena:~$ cat slurm_batch_test.sh
#!/home/gm1/
#SBATCH -D /home/gm1
#SBATCH --export=NONE
#SBATCH -o /home/gm1/test_001.10470.out.o
#SBATCH -e /home/gm1/test_001.10470.out.e
#SBATCH -J test_001.10470
#SBATCH --time=3600
#SBATCH --partition=matt
#SBATCH
#SBATCH -c 4
#SBATCH -t 4
#SBATCH
#SBATCH

module load slurm
mpirun hello.sh


I run it with the following command, and get these results:
gm1@dena:~$ rm test_001.10470.out.*; sbatch slurm_batch_test.sh; sleep 1;
cat test*
Submitted batch job 1577
slurmd[dena1]: execve():
/cm/local/apps/slurm/2.4.2/spool/job01577/slurm_script: Permission denied

In my log file, I get: execve():
/cm/local/apps/slurm/2.4.2/spool/job01576/slurm_script: Permission denied

My script is executable:
gm1@dena:~$ ls hello.sh
-rwxr-xr-x 1 gm1 gm 37 Aug 21 11:55 hello.sh

slurmd is being run by root,
root     18462  0.0  0.0 159384  1868 ?   S    14:24   0:00
/cm/shared/apps/slurm/current/sbin/slurmd

I think it keeps its pid files in /var/run/slurm:
[root@dena1 2.4.2]# cat etc/slurm.conf |grep run
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmdPidFile=/var/run/slurm/slurmd.pid

Which is owned by slurm,
[root@dena1 2.4.2]# ls /var/run/ | grep slurm
drwxr-xr-x 2 slurm   slurm   4.0K Aug 21 14:24 slurm

And the .pid file is owned by root,
[root@dena1 2.4.2]# ls /var/run/slurm/
total 4.0K
-rw-r--r-- 1 root root 6 Aug 21 14:24 slurmd.pid

I'm not sure how to continue.

Can anyone help?  Thanks.
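
One possible next step: slurmd copies the batch script into SlurmdSpoolDir
(here /cm/local/apps/slurm/2.4.2/spool) and execs it from there, so an
execve() "Permission denied" often points at that directory sitting on a
noexec mount or having the wrong permissions. A rough check from the node,
reusing the paths in the error message above:

scontrol show config | grep -i SlurmdSpoolDir
ls -ld /cm/local/apps/slurm/2.4.2/spool
mount | grep 'cm/local'   # a noexec flag here would explain the execve() failure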


[slurm-dev] How does sacct honor the -S and -E option?

2013-08-21 Thread Yong Qin
Hi,

This has been puzzling me for a while, so I'm hoping somebody can clarify
it for me. In short, when I use sacct -S $T1 -E $T2 I often get lots of
jobs that are completely outside the range ($T1, $T2). For example,

$ sacct -a -S 2013-05-11T00:00:00 -E 2013-05-12T00:00:00 -o jobid,start,end

I got this job in the output:

4173 2013-05-12T23:03:59 2013-05-13T11:53:42

This doesn't make sense to me. If I use the -T option it is even worse,
because it modifies the end time so that it is earlier than the start time.
For example,

4173 2013-05-12T23:03:59 2013-05-12T00:00:00

Can anybody shed some light here? We are running 2.5.7.
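
For reference, this is the narrower query I would expect to exclude that job,
on the assumption that -S/-E without a state filter selects anything that was
merely eligible in the window, while adding -s restricts it to jobs actually
in that state during the window:

$ sacct -a -s R -S 2013-05-11T00:00:00 -E 2013-05-12T00:00:00 -o jobid,start,end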

Thanks,

Yong Qin


[slurm-dev] Slurm 2.6.0 MIC GRES support does not set OFFLOAD_DEVICES=-1 if no GRES requested

2013-08-21 Thread Christopher Samuel


Hi folks,

I'm trying to set up SLURM 2.6.0 GRES support for MIC cards and I've
found that whilst the documentation says:

http://slurm.schedmd.com/gres.html

# If no MICs are reserved via GRES, the OFFLOAD_DEVICES
# variable is set to -1.

that doesn't appear to happen (OFFLOAD_DEVICES is not set) and I don't
see any evidence of code to do that in the current slurm-2.6 branch.

Is it an oversight, or am I missing something?

Currently I'm using a taskprolog to set it to -1 if it's absent.
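
Roughly like this (the script path is whatever TaskProlog= in slurm.conf
points to; the sketch relies on slurmstepd applying "export NAME=value"
lines printed by the task prolog to the task environment):

#!/bin/bash
# if the gres plugin did not set OFFLOAD_DEVICES (no mic requested),
# export the documented default of -1 to the task
if [ -z "$OFFLOAD_DEVICES" ]; then
  echo "export OFFLOAD_DEVICES=-1"
fi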

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci
