[slurm-dev] Re: Required node not available (down or drained)

2013-08-26 Thread Sivasangari Nandy
And the log file is not informative 

tail -f /var/log/slurm-llnl/slurmd.log 

... 
[2013-08-26T11:52:16] Slurmd shutdown completing 
[2013-08-26T11:52:56] slurmd version 2.3.4 started 
[2013-08-26T11:52:56] slurmd started on Mon 26 Aug 2013 11:52:56 +0200 
[2013-08-26T11:52:56] Procs=1 Sockets=1 Cores=1 Threads=1 Memory=2012 TmpDisk=9069 Uptime=1122626

- Original message -

 From: Sivasangari Nandy sivasangari.na...@irisa.fr
 To: slurm-dev slurm-dev@schedmd.com
 Sent: Monday 26 August 2013 14:28:28
 Subject: Re: [slurm-dev] Re: Required node not available (down or drained)

 Hi,

 I have checked a few things. My slurmctld and slurmd now run on a
 single machine (just one node), so testing is easier.
 To do that I edited the configuration file: vi
 /etc/slurm-llnl/slurm.conf

 slurmctld and slurmd are both running; here is my ps output:

 root@VM-667:/etc/slurm-llnl# ps -ef | grep slurm
 root 31712 31706 0 11:44 pts/1 00:00:00 tail -f /var/log/slurm-llnl/slurmd.log
 slurm 31990 1 0 11:52 ? 00:00:00 /usr/sbin/slurmctld
 root 32103 1 0 11:52 ? 00:00:00 /usr/sbin/slurmd -c
 root 32125 30346 0 11:53 pts/0 00:00:00 grep slurm

 So I tried srun again but still got this error:

 !srun
 srun /omaha-beach/test.sh
 srun: Required node not available (down or drained)
 srun: job 64 queued and waiting for resources

 Do you have any idea what the problem is?
 Thanks,

 Siva

[slurm-dev] Re: Required node not available (down or drained)

2013-08-26 Thread Nikita Burtsev
https://computing.llnl.gov/linux/slurm/troubleshoot.html#nodes  
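
The link points at the node section of the troubleshooting guide. One common situation, not confirmed anywhere in this thread, is that slurmd is running again but the controller still has the node marked DOWN until an administrator resumes it. A minimal sketch under that assumption: `node_state` and `resume_if_down` are hypothetical helper names, while `scontrol show node` and `scontrol update NodeName=... State=RESUME` are standard scontrol commands.

```shell
# Hypothetical helper: read `scontrol show node <name>` output on stdin
# and print the value of the first State= field.
node_state() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^State=/) {
                sub(/^State=/, "", $i)
                print $i
                exit
            }
        }
    }'
}

# Hypothetical helper: resume a node only if the controller reports it DOWN.
resume_if_down() {
    node=$1
    state=$(scontrol show node "$node" | node_state)
    case $state in
        DOWN*) scontrol update NodeName="$node" State=RESUME ;;
        *)     echo "$node is in state '$state'; not changing it" ;;
    esac
}

# Example (as root or the Slurm administrator, on the controller):
#   resume_if_down VM-667
```

If the node drops back to DOWN after being resumed, the Reason field in the `scontrol show node` output is the next thing to read.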

--  
Nikita Burtsev


[slurm-dev] Re: Required node not available (down or drained)

2013-08-22 Thread Nikita Burtsev
You need to have slurmd running on all nodes that will execute jobs, so you 
should start it with init script.   

--  
Nikita Burtsev


On Thursday, August 22, 2013 at 11:55 AM, Sivasangari Nandy wrote:

 check if the slurmd daemon is running with the command ps -el | grep slurmd.

 ps -el returns nothing:
  
 root@VM-667:~# ps -el | grep slurmd
  

[slurm-dev] Re: Required node not available (down or drained)

2013-08-22 Thread Sivasangari Nandy
That is what I did yesterday, actually:

/etc/init.d/slurm-llnl start 

[ ok ] Starting slurm central management daemon: slurmctld. 
/usr/sbin/slurmctld already running. 
--
Sivasangari NANDY - Plate-forme GenOuest
IRISA-INRIA, Campus de Beaulieu
263 Avenue du Général Leclerc
35042 Rennes cedex, France
Tel: +33 (0) 2 99 84 25 69
Office: D152


[slurm-dev] Re: Required node not available (down or drained)

2013-08-22 Thread Nikita Burtsev
VM-667, where you have slurmctld running, is your master; you don't need the
agent part on it. As I understand your setup, VM-[669-671] are your actual
nodes, so you need to check if slurmd is running on those 3 and start it if 
needed.   
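
To see which nodes the controller currently considers unavailable, the node-oriented sinfo output can be filtered. A small sketch: `down_nodes` is a hypothetical helper, and the sinfo options used (`-h` to drop the header, `-N` for one line per node, `-o '%N %T'` for node name and state) are standard sinfo options.

```shell
# Hypothetical helper: read "<node> <state>" lines on stdin and print the
# nodes whose state starts with "down" or "drain".
down_nodes() {
    awk '$2 ~ /^(down|drain)/ { print $1 }'
}

# Example, on any machine where sinfo works:
#   sinfo -h -N -o '%N %T' | down_nodes
```

Those are the nodes on which slurmd should be checked first.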

--  
Nikita Burtsev



[slurm-dev] Re: Required node not available (down or drained)

2013-08-22 Thread Sivasangari Nandy
So I ran /etc/init.d/slurm-llnl start on each node and tried again, but I
still get:

JOBID PARTITION NAME    USER ST TIME NODES NODELIST(REASON)
   50 SLURM-deb test.sh root PD 0:00     1 (Resources)
   53 SLURM-deb test.sh root PD 0:00     1 (Resources)

And this is what I get when I check for slurmd on VM-671:

root@VM-671:~# ps -el | grep slurmd

5 S 0 8223 1 0 80 0 - 22032 - ? 00:00:01 slurmd 
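
The (Resources) reason means the jobs are still waiting for usable nodes, which suggests the controller's view of the nodes has not changed even though slurmd is now running. A sketch for summarizing why jobs are pending: `pending_reasons` is a hypothetical helper, and `squeue -h -t PD -o '%i %r'` uses standard squeue options to print the job id and reason of each pending job.

```shell
# Hypothetical helper: read "<jobid> <reason>" lines on stdin and print
# each distinct reason with the number of jobs that have it.
pending_reasons() {
    awk '{
        sub(/^[^ ]+ +/, "")      # drop the job id, keep the reason text
        count[$0]++
    }
    END { for (r in count) print r, count[r] }' | sort
}

# Example:
#   squeue -h -t PD -o '%i %r' | pending_reasons
```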


[slurm-dev] Re: Required node not available (down or drained)

2013-08-21 Thread Sivasangari Nandy
I have tried : 

/etc/init.d/slurm-llnl start 

[ ok ] Starting slurm central management daemon: slurmctld. 
/usr/sbin/slurmctld already running. 

And : 

scontrol show slurmd 

scontrol: error: slurm_slurmd_info: Connection refused 
slurm_load_slurmd_status: Connection refused 

Hmm, how should I go about fixing this problem?

- Original message -

 From: Danny Auble d...@schedmd.com
 To: slurm-dev slurm-dev@schedmd.com
 Sent: Wednesday 21 August 2013 15:36:53
 Subject: [slurm-dev] Re: Required node not available (down or drained)

 Check your slurmd log. It doesn't appear the slurmd is running.

 Sivasangari Nandy sivasangari.na...@irisa.fr wrote:

  Hello,

  I'm trying to use Slurm for the first time, and I think I have a
  problem with the nodes.

  I get this message when I run squeue:

  root@VM-667:~# squeue
  JOBID PARTITION NAME    USER ST TIME NODES NODELIST(REASON)
     50 SLURM-deb test.sh root PD 0:00     1 (ReqNodeNotAvail)

  or this one from another squeue:

  root@VM-671:~# squeue
  JOBID PARTITION NAME    USER ST TIME NODES NODELIST(REASON)
     50 SLURM-deb test.sh root PD 0:00     1 (Resources)

  sinfo gives me:

  PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
  SLURM-de* up    infinite      3 down  VM-[669-671]

  I have already used Slurm once with the same configuration and I
  was able to run my job.
  But now, the second time, I always get:

  srun: Required node not available (down or drained)
  srun: job 51 queued and waiting for resources

  Thanks in advance for your help,
  Siva

[slurm-dev] Re: Required node not available (down or drained)

2013-08-21 Thread Nikita Burtsev
slurmctld is the management process, and since you have access to squeue/sinfo 
information it is running just fine. You need to check if slurmd (which is the 
agent part) is running on your nodes, i.e. VM-[669-671]  
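
That check can be run from the controller in one pass. A minimal sketch, assuming password-less ssh from the controller to the compute nodes (the thread does not say whether this is set up); `has_slurmd` and `check_nodes` are hypothetical helper names.

```shell
# Hypothetical helper: succeed if a `ps -el` listing on stdin contains a
# process whose command name (last column) is exactly "slurmd".
has_slurmd() {
    awk '$NF == "slurmd" { found = 1 } END { exit !found }'
}

# Hypothetical helper: report slurmd status for each node given as an argument.
# A node that cannot be reached over ssh is also reported as not running.
check_nodes() {
    for node in "$@"; do
        if ssh -n "$node" ps -el | has_slurmd; then
            echo "$node: slurmd running"
        else
            echo "$node: slurmd NOT running"
        fi
    done
}

# Example:
#   check_nodes VM-669 VM-670 VM-671
```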

--  
Nikita Burtsev

