[slurm-dev] Re: Job being canceled due to time limits

2013-09-05 Thread Matthew Russell
Hi Ryan,

I completely missed that second -t.  I'll remove it immediately.
 Hopefully that fixes it.  Thanks!
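For anyone landing on this thread later: a bare number given to -t/--time is read as minutes, and since `#SBATCH -t 4` appears after `--time=38380` in the script, the four-minute limit appears to have won, which lines up with the job dying a few minutes in. A cleaned-up header keeps a single time directive (paths copied from the original script; this is a sketch, not a tested production script):

```shell
#!/home/gm1/ECssm/multi/bin/s.sge_dummy_shell
#SBATCH -D /home/gm1
#SBATCH --export=NONE
#SBATCH -o /home/gm1/listings/dena/gm338_21388_M.21737.out.o
#SBATCH -e /home/gm1/listings/dena/gm338_21388_M.21737.out.e
#SBATCH -J gm338_21388_M.30296
#SBATCH --time=38380          # one time directive only; a plain number means minutes
#SBATCH --partition=headnode
#SBATCH -c 1
```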


On Thu, Sep 5, 2013 at 3:22 PM, Ryan Cox ryan_...@byu.edu wrote:

  -t and --time are synonymous.  You're using both.

 Ryan


 On 09/05/2013 12:38 PM, Matthew Russell wrote:

 Hi,

  I can't figure out why my job is being canceled due to time limits.  My
 queue has an infinite time limit, and my batch file requests several hours,
 yet the job is always canceled within a few minutes.

  gm1@dena:GEM-MACH_1.5.1_dev$ sinfo
 PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  defq up   infinite  4   idle dena[1-4]
 headnode up   infinite  1   idle dena
 matt up   infinite  2   idle dena[1-2]

  My batch file:
  #!/home/gm1/ECssm/multi/bin/s.sge_dummy_shell
 #SBATCH -D /home/gm1
 #SBATCH --export=NONE
 #SBATCH -o /home/gm1/listings/dena/gm338_21388_M.21737.out.o
 #SBATCH -e /home/gm1/listings/dena/gm338_21388_M.21737.out.e
 #SBATCH -J gm338_21388_M.30296
 #SBATCH --time=38380
 #SBATCH --partition=headnode
 #SBATCH
 #SBATCH -c 1
 #SBATCH -t 4
 #SBATCH
 #
  


  The error log:
  gm1@dena:GEM-MACH_1.5.1_dev$ cat
 ~/listings/dena/gm338_21388_M.21737.out.e
 slurmd[dena]: *** JOB 1683 CANCELLED AT 2013-09-05T14:24:27 DUE TO TIME
 LIMIT ***


  Is there somewhere else where a time limit can be imposed?  The time
 limit is being imposed about 5 minutes into the job.

  Thanks


 --
 Ryan Cox
 Operations Director
 Fulton Supercomputing Lab
 Brigham Young University




[slurm-dev] Job being canceled due to time limits

2013-09-05 Thread Matthew Russell
Hi,

I can't figure out why my job is being canceled due to time limits.  My
queue has an infinite time limit, and my batch file requests several hours,
yet the job is always canceled within a few minutes.

gm1@dena:GEM-MACH_1.5.1_dev$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
defq up   infinite  4   idle dena[1-4]
headnode up   infinite  1   idle dena
matt up   infinite  2   idle dena[1-2]

My batch file:
#!/home/gm1/ECssm/multi/bin/s.sge_dummy_shell
#SBATCH -D /home/gm1
#SBATCH --export=NONE
#SBATCH -o /home/gm1/listings/dena/gm338_21388_M.21737.out.o
#SBATCH -e /home/gm1/listings/dena/gm338_21388_M.21737.out.e
#SBATCH -J gm338_21388_M.30296
#SBATCH --time=38380
#SBATCH --partition=headnode
#SBATCH
#SBATCH -c 1
#SBATCH -t 4
#SBATCH
#



The error log:
gm1@dena:GEM-MACH_1.5.1_dev$ cat ~/listings/dena/gm338_21388_M.21737.out.e
slurmd[dena]: *** JOB 1683 CANCELLED AT 2013-09-05T14:24:27 DUE TO TIME
LIMIT ***


Is there somewhere else where a time limit can be imposed?  The time limit
is being imposed about 5 minutes into the job.

Thanks
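For context on the numbers above: Slurm accepts several time formats, and a bare number means minutes, so `-t 4` is a four-minute limit while `--time=38380` is roughly 26.6 days. A rough sketch of the common formats in plain shell (an illustration only, not Slurm's actual parser):

```shell
# Convert a Slurm time string to minutes (seconds are ignored).
# Handles: "minutes", "minutes:seconds", "hours:minutes:seconds",
# and "days-hours[:minutes[:seconds]]".
slurm_minutes() {
  t=$1
  case "$t" in
    *-*)                       # days-hours[:minutes[:seconds]]
      d=${t%%-*}; rest=${t#*-}; h=${rest%%:*}; m=0
      case "$rest" in *:*) m=${rest#*:}; m=${m%%:*};; esac
      echo $(( d * 1440 + h * 60 + m )) ;;
    *:*:*)                     # hours:minutes:seconds
      h=${t%%:*}; rest=${t#*:}; m=${rest%%:*}
      echo $(( h * 60 + m )) ;;
    *:*)                       # minutes:seconds
      echo "${t%%:*}" ;;
    *)                         # plain minutes
      echo "$t" ;;
  esac
}

slurm_minutes 38380    # -> 38380 minutes (about 26.6 days)
slurm_minutes 4        # -> 4 minutes
slurm_minutes 1:30:00  # -> 90 minutes
```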


[slurm-dev] Fwd: Slurm Question

2013-08-21 Thread Matthew Russell
Hi,

I upgraded slurm on my Bright 5.2 Cluster from 2.2.7 to 2.4.2.  Recently
I've been having issues running slurm processes, though.  I've read some
postings (like this one:
https://groups.google.com/forum/#!searchin/slurm-devel/execve$20permission$20denied/slurm-devel/Bl0F9TBDPbw/-YzSm_nfo5MJ)
but I still couldn't get it working.

My slurm job is:
gm1@dena:~$ cat slurm_batch_test.sh
#!/home/gm1/
#SBATCH -D /home/gm1
#SBATCH --export=NONE
#SBATCH -o /home/gm1/test_001.10470.out.o
#SBATCH -e /home/gm1/test_001.10470.out.e
#SBATCH -J test_001.10470
#SBATCH --time=3600
#SBATCH --partition=matt
#SBATCH
#SBATCH -c 4
#SBATCH -t 4
#SBATCH
#SBATCH

module load slurm
mpirun hello.sh


I run it as follows, and get these results:
gm1@dena:~$ rm test_001.10470.out.*; sbatch slurm_batch_test.sh; sleep 1;
cat test*
Submitted batch job 1577
slurmd[dena1]: execve():
/cm/local/apps/slurm/2.4.2/spool/job01577/slurm_script: Permission denied

In my log file, I get: execve():
/cm/local/apps/slurm/2.4.2/spool/job01576/slurm_script: Permission denied

My script is executable:
gm1@dena:~$ ls hello.sh
-rwxr-xr-x 1 gm1 gm 37 Aug 21 11:55 hello.sh

slurmd is being run by root,
root 18462  0.0  0.0 159384  1868 ?S14:24   0:00
/cm/shared/apps/slurm/current/sbin/slurmd

I think it's running in /var/run/slurm
[root@dena1 2.4.2]# cat etc/slurm.conf |grep run
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmdPidFile=/var/run/slurm/slurmd.pid

Which is owned by slurm,
[root@dena1 2.4.2]# ls /var/run/ | grep slurm
drwxr-xr-x 2 slurm   slurm   4.0K Aug 21 14:24 slurm

And the .pid file is owned by root,
[root@dena1 2.4.2]# ls /var/run/slurm/
total 4.0K
-rw-r--r-- 1 root root 6 Aug 21 14:24 slurmd.pid

I'm not sure how to continue.

Can anyone help?  Thanks.
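Two things worth checking for this class of error: execve() reports "Permission denied" (EACCES) both when the spool filesystem is mounted noexec and when a script's interpreter line points at something that isn't an executable regular file. Note the shebang in slurm_batch_test.sh is `#!/home/gm1/`, which looks like a directory. That second case is easy to reproduce locally (demo path below is made up for illustration):

```shell
# Reproduce the "Permission denied" failure mode: a script whose
# shebang names a directory cannot be exec'd by the kernel, so
# execve() fails with EACCES before the script body ever runs.
mkdir -p /tmp/shebang_demo
cat > /tmp/shebang_demo/bad.sh <<'EOF'
#!/tmp/shebang_demo
echo "never reached"
EOF
chmod +x /tmp/shebang_demo/bad.sh

# The shell reports the EACCES from execve(), e.g.
# "bad interpreter: Permission denied":
/tmp/shebang_demo/bad.sh 2>&1 | grep -i 'permission denied'
```

On the cluster itself it is also worth confirming that the filesystem holding the spool directory (/cm/local/apps/slurm/2.4.2/spool here) is not mounted noexec, e.g. with `mount | grep noexec`.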