[slurm-dev] Re: Job being canceled due to time limits
Hi Ryan,

I completely missed that second -t. I'll remove it immediately; hopefully that fixes it. Thanks!

On Thu, Sep 5, 2013 at 3:22 PM, Ryan Cox <ryan_...@byu.edu> wrote:

> -t and --time are synonymous. You're using both.
>
> Ryan
>
> On 09/05/2013 12:38 PM, Matthew Russell wrote:
>
>> Hi,
>>
>> I can't figure out why my job is being canceled due to time limits. My queue has an infinite time limit, and my batch file requests several hours, yet the job is always canceled within a few minutes.
>>
>> gm1@dena:GEM-MACH_1.5.1_dev$ sinfo
>> PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
>> defq       up     infinite   4      idle   dena[1-4]
>> *headnode  up     infinite   1      idle   dena*
>> matt       up     infinite   2      idle   dena[1-2]
>>
>> My batch file:
>>
>> #!/home/gm1/ECssm/multi/bin/s.sge_dummy_shell
>> #SBATCH -D /home/gm1
>> #SBATCH --export=NONE
>> #SBATCH -o /home/gm1/listings/dena/gm338_21388_M.21737.out.o
>> #SBATCH -e /home/gm1/listings/dena/gm338_21388_M.21737.out.e
>> #SBATCH -J gm338_21388_M.30296
>> *#SBATCH --time=38380*
>> #SBATCH --partition=headnode
>> #SBATCH
>> #SBATCH -c 1
>> #SBATCH -t 4
>> #SBATCH
>> #
>>
>> The error log:
>>
>> gm1@dena:GEM-MACH_1.5.1_dev$ cat ~/listings/dena/gm338_21388_M.21737.out.e
>> slurmd[dena]: *** JOB 1683 CANCELLED AT 2013-09-05T14:24:27 DUE TO TIME LIMIT ***
>>
>> Is there somewhere else where a time limit can be imposed? The time limit is being imposed about 5 minutes into the job.
>>
>> Thanks
>
> --
> Ryan Cox
> Operations Director
> Fulton Supercomputing Lab
> Brigham Young University
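For context, the interaction at work here: -t and --time set the same job time limit, and when sbatch sees the option twice the later occurrence appears to win, so "#SBATCH -t 4" overrode "#SBATCH --time=38380" with a 4-minute limit, which matches the cancellation about five minutes in. A minimal sketch of a corrected header (an assumption: since the poster says the job "requests several hours", 38380 was likely meant as seconds, about 10.5 hours; sbatch reads a bare integer as minutes, so spelling the limit out as HH:MM:SS avoids the ambiguity):

    #!/home/gm1/ECssm/multi/bin/s.sge_dummy_shell
    #SBATCH --partition=headnode
    #SBATCH -c 1
    # Exactly one time limit. A bare number is interpreted as
    # minutes (38380 minutes is about 26.6 days); 38380 seconds
    # is 10:39:40, written explicitly here:
    #SBATCH --time=10:39:40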
[slurm-dev] Job being canceled due to time limits
Hi,

I can't figure out why my job is being canceled due to time limits. My queue has an infinite time limit, and my batch file requests several hours, yet the job is always canceled within a few minutes.

gm1@dena:GEM-MACH_1.5.1_dev$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
defq       up     infinite   4      idle   dena[1-4]
*headnode  up     infinite   1      idle   dena*
matt       up     infinite   2      idle   dena[1-2]

My batch file:

#!/home/gm1/ECssm/multi/bin/s.sge_dummy_shell
#SBATCH -D /home/gm1
#SBATCH --export=NONE
#SBATCH -o /home/gm1/listings/dena/gm338_21388_M.21737.out.o
#SBATCH -e /home/gm1/listings/dena/gm338_21388_M.21737.out.e
#SBATCH -J gm338_21388_M.30296
*#SBATCH --time=38380*
#SBATCH --partition=headnode
#SBATCH
#SBATCH -c 1
#SBATCH -t 4
#SBATCH
#

The error log:

gm1@dena:GEM-MACH_1.5.1_dev$ cat ~/listings/dena/gm338_21388_M.21737.out.e
slurmd[dena]: *** JOB 1683 CANCELLED AT 2013-09-05T14:24:27 DUE TO TIME LIMIT ***

Is there somewhere else where a time limit can be imposed? The limit is being enforced about 5 minutes into the job.

Thanks
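Worth noting for anyone hitting the same symptom: a limit like this can come either from the submission itself (-t/--time) or from the partition (MaxTime/DefaultTime in slurm.conf, shown as TIMELIMIT by sinfo). A hedged sketch of how to tell the two apart, using job ID 1683 from the error log above:

    # The limit the job actually received
    scontrol show job 1683 | grep -i TimeLimit

    # The partition's limits (MaxTime/DefaultTime from slurm.conf)
    scontrol show partition headnode | grep -i Time

    # Compact per-partition view: partition name and time limit
    sinfo -o "%P %l"

If the partition reports infinite but the job reports a small TimeLimit, the limit came in with the submission, which is what the duplicate -t in the batch file above turned out to be.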
[slurm-dev] Fwd: Slurm Question
Hi,

I upgraded Slurm on my Bright 5.2 cluster from 2.2.7 to 2.4.2. Recently I've been having issues running Slurm jobs, though. I've read some postings (like this: https://groups.google.com/forum/#!searchin/slurm-devel/execve$20permission$20denied/slurm-devel/Bl0F9TBDPbw/-YzSm_nfo5MJ) but I still couldn't get it working.

My Slurm job is:

gm1@dena:~$ cat slurm_batch_test.sh
#!/home/gm1/
#SBATCH -D /home/gm1
#SBATCH --export=NONE
#SBATCH -o /home/gm1/test_001.10470.out.o
#SBATCH -e /home/gm1/test_001.10470.out.e
#SBATCH -J test_001.10470
#SBATCH --time=3600
#SBATCH --partition=matt
#SBATCH
#SBATCH -c 4
#SBATCH -t 4
#SBATCH
#SBATCH
module load slurm
mpirun hello.sh

I run it with the following, and get these results:

gm1@dena:~$ rm test_001.10470.out.*; sbatch slurm_batch_test.sh; sleep 1; cat test*
Submitted batch job 1577
slurmd[dena1]: execve(): /cm/local/apps/slurm/2.4.2/spool/job01577/slurm_script: Permission denied

In my log file, I get:

execve(): /cm/local/apps/slurm/2.4.2/spool/job01576/slurm_script: Permission denied

My script is executable:

gm1@dena:~$ ls hello.sh
-rwxr-xr-x 1 gm1 gm 37 Aug 21 11:55 hello.sh

slurmd is being run by root:

root 18462 0.0 0.0 159384 1868 ? S 14:24 0:00 /cm/shared/apps/slurm/current/sbin/slurmd

I think it's running in /var/run/slurm:

[root@dena1 2.4.2]# cat etc/slurm.conf | grep run
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmdPidFile=/var/run/slurm/slurmd.pid

which is owned by slurm:

[root@dena1 2.4.2]# ls /var/run/ | grep slurm
drwxr-xr-x 2 slurm slurm 4.0K Aug 21 14:24 slurm

And the .pid file is owned by root:

[root@dena1 2.4.2]# ls /var/run/slurm/
total 4.0K
-rw-r--r-- 1 root root 6 Aug 21 14:24 slurmd.pid

I'm not sure how to continue. Can anyone help? Thanks.
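One detail that stands out in the script above: the shebang line "#!/home/gm1/" names a directory rather than an executable interpreter. slurmd copies the batch script into its spool directory and execve()s it there, and execve() fails with EACCES ("Permission denied") when the interpreter is not an executable regular file, which would produce exactly this error even though hello.sh itself is executable. A minimal test of that theory (a sketch, assuming bash is available at /bin/bash on the compute nodes; paths reused from the original script):

    #!/bin/bash
    #SBATCH -D /home/gm1
    #SBATCH -o /home/gm1/test_001.10470.out.o
    #SBATCH -e /home/gm1/test_001.10470.out.e
    #SBATCH -J test_001.10470
    #SBATCH --partition=matt
    #SBATCH -c 4
    #SBATCH -t 4

    # Same payload as the failing script
    module load slurm
    mpirun hello.sh

If that submits and runs, the original interpreter line is the culprit; if not, the spool directory's permissions are the next thing to check (e.g. "ls -ld /cm/local/apps/slurm/2.4.2/spool").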