Re: [slurm-users] Running two multiprocessing jobs in one sbatch

2020-07-25 Thread Brian Andrus

Is there a reason to run them as a single job?

It may be easier to just have 2 separate jobs of 16 cores each.

If there are dependency requirements, those can be addressed by adding
a --dependency option to the job submission.


Brian Andrus
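
As a sketch of the two-separate-jobs approach (the wrapper script names
run_program1.sh and run_program2.sh are hypothetical; drop the
--dependency flag entirely if the two programs are independent):

# Submit each program as its own 16-core, single-node job.
jobid=$(sbatch --parsable --nodes=1 --ntasks=1 --cpus-per-task=16 run_program1.sh)

# afterok: start the second job only once the first completes successfully.
sbatch --dependency=afterok:"$jobid" --nodes=1 --ntasks=1 --cpus-per-task=16 run_program2.sh

With --parsable, sbatch prints just the job ID, which makes it easy to
feed into the second submission.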

On 7/25/2020 2:50 AM, Даниил Вахрамеев wrote:

Hi everyone!

I have a SLURM cluster with several nodes, each with 16 vCPUs. I've
tried to run the following script:

#SBATCH --nodes 2
#SBATCH --ntasks 2
#SBATCH -c 16

srun --exclusive --nodes=1 program1 &
srun --exclusive --nodes=1 program2 &
wait


program1 and program2 each need 16 CPUs, so I expected two nodes (32
cores in total) to be allocated, with program1 running on the first
node and program2 on the second, but I got the following error:


srun: error: Unable to create step for job 364966: Requested node configuration is not available


If I use only the --nodes and --ntasks options, sbatch allocates 2
nodes with 2 CPUs, and if I use --nodes and -c, I get a message that
--ntasks should be defined.


If I set --ntasks=1, SLURM sets nnodes to 1.

How can I run these two programs in one batch job, each on its own
node with 16 vCPUs?


--

Kind regards,

Daniil Vakhrameev




[slurm-users] Running two multiprocessing jobs in one sbatch

2020-07-25 Thread Даниил Вахрамеев
Hi everyone!

I have a SLURM cluster with several nodes, each with 16 vCPUs. I've tried
to run the following script:

#SBATCH --nodes 2
#SBATCH --ntasks 2
#SBATCH -c 16

srun --exclusive --nodes=1 program1 &
srun --exclusive --nodes=1 program2 &
wait

program1 and program2 each need 16 CPUs, so I expected two nodes (32
cores in total) to be allocated, with program1 running on the first node
and program2 on the second, but I got the following error:

srun: error: Unable to create step for job 364966: Requested node
configuration is not available

If I use only the --nodes and --ntasks options, sbatch allocates 2 nodes
with 2 CPUs, and if I use --nodes and -c, I get a message that --ntasks
should be defined.

If I set --ntasks=1, SLURM sets nnodes to 1.

How can I run these two programs in one batch job, each on its own node with 16 vCPUs?

--

Kind regards,

Daniil Vakhrameev
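
[A sketch of one pattern often used for this kind of layout (untested on
this cluster): the error likely occurs because each srun step inherits
--ntasks=2 and -c 16 from the batch allocation, so a single node would
need 32 CPUs. Pinning each step to one task avoids that:]

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16

# Each step asks for exactly one task on one node; without --ntasks=1
# the steps would inherit --ntasks=2 and request 2 x 16 CPUs on one node.
srun --exclusive --nodes=1 --ntasks=1 program1 &
srun --exclusive --nodes=1 --ntasks=1 program2 &
wait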


Re: [slurm-users] cgroup limits not created for jobs

2020-07-25 Thread Chris Samuel
On Friday, 24 July 2020 9:48:35 AM PDT Paul Raines wrote:

> But when I run a job on the node it runs I can find no
> evidence in cgroups of any limits being set
> 
> Example job:
> 
> mlscgpu1[0]:~$ salloc -n1 -c3 -p batch --gres=gpu:quadro_rtx_6000:1 --mem=1G
> salloc: Granted job allocation 17
> mlscgpu1[0]:~$ echo $$
> 137112
> mlscgpu1[0]:~$

You're not actually running inside a job at that point unless you've defined 
"SallocDefaultCommand" in your slurm.conf, and I'm guessing that's not the 
case there.  You can make salloc fire up an srun for you in the allocation 
using that option, see the docs here:

https://slurm.schedmd.com/slurm.conf.html#OPT_SallocDefaultCommand

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
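
[For reference, the slurm.conf documentation gives an example along these
lines (a sketch; the exact options may need adjusting for GPU/gres
clusters):]

SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"

With this set, salloc launches an interactive shell as a real job step
inside the allocation, so cgroup limits actually apply to it.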