[slurm-users] Running two multiprocessing jobs in one sbatch

2020-07-25 Thread Даниил Вахрамеев
Hi everyone!

I have a SLURM cluster with several nodes, each with 16 vCPUs. I tried to
run the following batch script:

#SBATCH --nodes 2
#SBATCH --ntasks 2
#SBATCH -c 16

srun --exclusive --nodes=1 program1 &
srun --exclusive --nodes=1 program2 &
wait

program1 and program2 each need 16 CPUs. I expected 2 nodes with 32 cores
in total to be allocated, with program1 running on the first node and
program2 on the second, but I got the following error message:

srun: error: Unable to create step for job 364966: Requested node
configuration is not available

If I use only the --nodes and --ntasks options, sbatch allocates 2 nodes
with 2 CPUs, and if I use only --nodes and -c, I get a message that
--ntasks should be defined.

If I set --ntasks=1, SLURM sets nnodes to 1.

How can I run these two programs in one batch job, each on its own node
with 16 vCPUs?

--

Kind regards,

Daniil Vakhrameev


Re: [slurm-users] Running two multiprocessing jobs in one sbatch

2020-07-25 Thread Brian Andrus

Is there a reason to run them as a single job?

It may be easier to just have 2 separate jobs of 16 cores each.
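
A minimal sketch of the two-job approach, assuming program1 and program2
are runnable on the compute nodes and each fits on a single 16-CPU node.
The helper name submit_separate is hypothetical; the sbatch options used
(--nodes, --ntasks, --cpus-per-task, --wrap) are standard ones.

```shell
#!/bin/bash
# Sketch: submit each program as its own independent single-node job.
# submit_separate is a hypothetical helper name; program1/program2 are the
# placeholders from the original question.

submit_separate() {
    local prog
    for prog in "$@"; do
        # One task with 16 CPUs on one node per job; --wrap turns the
        # command string into a one-line batch script.
        sbatch --nodes=1 --ntasks=1 --cpus-per-task=16 --wrap="$prog" || return 1
    done
}

# Usage (on a cluster with Slurm installed):
#   submit_separate program1 program2
```

The scheduler is then free to start the two jobs at the same time on
different nodes, or one after the other if only one node is free.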

If one job depends on the other, you can express that by adding a
dependency to the job submission.
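
As an illustration of adding a dependency at submission time, here is a
sketch that assumes the second program should start only after the first
one finishes successfully. The helper name submit_chain is hypothetical,
and afterok is just one of the dependency types sbatch accepts.

```shell
#!/bin/bash
# Sketch: submit two single-node jobs where the second waits for the first.
# submit_chain is a hypothetical helper name.

submit_chain() {
    local first_id
    # --parsable makes sbatch print only the job ID (followed by ";cluster"
    # on multi-cluster setups), so strip anything after a semicolon.
    first_id=$(sbatch --parsable --nodes=1 --ntasks=1 --cpus-per-task=16 \
                      --wrap="$1") || return 1
    first_id=${first_id%%;*}
    # afterok: the second job becomes eligible only if the first job
    # completes with exit code 0.
    sbatch --nodes=1 --ntasks=1 --cpus-per-task=16 \
           --dependency="afterok:${first_id}" --wrap="$2"
}

# Usage (on a cluster with Slurm installed):
#   submit_chain program1 program2
```

If the two programs are independent, the --dependency option is simply
left out and both jobs can run concurrently.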


Brian Andrus
