Re: [slurm-users] RES: multiple srun commands in the same SLURM script

2023-11-01 Thread Bjørn-Helge Mevik
Paulo Jose Braga Estrela  writes:

> Hi,
>
> I think that you have a syntax error in your bash script. The "&"
> means that you want to send a process to the background, not that you
> want to run many commands in parallel. To run commands serially you
> should use cmd && cmd2; then cmd2 will only be executed if cmd returns
> exit code 0.
>
> To run commands in parallel with srun you should set the number of
> tasks to 4, so srun will spawn 4 tasks of the same command. Take a
> look at the examples section in srun
> docs. (https://slurm.schedmd.com/srun.html)

Well, if you look at Example 7 in that section:

Example 7:
This example shows a script in which Slurm is used to provide resource 
management for a job by executing the various job steps as processors become 
available for their dedicated use. 

$ cat my.script
#!/bin/bash
srun -n4 prog1 &
srun -n3 prog2 &
srun -n1 prog3 &
srun -n1 prog4 &
wait

which is what the OP is trying to do. It is mainly for running *different*
programs in parallel inside a job.  If one wants to run *the same*
program in parallel, then a single srun is indeed the recommended way.

I think the main problem is that the original job script only asks for a
single CPU, so the sruns will only run one at a time.  Try adding
--ntasks-per-node=4 or similar.
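For example, the OP's script could request four tasks and four GPUs and then start one single-task step per GPU. This is a sketch under assumptions, not a tested recipe. It uses `--ntasks=4`, which on this single-node partition should amount to `--ntasks-per-node=4`. It requests each step's GPU with `--gres=gpu:1` instead of `--gpus-per-node`. It relies on `srun --exact`, which to my knowledge only exists from Slurm 20.11 on; on the OP's 19.05 the closest equivalent was `srun --exclusive`. `launch_steps` and `DRY_RUN` are hypothetical names added so the script can be dry-run outside a job, and the `--account` line is left out.

```shell
#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --output=gpu_test_%j.log
#SBATCH --partition=gpu
#SBATCH --ntasks=4              # four task slots, so four one-task steps fit at once
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=31200m
#SBATCH --gres=gpu:4            # four GPUs for the job, one per step
#SBATCH --time=12:00:00

# launch_steps and DRY_RUN are illustrative names, not Slurm features.
# With DRY_RUN=1 the srun command lines are printed instead of run.
launch_steps() {
    local n=$1 i
    # Assumption: --exact needs Slurm >= 20.11; use --exclusive on older releases.
    local -a cmd=(srun --exact --mpi=pmi2 -n 1 --gres=gpu:1 python gpu_test.py)
    for ((i = 0; i < n; i++)); do
        if [[ ${DRY_RUN:-0} == 1 ]]; then
            echo "${cmd[*]}"
        else
            "${cmd[@]}" &        # start this step in the background
        fi
    done
}

# sbatch sets SLURM_JOB_ID, so steps are only launched inside a job.
if [[ -n ${SLURM_JOB_ID:-} ]]; then
    launch_steps 4
    wait                         # keep the job alive until every step exits
fi
```

Running `DRY_RUN=1 launch_steps 4` after sourcing the script prints the four srun lines without contacting Slurm.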

Note that exactly how to run different programs in parallel with srun
has changed quite a bit in recent versions, and the example above is
for the latest version, so check the srun man page for your version.
(And unfortunately, the documentation in the srun man page has not
always been correct, so you might need to experiment.  For instance, I
believe Example 7 above is missing `--exact` or `SLURM_EXACT`. :) )
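Because the right option depends on the release, a job script that has to run on several clusters could pick it at run time. The following is a minimal sketch. It assumes, to my understanding (worth checking against the man page for each version), that `--exact` first appeared in Slurm 20.11 and that older releases used `srun --exclusive` to dedicate resources to each step. It also assumes `srun --version` prints a line like `slurm 19.05.2`, as in the OP's system info. `step_exclusivity_flag` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the srun option that gives each job step only
# the resources it asks for, given a "slurm X.Y.Z" version string.
# Assumption: --exact exists from 20.11 on; older releases used --exclusive.
step_exclusivity_flag() {
    local version="${1#slurm }" major minor
    IFS=. read -r major minor _ <<< "$version"
    # 10# forces base 10, so a minor version like "08" is not read as octal.
    if (( 10#$major > 20 || (10#$major == 20 && 10#$minor >= 11) )); then
        printf '%s\n' --exact
    else
        printf '%s\n' --exclusive
    fi
}

# Usage inside a job script:
#   flag=$(step_exclusivity_flag "$(srun --version)")
#   srun "$flag" -n1 prog1 &
```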

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo





[slurm-users] RES: multiple srun commands in the same SLURM script

2023-10-31 Thread Paulo Jose Braga Estrela
Hi,

I think that you have a syntax error in your bash script. The "&" means that 
you want to send a process to the background, not that you want to run many 
commands in parallel. To run commands serially you should use cmd && cmd2; 
then cmd2 will only be executed if cmd returns exit code 0.
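The behaviour of the two operators can be checked in plain bash without Slurm. In this minimal sketch, `step` is a hypothetical stand-in for one srun command. `&&` starts the second command only after the first has exited with status 0. `&` starts each command in the background so they run at the same time, and `wait` blocks until all of them have finished, which is the pattern the OP's script uses.

```shell
#!/bin/bash
# Hypothetical stand-in for a job step: sleep $2 seconds, then print label $1.
step() { sleep "$2"; echo "$1"; }

# Sequential: "fast" starts only after "slow" has exited with status 0.
run_serial() {
    step slow 1 && step fast 0
}

# Concurrent: both steps start at once in the background; "wait" blocks
# until every background child has exited.
run_background() {
    step slow 1 &
    step fast 0 &
    wait
}

# "&&" also short-circuits: a failing first command skips the second one.
run_after_failure() {
    false && step skipped 0
    echo "status=$?"
}

run_serial          # prints: slow, then fast
run_background      # prints: fast, then slow
run_after_failure   # prints: status=1
```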

To run commands in parallel with srun you should set the number of tasks to 4, 
so srun will spawn 4 tasks of the same command. Take a look at the examples 
section in srun docs. (https://slurm.schedmd.com/srun.html)
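With a single srun, every task runs the same command, and the copies can tell themselves apart through environment variables that srun sets per task, such as SLURM_PROCID (the task's rank, 0 to N-1) and SLURM_NTASKS. Below is a minimal sketch of a per-task script meant to be launched as `srun -n 4 ./per_task.sh`. The script name, `describe_task`, and the `input_N.dat` file naming are hypothetical.

```shell
#!/bin/bash
# Hypothetical per-task script, launched as:  srun -n 4 ./per_task.sh
# srun starts one copy per task and sets SLURM_PROCID (0 .. N-1) and
# SLURM_NTASKS in each copy's environment; the defaults below only
# apply when the script is run by hand, outside srun.
describe_task() {
    local rank=${SLURM_PROCID:-0} total=${SLURM_NTASKS:-1}
    # Each copy derives its own (hypothetical) input file from its rank.
    printf 'task %d of %d -> input_%d.dat\n' "$rank" "$total" "$rank"
}

describe_task
```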




PUBLIC
-Original Message-
From: slurm-users  On behalf of Andrei 
Berceanu
Sent: Tuesday, October 31, 2023 07:51
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] multiple srun commands in the same SLURM script

Here is my SLURM script:

#!/bin/bash

#SBATCH --job-name="gpu_test"
#SBATCH --output=gpu_test_%j.log   # Standard output and error log
#SBATCH --account=berceanu_a+

#SBATCH --partition=gpu
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=31200m   # Reserve 32 GB of RAM per core
#SBATCH --time=12:00:00   # Max allowed job runtime
#SBATCH --gres=gpu:16     # Allocate all 16 GPUs

export SLURM_EXACT=1

srun --mpi=pmi2 -n 1 --gpus-per-node 1 python gpu_test.py &
srun --mpi=pmi2 -n 1 --gpus-per-node 1 python gpu_test.py &
srun --mpi=pmi2 -n 1 --gpus-per-node 1 python gpu_test.py &
srun --mpi=pmi2 -n 1 --gpus-per-node 1 python gpu_test.py &

wait

What I expect this to do is to run, in parallel, 4 independent copies of the 
gpu_test.py python script, using 4 out of the 16 GPUs on this node.

What it actually does is it only runs the script on a single GPU - it's as if 
the other 3 srun commands do nothing. Perhaps they do not see any available 
GPUs for some reason?

System info:

slurm 19.05.2

Linux 5.4.0-90-generic #101~18.04.1-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
gpu          up   infinite      1   idle thor

NodeName=thor Arch=x86_64 CoresPerSocket=24
   CPUAlloc=0 CPUTot=48 CPULoad=0.45
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=gpu:16(S:0-1)
   NodeAddr=thor NodeHostName=thor
   OS=Linux 5.4.0-90-generic #101~18.04.1-Ubuntu SMP Fri Oct 22 09:25:04 UTC 2021
   RealMemory=1546812 AllocMem=0 FreeMem=1433049 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=gpu
   BootTime=2023-08-09T14:58:01 SlurmdStartTime=2023-08-09T14:58:36
   CfgTRES=cpu=48,mem=1546812M,billing=48,gres/gpu=16
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

I can add any additional system info as required.

Thank you so much for taking the time to read this,

Regards,
Andrei

The sender of this message is responsible for its content and address and must 
comply with Petrobras' internal rules. It is up to the recipient to ensure that 
the information and personal data contained in this email are only used with 
the appropriate degree of confidentiality and in compliance with applicable 
data protection and privacy legislation. The use of the information and 
personal data contained in this e-mail in violation of the applicable rules 
will result in the application of the applicable sanctions.
