Hi Lars,

On Thu, Mar 19, 2020 at 03:16:15PM +0100, Lars Veldscholte wrote:
> A simple test like `srun hostname` works, even on multiple cores. However,
> when trying to use MPI, it crashes with the following error message:
>
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
>
> This happens even in the most simple "Hello World" case, as long as the
> program is MPI-enabled.
>
> I am trying to use OpenMPI (4.0.2) from the Debian repositories.
> `srun --mpi list` returns:
>
> srun: MPI types are...
> srun: openmpi
> srun: pmi2
> srun: none
>
> I have tried all options, but the result is the same in all cases.
>
> Maybe this is user error, as this is my first time setting up SLURM, but I
> have not been able to find any possible causes/solutions and I am kind of
> stuck at this point.
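As a concrete sketch of the sbatch route (the job name, task count, and program name mympiprogram are placeholders for your own values):

```shell
#!/bin/sh
#SBATCH --job-name=mpi-hello
#SBATCH --ntasks=4

# mpirun inherits the task count and node list from the
# SLURM allocation, so no -np flag is needed here.
mpirun mympiprogram
```

You would then submit this script with `sbatch mympiprogram.sh`; putting the resources in `#SBATCH` directives is equivalent to passing them on the sbatch command line.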
I don't know why srun doesn't launch openmpi programs directly; I'll try to investigate this issue. As a workaround you can use either salloc or sbatch, as described in [1]:

  salloc -n 4 mpirun mympiprogram ...

or

  sbatch -n 4 mympiprogram.sh

where mympiprogram.sh is something like:

  #!/bin/sh
  mpirun mympiprogram ...

Notice that you don't need to specify the number of processes to mpirun, as it takes it from SLURM.

[1] https://www.open-mpi.org/faq/?category=slurm

Best regards,
-- 
Gennaro Oliva