Hi all,

I am attempting to run PyFR on a GPU cluster with Slurm. The solver runs
fine serially on the cluster, and my local version runs well in parallel,
but parallel runs on the cluster fail with the following error:

RuntimeError: Mesh has 4 partitions but running with 1 MPI ranks


which I assume means that MPI is not initializing properly. I first tried 
my own build of MVAPICH2, then the cluster's OpenMPI module 
(intel-mpi/gcc/2018.1/64) at the suggestion of the cluster admins, 
rebuilding mpi4py when switching from MVAPICH2 to OpenMPI. My Slurm 
script is as follows:

#!/bin/bash
#SBATCH -J t106_RR
#SBATCH -t 00:05:00
#SBATCH -N 1
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks-per-socket=2
#SBATCH --gres=gpu:4


ulimit -s unlimited
source /home/tdzanic/.bashrc
srun pyfr run -b cuda /home/tdzanic/TestCases/Turbine/mesh_t106a.pyfrm \
    /home/tdzanic/TestCases/Turbine/t106_3D_baseline.ini -p


I have also tried mpiexec and get the same error. I am using Anaconda; 
could that be an issue? Also, in moving from MVAPICH2 to OpenMPI, are 
there any other packages that I need to rebuild besides mpi4py?

Thanks,
Tarik
