[OMPI devel] Slurm integration and rankfiles....
Hi all,

Using a rather trivial example

    mpirun -np 1 -rf rankfile ./HelloWorld

on a Slurm system:

--
While trying to determine what resources are available, the SLURM
resource allocator expects to find the following environment variables:

    SLURM_NODELIST
    SLURM_TASKS_PER_NODE

However, it was unable to find the following environment variable:

    SLURM_TASKS_PER_NODE
--

(Both for OpenMPI 4.0/4.1). It is correct the variable is not set, but why is SLURM_TASKS_PER_NODE expected or required when using a rankfile where one presumes it would not be a constant across the job anyway?

Martyn
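A quick way to see which of the two variables named in the error message are actually present before calling mpirun is sketched below. This is only an illustration: the helper name missing_slurm_vars is hypothetical, and the check is an assumption based on the text of the error message, not a copy of what Open MPI's Slurm allocator does internally.

```python
import os

# The two variables named in the error message quoted above.
REQUIRED_VARS = ("SLURM_NODELIST", "SLURM_TASKS_PER_NODE")


def missing_slurm_vars(env=None):
    """Return the names from REQUIRED_VARS that are absent from env.

    env defaults to the current process environment; passing a dict
    makes the check easy to exercise outside a Slurm allocation.
    """
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if name not in env]


if __name__ == "__main__":
    missing = missing_slurm_vars()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("both expected Slurm variables are set")
```

Run inside the allocation, just before mpirun, this reports whether the environment matches what the error message says is expected.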
Re: [OMPI devel] Slurm integration and rankfiles....
Hi Ralph,

Slurm is 19.05. To be clear - it's not unexpected that SLURM_TASKS_PER_NODE is unset in the configuration.

Martyn

On Thu, 11 Mar 2021 at 16:09, Ralph Castain via devel <devel@lists.open-mpi.org> wrote:

> What version of Slurm is this?
>
> > On Mar 11, 2021, at 8:03 AM, Martyn Foster via devel <devel@lists.open-mpi.org> wrote:
> >
> > Hi all,
> >
> > Using a rather trivial example
> > mpirun -np 1 -rf rankfile ./HelloWorld
> > on a Slurm system;
> >
> > --
> > While trying to determine what resources are available, the SLURM
> > resource allocator expects to find the following environment variables:
> >
> > SLURM_NODELIST
> > SLURM_TASKS_PER_NODE
> >
> > However, it was unable to find the following environment variable:
> >
> > SLURM_TASKS_PER_NODE
> >
> > --
> >
> > (Both for OpenMPI 4.0/4.1). It is correct the variable is not set, but
> > why is SLURM_TASKS_PER_NODE expected or required when using a rankfile
> > where one presumes it would not be a constant across the job anyway?
> >
> > Martyn
Re: [OMPI devel] Slurm integration and rankfiles....
Sorry for the slow reply!

I didn't want to get fixated on why the variable was unset, though I can understand the existence of a check if Slurm always sets this (I don't recall that being the case for all configurations historically, but perhaps it is now).

The reason I'd unset it (!) is because I was trying to build an environment to support completely arbitrary task placement/distribution that works with various launchers (orterun/srun/hydra) and it seems tasks_per_node being set was upsetting one of the others.

Slurm's internal geometry parameters can't possibly describe an arbitrary (rankfile) layout, so I was nervous about why they would be required if a rankfile was provided...

Martyn

On Mon, 15 Mar 2021 at 19:57, Ralph Castain via devel <devel@lists.open-mpi.org> wrote:

> Martyn? Why are you saying SLURM_TASKS_PER_NODE might not be present?
>
> It sounds to me like something is wrong in your Slurm environment - I
> really believe that this envar is always supposed to be there.
>
> > On Mar 15, 2021, at 4:20 AM, Peter Kjellström wrote:
> >
> > On Fri, 12 Mar 2021 22:19:09 +
> > Ralph Castain via devel wrote:
> >
> >> Why would it not be set? AFAICT, Slurm is supposed to always set that
> >> envar, or so we've been told.
> >
> > Maybe confusion on the exact name?
> >
> > AFAIK slurm always sets SLURM_TASKS_PER_NODE but only sets
> > SLURM_NTASKS_PER_NODE (almost same name) when --ntasks-per-node is
> > given.
> >
> > /Peter K
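To illustrate the point about Slurm's geometry parameters, here is a small sketch that expands a SLURM_TASKS_PER_NODE value into one task count per node. It assumes the compressed form given in the Slurm man pages, where a count repeated on consecutive nodes is written as "2(x3)"; the function name expand_tasks_per_node is hypothetical. The result is only a list of counts in node-list order, so it carries no information about which rank goes on which node or slot, which is the part a rankfile supplies.

```python
import re

# One comma-separated group: a task count, optionally followed by "(xN)"
# meaning the count repeats on N consecutive nodes.
_GROUP = re.compile(r"^(\d+)(?:\(x(\d+)\))?$")


def expand_tasks_per_node(value):
    """Expand e.g. "2(x3),1" into a per-node list of task counts."""
    counts = []
    for group in value.split(","):
        match = _GROUP.match(group.strip())
        if match is None:
            raise ValueError(f"unrecognised group: {group!r}")
        tasks = int(match.group(1))
        repeat = int(match.group(2)) if match.group(2) else 1
        counts.extend([tasks] * repeat)
    return counts


if __name__ == "__main__":
    # Three nodes with two tasks each, then one node with a single task.
    print(expand_tasks_per_node("2(x3),1"))
```

Note that this reads SLURM_TASKS_PER_NODE, not the similarly named SLURM_NTASKS_PER_NODE mentioned in Peter's reply.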