This is a follow-up to
https://mail-archive.com/users@lists.open-mpi.org/msg30055.html
Thanks Matias for the detailed explanation.

Currently, PSM2_DEVICES is overwritten, so I do not think setting it
before invoking mpirun will help.
Also, in this specific case:
- the user is running within a SLURM allocation with 2 nodes
- the user specified a hostfile with 2 distinct nodes

My first impression is that mtl/psm2 could/should handle this properly
(only one of these conditions needs to be met) and *not* set
PSM2_DEVICES="self,shm"
The patch below:
- does not overwrite PSM2_DEVICES if it is already set
- does not set PSM2_DEVICES when num_max_procs > num_total_procs

This is suboptimal, but I could not find a way to get the number of
orted daemons.
IIRC, MPI_Comm_spawn can have an orted dynamically spawned by passing
a host in the MPI_Info argument. If this host is not part of the
hostfile (nor of the RM allocation?), then PSM2_DEVICES must be set
manually by the user (see the sketch below).
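For illustration only, here is a minimal sketch of that pattern; the
"./worker" binary and the "n2" hostname are hypothetical, not taken
from this thread:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    /* "n2" is a placeholder host; if it is outside the hostfile / RM
     * allocation, an orted is dynamically spawned there on demand */
    MPI_Info_set(info, "host", "n2");

    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}

In that scenario, mtl/psm2 cannot know at MPI_Init time that remote
ranks will show up later, which is why the user has to export
PSM2_DEVICES="self,shm,hfi" themselves.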
Ralph,

Is there a way to get the number of orted daemons?
- if I mpirun -np 1 --host n0,n1 ..., then orte_process_info.num_nodes
is 1 (I wish I could get 2)
- if running in singleton mode, orte_process_info.num_max_procs is 0
(is this a bug or a feature?)
Cheers,
Gilles
diff --git a/ompi/mca/mtl/psm2/mtl_psm2_component.c b/ompi/mca/mtl/psm2/mtl_psm2_component.c
index 26bccd2..52b906b 100644
--- a/ompi/mca/mtl/psm2/mtl_psm2_component.c
+++ b/ompi/mca/mtl/psm2/mtl_psm2_component.c
@@ -14,6 +14,8 @@
  * Copyright (c) 2012-2015 Los Alamos National Security, LLC.
  *                         All rights reserved.
  * Copyright (c) 2013-2016 Intel, Inc. All rights reserved
+ * Copyright (c) 2016      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -170,6 +172,13 @@ get_num_total_procs(int *out_ntp)
 }
 
 static int
+get_num_max_procs(int *out_nmp)
+{
+    *out_nmp = (int)ompi_process_info.max_procs;
+    return OMPI_SUCCESS;
+}
+
+static int
 get_num_local_procs(int *out_nlp)
 {
     /* num_local_peers does not include us in
@@ -201,7 +210,7 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads,
     int verno_major = PSM2_VERNO_MAJOR;
     int verno_minor = PSM2_VERNO_MINOR;
     int local_rank = -1, num_local_procs = 0;
-    int num_total_procs = 0;
+    int num_total_procs = 0, num_max_procs = 0;
 
     /* Compute the total number of processes on this host and our local rank
      * on that node. We need to provide PSM2 with these values so it can
@@ -221,6 +230,11 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads,
                     "Cannot continue.\n");
         return NULL;
     }
+    if (OMPI_SUCCESS != get_num_max_procs(&num_max_procs)) {
+        opal_output(0, "Cannot determine max number of processes. "
+                    "Cannot continue.\n");
+        return NULL;
+    }
 
     err = psm2_error_register_handler(NULL /* no ep */,
                                       PSM2_ERRHANDLER_NOP);
@@ -230,8 +244,10 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads,
         return NULL;
     }
 
-    if (num_local_procs == num_total_procs) {
-        setenv("PSM2_DEVICES", "self,shm", 0);
+    if ((num_local_procs == num_total_procs) && (num_max_procs <= num_total_procs)) {
+        if (NULL == getenv("PSM2_DEVICES")) {
+            setenv("PSM2_DEVICES", "self,shm", 0);
+        }
     }
 
     err = psm2_init(&verno_major, &verno_minor);
On 9/30/2016 12:38 AM, Cabral, Matias A wrote:
Hi Gilles et al.,
You are right, ptl.c is in the PSM2 code. As Ralph mentions, dynamic
process support was/is not working in OMPI when using PSM2 because of
an issue related to the transport keys. This was fixed in PR #1602
(https://github.com/open-mpi/ompi/pull/1602) and should be included in
v2.0.2. HOWEVER, this is not the error Juraj is seeing. The root of the
assertion is that the PSM/PSM2 MTLs will check where the "original"
processes are running and, if they detect that all are local to the
node, will ONLY initialize the shared memory device (variable
PSM2_DEVICES="self,shm"). This is to avoid "reserving" HW resources in
the HFI card that wouldn't be used unless you later on spawn ranks on
other nodes. Therefore, to allow dynamic processes to be spawned on
other nodes, you need to tell PSM2 to instruct the HW to initialize all
the devices, by making the environment variable
PSM2_DEVICES="self,shm,hfi" available before running the job.
Note that while setting PSM2_DEVICES (*) will solve the assertion
below, you will most likely still see the transport key issue if PR
#1602 is not included.
Thanks,
_MAC
(*)
PSM2_DEVICES -> Omni-Path
PSM_DEVICES -> TrueScale
*From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of* r...@open-mpi.org
*Sent:* Thursday, September 29, 2016 7:12 AM
*To:* Open MPI Users <us...@lists.open-mpi.org>
*Subject:* Re: [OMPI users] MPI_Comm_spawn
Ah, that may be why it wouldn’t show up in the OMPI code base itself.
If that is the case here, then no - OMPI v2.0.1 does not support
comm_spawn for PSM. It is fixed in the upcoming 2.0.2.
On Sep 29, 2016, at 6:58 AM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
Ralph,
My guess is that ptl.c comes from PSM lib ...
Cheers,
Gilles
On Thursday, September 29, 2016, r...@open-mpi.org wrote:
Spawn definitely does not work with srun. I don’t recognize
the name of the file that segfaulted - what is “ptl.c”? Is
that in your manager program?
On Sep 29, 2016, at 6:06 AM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
Hi,
I do not expect spawn to work with direct launch (e.g. srun).
Do you have PSM (e.g. Infinipath) hardware? That could be linked to
the failure.
Can you please try
mpirun --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts ./manager 1
and see if it helps?
Note that if you have the possibility, I suggest you first try that
without SLURM, and then within a SLURM job.
Cheers,
Gilles
On Thursday, September 29, 2016, juraj2...@gmail.com wrote:
Hello,
I am using MPI_Comm_spawn to dynamically create new processes from a
single manager process. Everything works fine when all the processes
are running on the same node, but imposing the restriction to run only
a single process per node does not work. Below are the errors produced
during a multinode interactive session and a multinode sbatch job.
The system I am using is: Linux version 3.10.0-229.el7.x86_64
(buil...@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120
(Red Hat 4.8.2-16) (GCC))
I am using Open MPI 2.0.1
Slurm is version 15.08.9
What is preventing my jobs from spawning on multiple nodes? Does Slurm
require some additional configuration to allow it? Is it an issue on
the MPI side, i.e. does it need to be compiled with some special flag
(I have compiled it with --enable-mpi-fortran=all --with-pmi)?
The code I am launching is here:
https://github.com/goghino/dynamicMPI
The manager tries to launch one new process (./manager 1). The error
below is produced when requesting each process to be placed on a
different node (interactive session):
$ salloc -N 2
$ cat my_hosts
icsnode37
icsnode38
$ mpirun -np 1 -npernode 1 --hostfile my_hosts ./manager 1
[manager]I'm running MPI 3.1
[manager]Runing on node icsnode37
icsnode37.12614Assertion failure at ptl.c:183: epaddr == ((void *)0)
icsnode38.32443Assertion failure at ptl.c:183: epaddr == ((void *)0)
[icsnode37:12614] *** Process received signal ***
[icsnode37:12614] Signal: Aborted (6)
[icsnode37:12614] Signal code: (-6)
[icsnode38:32443] *** Process received signal ***
[icsnode38:32443] Signal: Aborted (6)
[icsnode38:32443] Signal code: (-6)
The same example as above via sbatch job submission:
$ cat job.sbatch
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
module load openmpi/2.0.1
srun -n 1 -N 1 ./manager 1
$ cat output.o
[manager]I'm running MPI 3.1
[manager]Runing on node icsnode39
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
[icsnode39:9692] *** An error occurred in MPI_Comm_spawn
[icsnode39:9692] *** reported by process [1007812608,0]
[icsnode39:9692] *** on communicator MPI_COMM_SELF
[icsnode39:9692] *** MPI_ERR_SPAWN: could not spawn processes
[icsnode39:9692] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[icsnode39:9692] *** and potentially your MPI job)
In: PMI_Abort(50, N/A)
slurmstepd: *** STEP 15378.0 ON icsnode39 CANCELLED AT 2016-09-26T16:48:20 ***
srun: error: icsnode39: task 0: Exited with exit code 50
Thanks for any feedback!
Best regards,
Juraj
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel