Hey Gilles,

A quick answer on the first part until I read a little more about num_max_procs :O
The third parameter of setenv() being 0 means: do not overwrite the variable if it is already present in the environment. So the workaround does work today. Also, I would like to know whether there is a place in an OMPI wiki where this behavior could be documented.

Thanks,

_MAC

From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Thursday, September 29, 2016 6:14 PM
To: Open MPI Developers <devel@lists.open-mpi.org>
Subject: [OMPI devel] mtl/psm2 and $PSM2_DEVICES


This is a follow-up to https://mail-archive.com/users@lists.open-mpi.org/msg30055.html



Thanks, Matias, for the detailed explanation.



Currently, PSM2_DEVICES is overwritten, so I do not think setting it before invoking mpirun will help.



Also, in this specific case:

- the user is running within a SLURM allocation with 2 nodes
- the user specified a hostfile with 2 distinct nodes

My first impression is that mtl/psm2 could/should handle this properly (only one of these conditions needs to be met) and *not* set

export PSM2_DEVICES="self,shm"

The patch below:
- does not overwrite PSM2_DEVICES
- does not set PSM2_DEVICES when num_max_procs > num_total_procs

This is suboptimal, but I could not find a way to get the number of orteds. IIRC, MPI_Comm_spawn can have an orted dynamically spawned by passing a host in the MPI_Info. If this host is not part of the hostfile (nor the RM allocation?), then PSM2_DEVICES must be set manually by the user.
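
For illustration, this is the kind of spawn I have in mind (a minimal sketch; the ./worker binary and the host name n2 are hypothetical, "host" is the standard reserved MPI_Info key):

#include <mpi.h>

/* fragment from a manager process: spawn one worker on a host that is
 * not necessarily part of the hostfile / RM allocation, which forces a
 * new orted to be started there */
static void spawn_on_host(const char *hostname)
{
    MPI_Comm intercomm;
    MPI_Info info;

    MPI_Info_create(&info);
    MPI_Info_set(info, "host", hostname);   /* e.g. "n2" */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
    MPI_Info_free(&info);
}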


Ralph,

Is there a way to get the number of orteds?
- if I mpirun -np 1 --host n0,n1 ..., orte_process_info.num_nodes is 1 (I wish I could get 2)
- if running in singleton mode, orte_process_info.num_max_procs is 0 (is this a bug or a feature?)
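
For context, these are the values I am looking at, via a throwaway debug line dropped into ompi_mtl_psm2_component_init() (that num_nodes is reachable through the ompi_process_info alias is my assumption):

/* throwaway debug output: see what the RTE reports for this launch */
opal_output(0, "mtl/psm2: max_procs=%d num_nodes=%d",
            (int)ompi_process_info.max_procs,
            (int)ompi_process_info.num_nodes);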

Cheers,

Gilles


diff --git a/ompi/mca/mtl/psm2/mtl_psm2_component.c b/ompi/mca/mtl/psm2/mtl_psm2_component.c
index 26bccd2..52b906b 100644
--- a/ompi/mca/mtl/psm2/mtl_psm2_component.c
+++ b/ompi/mca/mtl/psm2/mtl_psm2_component.c
@@ -14,6 +14,8 @@
  * Copyright (c) 2012-2015 Los Alamos National Security, LLC.
  *                         All rights reserved.
  * Copyright (c) 2013-2016 Intel, Inc. All rights reserved
+ * Copyright (c) 2016      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -170,6 +172,13 @@ get_num_total_procs(int *out_ntp)
 }

 static int
+get_num_max_procs(int *out_nmp)
+{
+  *out_nmp = (int)ompi_process_info.max_procs;
+  return OMPI_SUCCESS;
+}
+
+static int
 get_num_local_procs(int *out_nlp)
 {
     /* num_local_peers does not include us in
@@ -201,7 +210,7 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads,
     int        verno_major = PSM2_VERNO_MAJOR;
     int verno_minor = PSM2_VERNO_MINOR;
     int local_rank = -1, num_local_procs = 0;
-    int num_total_procs = 0;
+    int num_total_procs = 0, num_max_procs = 0;

     /* Compute the total number of processes on this host and our local rank
      * on that node. We need to provide PSM2 with these values so it can
@@ -221,6 +230,11 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads,
                     "Cannot continue.\n");
         return NULL;
     }
+    if (OMPI_SUCCESS != get_num_max_procs(&num_max_procs)) {
+        opal_output(0, "Cannot determine max number of processes. "
+                    "Cannot continue.\n");
+        return NULL;
+    }

     err = psm2_error_register_handler(NULL /* no ep */,
                                     PSM2_ERRHANDLER_NOP);
@@ -230,8 +244,10 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads,
        return NULL;
     }

-    if (num_local_procs == num_total_procs) {
-      setenv("PSM2_DEVICES", "self,shm", 0);
+    if ((num_local_procs == num_total_procs) && (num_max_procs <= num_total_procs)) {
+        if (NULL == getenv("PSM2_DEVICES")) {
+            setenv("PSM2_DEVICES", "self,shm", 0);
+        }
     }

     err = psm2_init(&verno_major, &verno_minor);






On 9/30/2016 12:38 AM, Cabral, Matias A wrote:
Hi Gilles et al.,

You are right, ptl.c is in the PSM2 code. As Ralph mentions, dynamic process support was/is not working in OMPI when using PSM2 because of an issue related to the transport keys. This was fixed in PR #1602 (https://github.com/open-mpi/ompi/pull/1602) and should be included in v2.0.2. HOWEVER, this is not the error Juraj is seeing. The root of the assertion is that the PSM/PSM2 MTLs check where the "original" processes are running and, if they detect that all of them are local to the node, they will ONLY initialize the shared memory device (variable PSM2_DEVICES="self,shm"). This is to avoid "reserving" HW resources in the HFI card that wouldn't be used unless you later spawn ranks on other nodes. Therefore, to allow dynamic processes to be spawned on other nodes, you need to tell PSM2 to instruct the HW to initialize all the devices, by making the environment variable PSM2_DEVICES="self,shm,hfi" available before running the job.
Note that while setting PSM2_DEVICES (*) will solve the assertion below, you will most likely still see the transport key issue if PR #1602 is not included.

Thanks,

_MAC

(*)
PSM2_DEVICES -> Omni-Path
PSM_DEVICES  -> TrueScale
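
As an aside, a quick way to double-check which devices PSM2 ended up with is to read the variable back in the manager after MPI_Init; a minimal sketch (the helper name is just illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* call after MPI_Init(): warn when PSM2 was restricted to shared
 * memory only, i.e. there is no inter-node device for remote spawn */
static void check_psm2_devices(void)
{
    const char *devs = getenv("PSM2_DEVICES");
    if (devs != NULL && NULL == strstr(devs, "hfi")) {
        fprintf(stderr, "warning: PSM2_DEVICES=%s has no hfi device\n", devs);
    }
}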

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of r...@open-mpi.org
Sent: Thursday, September 29, 2016 7:12 AM
To: Open MPI Users <us...@lists.open-mpi.org>
Subject: Re: [OMPI users] MPI_Comm_spawn

Ah, that may be why it wouldn't show up in the OMPI code base itself. If that is the case here, then no - OMPI v2.0.1 does not support comm_spawn for PSM. It is fixed in the upcoming 2.0.2.

On Sep 29, 2016, at 6:58 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Ralph,

My guess is that ptl.c comes from PSM lib ...

Cheers,

Gilles

On Thursday, September 29, 2016, r...@open-mpi.org wrote:
Spawn definitely does not work with srun. I don't recognize the name of the 
file that segfaulted - what is "ptl.c"? Is that in your manager program?


On Sep 29, 2016, at 6:06 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Hi,

I do not expect spawn to work with direct launch (e.g. srun).

Do you have PSM (e.g. InfiniPath) hardware? That could be linked to the failure.

Can you please try

mpirun --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts ./manager 1

and see if it helps?

Note: if you have the possibility, I suggest you first try that without SLURM, and then within a SLURM job.

Cheers,

Gilles

On Thursday, September 29, 2016, juraj2...@gmail.com wrote:
Hello,

I am using MPI_Comm_spawn to dynamically create new processes from a single manager process. Everything works fine when all the processes are running on the same node, but imposing the restriction to run only a single process per node does not work. Below are the errors produced during a multi-node interactive session and a multi-node sbatch job.

The system I am using is: Linux version 3.10.0-229.el7.x86_64 (buil...@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC))
I am using Open MPI 2.0.1
Slurm is version 15.08.9

What is preventing my jobs from spawning on multiple nodes? Does Slurm require some additional configuration to allow it? Is it an issue on the MPI side; does it need to be compiled with some special flag (I have compiled it with --enable-mpi-fortran=all --with-pmi)?

The code I am launching is here: https://github.com/goghino/dynamicMPI

The manager tries to launch one new process (./manager 1); the error produced by requesting each process to be located on a different node (interactive session):
$ salloc -N 2
$ cat my_hosts
icsnode37
icsnode38
$ mpirun -np 1 -npernode 1 --hostfile my_hosts ./manager 1
[manager]I'm running MPI 3.1
[manager]Runing on node icsnode37
icsnode37.12614Assertion failure at ptl.c:183: epaddr == ((void *)0)
icsnode38.32443Assertion failure at ptl.c:183: epaddr == ((void *)0)
[icsnode37:12614] *** Process received signal ***
[icsnode37:12614] Signal: Aborted (6)
[icsnode37:12614] Signal code:  (-6)
[icsnode38:32443] *** Process received signal ***
[icsnode38:32443] Signal: Aborted (6)
[icsnode38:32443] Signal code:  (-6)

The same example as above via sbatch job submission:
$ cat job.sbatch
#!/bin/bash

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1

module load openmpi/2.0.1
srun -n 1 -N 1 ./manager 1

$ cat output.o
[manager]I'm running MPI 3.1
[manager]Runing on node icsnode39
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
[icsnode39:9692] *** An error occurred in MPI_Comm_spawn
[icsnode39:9692] *** reported by process [1007812608,0]
[icsnode39:9692] *** on communicator MPI_COMM_SELF
[icsnode39:9692] *** MPI_ERR_SPAWN: could not spawn processes
[icsnode39:9692] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
now abort,
[icsnode39:9692] ***    and potentially your MPI job)
In: PMI_Abort(50, N/A)
slurmstepd: *** STEP 15378.0 ON icsnode39 CANCELLED AT 2016-09-26T16:48:20 ***
srun: error: icsnode39: task 0: Exited with exit code 50

Thanks for any feedback!

Best regards,
Juraj

_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
