Ralph,

spawn_master spawns 4 spawn_slave tasks.

(I attached the spawn_master and spawn_slave sources at the end of this message.)


Note that the master branch is fine with

mpirun -np 1 --slot-list 0:0-5,1:0-5 --host motomachi ./spawn_master

but the v2.0.x and v2.x branches are not.


Cheers,


Gilles



On 1/12/2017 1:42 PM, r...@open-mpi.org wrote:
Looking at this note again: how many procs is spawn_master generating?

On Jan 11, 2017, at 7:39 PM, r...@open-mpi.org wrote:

Sigh - yet another corner case. Lovely. Will take a poke at it later this week. Thx for tracking it down

On Jan 11, 2017, at 5:27 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

Ralph,


So it seems the root cause is an incompatibility between the --host and --slot-list options.


On a single node with two six-core sockets, this works:

mpirun -np 1 ./spawn_master
mpirun -np 1 --slot-list 0:0-5,1:0-5 ./spawn_master
mpirun -np 1 --host motomachi --oversubscribe ./spawn_master
mpirun -np 1 --slot-list 0:0-5,1:0-5 --host motomachi:12 ./spawn_master


This does not work:

mpirun -np 1 --host motomachi ./spawn_master
    # not enough slots available; aborts with a user-friendly error message
mpirun -np 1 --slot-list 0:0-5,1:0-5 --host motomachi ./spawn_master
    # various errors: sm_segment_attach() fails, a task crashes,
    # and it ends up with the following error message:

At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[15519,2],0]) is on host: motomachi
  Process 2 ([[15519,2],1]) is on host: unknown!
  BTLs attempted: self tcp

mpirun -np 1 --slot-list 0:0-5,1:0-5 --host motomachi:1 ./spawn_master   # same error as above
mpirun -np 1 --slot-list 0:0-5,1:0-5 --host motomachi:2 ./spawn_master   # same error as above


For the record, the following command surprisingly works:

mpirun -np 1 --slot-list 0:0-5,1:0-5 --host motomachi:3 --mca btl tcp,self ./spawn_master



Bottom line: my guess is that when the user specifies both the --slot-list and --host options *and* no slot count is given for the host, we should default to the number of slots implied by the slot list
(e.g., in this case, default to --host motomachi:12 instead of, I guess, --host motomachi:1).
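
Here is a quick standalone sketch of what I mean (just an illustration, not the actual orte/rmaps code, and it ignores the more exotic slot-list syntaxes): count the slots implied by a slot-list string such as "0:0-5,1:0-5".

/* standalone sketch, not the actual orte code: count the slots implied
 * by a slot list such as "0:0-5,1:0-5"                                 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int count_slots (const char *slot_list)
{
  char *copy  = strdup (slot_list);
  char *token = strtok (copy, ",");
  int   slots = 0;

  while (token != NULL) {
    int socket, lo, hi;
    if (sscanf (token, "%d:%d-%d", &socket, &lo, &hi) == 3) {
      slots += hi - lo + 1;             /* core range, e.g. 0:0-5 -> 6  */
    } else if (sscanf (token, "%d:%d", &socket, &lo) == 2) {
      slots += 1;                       /* single core, e.g. 1:3        */
    }
    token = strtok (NULL, ",");
  }
  free (copy);
  return slots;
}

int main (void)
{
  printf ("%d slots\n", count_slots ("0:0-5,1:0-5"));   /* prints 12 slots */
  return EXIT_SUCCESS;
}

With the slot list above this gives 12, which is exactly what the --host motomachi:12 workaround provides explicitly.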


/* fwiw, I made

https://github.com/open-mpi/ompi/pull/2715

but that is not the root cause */


Cheers,


Gilles



-------- Forwarded Message --------
Subject: Re: [OMPI users] still segmentation fault with openmpi-2.0.2rc3 on Linux
Date:   Wed, 11 Jan 2017 20:39:02 +0900
From:   Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
Reply-To:       Open MPI Users <us...@lists.open-mpi.org>
To:     Open MPI Users <us...@lists.open-mpi.org>



Siegmar,

Your slot list is correct.
An invalid slot list for your node would be 0:1-7,1:0-7, since neither socket has a core 6 or 7.

/* and since the test requires only 5 tasks, that could even work with such an invalid list. My VM is a single socket with 4 cores, so a 0:0-4 slot list results in an unfriendly pmix error */

Bottom line: your test is correct, and there is a bug in v2.0.x that I will investigate starting tomorrow.

Cheers,

Gilles

On Wednesday, January 11, 2017, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

    Hi Gilles,

    thank you very much for your help. What does "incorrect slot list"
    mean? My machine has two 6-core processors, so I specified
    "--slot-list 0:0-5,1:0-5". Does "incorrect" mean that it isn't
    allowed to specify more slots than available, to specify fewer
    slots than available, or to specify more slots than needed for
    the processes?


    Kind regards

    Siegmar

    On 11.01.2017 at 10:04, Gilles Gouaillardet wrote:

        Siegmar,

        I was able to reproduce the issue on my VM
        (no need for a real heterogeneous cluster here).

        I will keep digging tomorrow.
        Note that if you specify an incorrect slot list,
        MPI_Comm_spawn fails with a very unfriendly error message.
        Right now, the 4th spawned task crashes, so this is a
        different issue.

        Cheers,

        Gilles

        r...@open-mpi.org wrote:
        I think there is some relevant discussion
        here: https://github.com/open-mpi/ompi/issues/1569

        It looks like Gilles had (at least at one point) a fix for
        master when building with --enable-heterogeneous, but I don’t
        know if that was committed.

            On Jan 9, 2017, at 8:23 AM, Howard Pritchard <hpprit...@gmail.com> wrote:

            HI Siegmar,

            You have some config parameters that I wasn't trying, and
            they may have some impact.
            I'll give it a try with these parameters.

            This should be enough info for now,

            Thanks,

            Howard


            2017-01-09 0:59 GMT-07:00 Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de>:

            Hi Howard,

            I use the following commands to build and install the
            package.
            ${SYSTEM_ENV} is "Linux" and ${MACHINE_ENV} is "x86_64"
            for my
            Linux machine.

            mkdir openmpi-2.0.2rc3-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
            cd openmpi-2.0.2rc3-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc

            ../openmpi-2.0.2rc3/configure \
            --prefix=/usr/local/openmpi-2.0.2_64_cc \
            --libdir=/usr/local/openmpi-2.0.2_64_cc/lib64 \
            --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
            --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
            JAVA_HOME=/usr/local/jdk1.8.0_66 \
            LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack" CC="cc" CXX="CC" FC="f95" \
            CFLAGS="-m64 -mt" CXXFLAGS="-m64" FCFLAGS="-m64" \
            CPP="cpp" CXXCPP="cpp" \
            --enable-mpi-cxx \
            --enable-mpi-cxx-bindings \
            --enable-cxx-exceptions \
            --enable-mpi-java \
            --enable-heterogeneous \
            --enable-mpi-thread-multiple \
            --with-hwloc=internal \
            --without-verbs \
            --with-wrapper-cflags="-m64 -mt" \
            --with-wrapper-cxxflags="-m64" \
            --with-wrapper-fcflags="-m64" \
            --with-wrapper-ldflags="-mt" \
            --enable-debug \
            |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

            make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
            rm -r /usr/local/openmpi-2.0.2_64_cc.old
            mv /usr/local/openmpi-2.0.2_64_cc /usr/local/openmpi-2.0.2_64_cc.old
            make install |& tee log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc
            make check |& tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc


            I get a different error if I run the program with gdb.

            loki spawn 118 gdb /usr/local/openmpi-2.0.2_64_cc/bin/mpiexec
            GNU gdb (GDB; SUSE Linux Enterprise 12) 7.11.1
            Copyright (C) 2016 Free Software Foundation, Inc.
            License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
            This is free software: you are free to change and
            redistribute it.
            There is NO WARRANTY, to the extent permitted by law. Type "show copying"
            and "show warranty" for details.
            This GDB was configured as "x86_64-suse-linux".
            Type "show configuration" for configuration details.
            For bug reporting instructions, please see:
            <http://bugs.opensuse.org/>.
            Find the GDB manual and other documentation resources online at:
            <http://www.gnu.org/software/gdb/documentation/>.
            For help, type "help".
            Type "apropos word" to search for commands related to
            "word"...
            Reading symbols from /usr/local/openmpi-2.0.2_64_cc/bin/mpiexec...done.
            (gdb) r -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master
            Starting program: /usr/local/openmpi-2.0.2_64_cc/bin/mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master
            Missing separate debuginfos, use: zypper install
            glibc-debuginfo-2.24-2.3.x86_64
            [Thread debugging using libthread_db enabled]
            Using host libthread_db library "/lib64/libthread_db.so.1".
            [New Thread 0x7ffff3b97700 (LWP 13582)]
            [New Thread 0x7ffff18a4700 (LWP 13583)]
            [New Thread 0x7ffff10a3700 (LWP 13584)]
            [New Thread 0x7fffebbba700 (LWP 13585)]
            Detaching after fork from child process 13586.

            Parent process 0 running on loki
            I create 4 slave processes

            Detaching after fork from child process 13589.
            Detaching after fork from child process 13590.
            Detaching after fork from child process 13591.
            [loki:13586] OPAL ERROR: Timeout in file
            ../../../../openmpi-2.0.2rc3/opal/mca/pmix/base/pmix_base_fns.c
            at line 193
            [loki:13586] *** An error occurred in MPI_Comm_spawn
            [loki:13586] *** reported by process [2873294849,0]
            [loki:13586] *** on communicator MPI_COMM_WORLD
            [loki:13586] *** MPI_ERR_UNKNOWN: unknown error
            [loki:13586] *** MPI_ERRORS_ARE_FATAL (processes in this
            communicator will now abort,
            [loki:13586] ***    and potentially your MPI job)
            [Thread 0x7fffebbba700 (LWP 13585) exited]
            [Thread 0x7ffff10a3700 (LWP 13584) exited]
            [Thread 0x7ffff18a4700 (LWP 13583) exited]
            [Thread 0x7ffff3b97700 (LWP 13582) exited]
            [Inferior 1 (process 13567) exited with code 016]
            Missing separate debuginfos, use: zypper install
            libpciaccess0-debuginfo-0.13.2-5.1.x86_64
            libudev1-debuginfo-210-116.3.3.x86_64
            (gdb) bt
            No stack.
            (gdb)

            Do you need anything else?


            Kind regards

            Siegmar

            On 08.01.2017 at 17:02, Howard Pritchard wrote:

            HI Siegmar,

            Could you post the configury options you use when
            building the 2.0.2rc3?
            Maybe that will help in trying to reproduce the segfault
            you are observing.

            Howard


            2017-01-07 2:30 GMT-07:00 Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de>:

            Hi,

            I have installed openmpi-2.0.2rc3 on my "SUSE Linux
            Enterprise
            Server 12 (x86_64)" with Sun C 5.14 and gcc-6.3.0.
            Unfortunately,
            I still get the same error that I reported for rc2.

            I would be grateful if somebody could fix the problem before
            releasing the final version. Thank you very much for any
            help in advance.


            Kind regards

            Siegmar















/* The program demonstrates how to spawn some dynamic MPI processes.
 * This program is the slave part for the programs "spawn_master" and
 * "spawn_multiple_master".
 *
 * A process or a group of processes can create another group of
 * processes with "MPI_Comm_spawn ()" or "MPI_Comm_spawn_multiple ()".
 * In general it is best (better performance) to start all processes
 * statically with "mpiexec" via the command line. If you want to use
 * dynamic processes you will normally have one master process which
 * starts a lot of slave processes. In some cases it may be useful to
 * enlarge a group of processes, e.g., if the MPI universe provides
 * more virtual cpu's than the current number of processes and the
 * program may benefit from additional processes. You will use
 * "MPI_Comm_spwan_multiple ()" if you must start different
 * programs or if you want to start the same program with different
 * parameters.
 *
 * There are some reasons to prefer "MPI_Comm_spawn_multiple ()"
 * instead of calling "MPI_Comm_spawn ()" multiple times. If you
 * spawn new (child) processes they start up like any MPI application,
 * i.e., they call "MPI_Init ()" and can use the communicator
 * MPI_COMM_WORLD afterwards. This communicator contains only the
 * child processes which have been created with the same call of
 * "MPI_Comm_spawn ()" and which is distinct from MPI_COMM_WORLD
 * of the parent process or processes created in other calls of
 * "MPI_Comm_spawn ()". The natural communication mechanism between
 * the groups of parent and child processes is via an
 * inter-communicator which will be returned from the above
 * MPI functions to spawn new processes. The local group of the
 * inter-communicator contains the parent processes and the remote
 * group contains the child processes. The child processes can get
 * the same inter-communicator calling "MPI_Comm_get_parent ()".
 * Now it is obvious that calling "MPI_Comm_spawn ()" multiple
 * times will create many sets of children with different
 * communicators MPI_COMM_WORLD whereas "MPI_Comm_spawn_multiple ()"
 * creates child processes with a single MPI_COMM_WORLD. Furthermore
 * spawning several processes in one call may be faster than spawning
 * them sequentially and perhaps even the communication between
 * processes spawned at the same time may be faster than communication
 * between sequentially spawned processes.
 *
 * For collective operations it is sometimes easier if all processes
 * belong to the same intra-communicator. You can use the function
 * "MPI_Intercomm_merge ()" to merge the local and remote group of
 * an inter-communicator into an intra-communicator.
 * 
 *
 * Compiling:
 *   Store executable(s) into local directory.
 *     mpicc -o <program name> <source code file name>
 *
 *   Store executable(s) into predefined directories.
 *     make
 *
 *   Make program(s) automatically on all specified hosts. You must
 *   edit the file "make_compile" and specify your host names before
 *   you execute it.
 *     make_compile
 *
 * Running:
 *   LAM-MPI:
 *     mpiexec -boot -np <number of processes> <program name>
 *     or
 *     mpiexec -boot \
 *       -host <hostname> -np <number of processes> <program name> : \
 *       -host <hostname> -np <number of processes> <program name>
 *     or
 *     mpiexec -boot [-v] -configfile <application file>
 *     or
 *     lamboot [-v] [<host file>]
 *       mpiexec -np <number of processes> <program name>
 *       or
 *       mpiexec [-v] -configfile <application file>
 *     lamhalt
 *
 *   OpenMPI:
 *     "host1", "host2", and so on can all have the same name,
 *     if you want to start a virtual computer with some virtual
 *     cpu's on the local host. The name "localhost" is allowed
 *     as well.
 *
 *     mpiexec -np <number of processes> <program name>
 *     or
 *     mpiexec --host <host1,host2,...> \
 *       -np <number of processes> <program name>
 *     or
 *     mpiexec -hostfile <hostfile name> \
 *       -np <number of processes> <program name>
 *     or
 *     mpiexec -app <application file>
 *
 * Cleaning:
 *   local computer:
 *     rm <program name>
 *     or
 *     make clean_all
 *   on all specified computers (you must edit the file "make_clean_all"
 *   and specify your host names before you execute it).
 *     make_clean_all
 *
 *
 * File: spawn_slave.c                  Author: S. Gross
 * Date: 30.08.2012
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"


int main (int argc, char *argv[])
{
  int  ntasks_world,                    /* # of tasks in MPI_COMM_WORLD */
       mytid,                           /* my task id                   */
       namelen,                         /* length of processor name     */
       i;                               /* loop variable                */
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks_world);
  MPI_Get_processor_name (processor_name, &namelen);
  /* With the next statement every process executing this code will
   * print one line on the display. It may happen that the lines will
   * get mixed up because the display is a critical section. In general
   * only one process (mostly the process with rank 0) will print on
   * the display and all other processes will send their messages to
   * this process. Nevertheless for debugging purposes (or to
   * demonstrate that it is possible) it may be useful if every
   * process prints itself.
   */
  fprintf (stdout, "Slave process %d of %d running on %s\n",
           mytid, ntasks_world, processor_name);
  fflush (stdout);
  MPI_Barrier (MPI_COMM_WORLD);         /* wait for all other processes */
  for (i = 0; i < argc; ++i)
  {
    printf ("%s %d: argv[%d]: %s\n", argv[0], mytid, i, argv[i]);
  }
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
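
As a side note to the header comment above (this snippet is not part of Siegmar's attached spawn_slave.c, just an illustration): a spawned child can retrieve the inter-communicator to its parent(s) with MPI_Comm_get_parent() and thereby detect whether it was started via MPI_Comm_spawn() at all.

/* hypothetical sketch: detect whether this process was spawned and,
 * if so, how many parent processes there are                          */
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Comm parent;
  int      nparents;

  MPI_Init (&argc, &argv);
  MPI_Comm_get_parent (&parent);
  if (parent == MPI_COMM_NULL) {
    printf ("started directly with mpiexec, no parent\n");
  } else {
    /* remote group of the inter-communicator = the parent process(es) */
    MPI_Comm_remote_size (parent, &nparents);
    printf ("spawned by %d parent process(es)\n", nparents);
  }
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
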
/* The program demonstrates how to spawn some dynamic MPI processes.
 * This version uses one master process which creates some slave
 * processes.
 *
 * A process or a group of processes can create another group of
 * processes with "MPI_Comm_spawn ()" or "MPI_Comm_spawn_multiple ()".
 * In general it is best (better performance) to start all processes
 * statically with "mpiexec" via the command line. If you want to use
 * dynamic processes you will normally have one master process which
 * starts a lot of slave processes. In some cases it may be useful to
 * enlarge a group of processes, e.g., if the MPI universe provides
 * more virtual cpu's than the current number of processes and the
 * program may benefit from additional processes. You will use
 * "MPI_Comm_spwan_multiple ()" if you must start different
 * programs or if you want to start the same program with different
 * parameters.
 *
 * There are some reasons to prefer "MPI_Comm_spawn_multiple ()"
 * instead of calling "MPI_Comm_spawn ()" multiple times. If you
 * spawn new (child) processes they start up like any MPI application,
 * i.e., they call "MPI_Init ()" and can use the communicator
 * MPI_COMM_WORLD afterwards. This communicator contains only the
 * child processes which have been created with the same call of
 * "MPI_Comm_spawn ()" and which is distinct from MPI_COMM_WORLD
 * of the parent process or processes created in other calls of
 * "MPI_Comm_spawn ()". The natural communication mechanism between
 * the groups of parent and child processes is via an
 * inter-communicator which will be returned from the above
 * MPI functions to spawn new processes. The local group of the
 * inter-communicator contains the parent processes and the remote
 * group contains the child processes. The child processes can get
 * the same inter-communicator calling "MPI_Comm_get_parent ()".
 * Now it is obvious that calling "MPI_Comm_spawn ()" multiple
 * times will create many sets of children with different
 * communicators MPI_COMM_WORLD whereas "MPI_Comm_spawn_multiple ()"
 * creates child processes with a single MPI_COMM_WORLD. Furthermore
 * spawning several processes in one call may be faster than spawning
 * them sequentially and perhaps even the communication between
 * processes spawned at the same time may be faster than communication
 * between sequentially spawned processes.
 *
 * For collective operations it is sometimes easier if all processes
 * belong to the same intra-communicator. You can use the function
 * "MPI_Intercomm_merge ()" to merge the local and remote group of
 * an inter-communicator into an intra-communicator.
 * 
 *
 * Compiling:
 *   Store executable(s) into local directory.
 *     mpicc -o <program name> <source code file name>
 *
 *   Store executable(s) into predefined directories.
 *     make
 *
 *   Make program(s) automatically on all specified hosts. You must
 *   edit the file "make_compile" and specify your host names before
 *   you execute it.
 *     make_compile
 *
 * Running:
 *   LAM-MPI:
 *     mpiexec -boot -np <number of processes> <program name>
 *     or
 *     mpiexec -boot \
 *       -host <hostname> -np <number of processes> <program name> : \
 *       -host <hostname> -np <number of processes> <program name>
 *     or
 *     mpiexec -boot [-v] -configfile <application file>
 *     or
 *     lamboot [-v] [<host file>]
 *       mpiexec -np <number of processes> <program name>
 *       or
 *       mpiexec [-v] -configfile <application file>
 *     lamhalt
 *
 *   OpenMPI:
 *     "host1", "host2", and so on can all have the same name,
 *     if you want to start a virtual computer with some virtual
 *     cpu's on the local host. The name "localhost" is allowed
 *     as well.
 *
 *     mpiexec -np <number of processes> <program name>
 *     or
 *     mpiexec --host <host1,host2,...> \
 *       -np <number of processes> <program name>
 *     or
 *     mpiexec -hostfile <hostfile name> \
 *       -np <number of processes> <program name>
 *     or
 *     mpiexec -app <application file>
 *
 * Cleaning:
 *   local computer:
 *     rm <program name>
 *     or
 *     make clean_all
 *   on all specified computers (you must edit the file "make_clean_all"
 *   and specify your host names before you execute it).
 *     make_clean_all
 *
 *
 * File: spawn_master.c                 Author: S. Gross
 * Date: 28.09.2013
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define NUM_SLAVES      4               /* create NUM_SLAVES processes  */
#define SLAVE_PROG      "spawn_slave"   /* slave program name           */


int main (int argc, char *argv[])
{
  MPI_Comm COMM_CHILD_PROCESSES;        /* inter-communicator           */
  int      ntasks_world,                /* # of tasks in MPI_COMM_WORLD */
           ntasks_local,                /* COMM_CHILD_PROCESSES local   */
           ntasks_remote,               /* COMM_CHILD_PROCESSES remote  */
           mytid,                       /* my task id                   */
           namelen;                     /* length of processor name     */
  char     processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks_world);
  /* check that only the master process is running in MPI_COMM_WORLD.   */
  if (ntasks_world > 1)
  {
    if (mytid == 0)
    {
      fprintf (stderr, "\n\nError: Too many processes (only one "
               "process allowed).\n"
               "Usage:\n"
               "  mpiexec %s\n\n",
               argv[0]);
    }
    MPI_Finalize ();
    exit (EXIT_SUCCESS);
  }
  MPI_Get_processor_name (processor_name, &namelen);
  printf ("\nParent process %d running on %s\n"
          "  I create %d slave processes\n\n",
          mytid,  processor_name, NUM_SLAVES);
  MPI_Comm_spawn (SLAVE_PROG, MPI_ARGV_NULL, NUM_SLAVES,
                  MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                  &COMM_CHILD_PROCESSES, MPI_ERRCODES_IGNORE);
  MPI_Comm_size (COMM_CHILD_PROCESSES, &ntasks_local);
  MPI_Comm_remote_size (COMM_CHILD_PROCESSES, &ntasks_remote);
  printf ("Parent process %d: "
          "tasks in MPI_COMM_WORLD:                    %d\n"
          "                  tasks in COMM_CHILD_PROCESSES local "
          "group:  %d\n"
          "                  tasks in COMM_CHILD_PROCESSES remote "
          "group: %d\n\n",
          mytid, ntasks_world, ntasks_local, ntasks_remote);
  MPI_Comm_free (&COMM_CHILD_PROCESSES);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
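
As an aside, and only to illustrate the MPI_Comm_spawn_multiple()/MPI_Intercomm_merge() discussion in the header comments above (this is not part of the attached test case), the same four slaves could hypothetically be started and merged into a single intra-communicator along these lines:

/* hypothetical sketch, not part of spawn_master.c: spawn the same four
 * slaves with MPI_Comm_spawn_multiple() and merge parents and children
 * into a single intra-communicator                                     */
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  char     *commands[1] = { "spawn_slave" };    /* one program          */
  int       maxprocs[1] = { 4 };                /* ... started 4 times  */
  MPI_Info  infos[1]    = { MPI_INFO_NULL };
  MPI_Comm  intercomm, merged;
  int       merged_size;

  MPI_Init (&argc, &argv);
  MPI_Comm_spawn_multiple (1, commands, MPI_ARGVS_NULL, maxprocs,
                           infos, 0, MPI_COMM_WORLD,
                           &intercomm, MPI_ERRCODES_IGNORE);
  /* The merge is collective over both groups, so the spawned slaves
   * would also have to call MPI_Intercomm_merge() on the communicator
   * they obtain from MPI_Comm_get_parent().                            */
  MPI_Intercomm_merge (intercomm, 0, &merged);
  MPI_Comm_size (merged, &merged_size);
  printf ("merged intra-communicator has %d tasks\n", merged_size);
  MPI_Comm_free (&merged);
  MPI_Comm_free (&intercomm);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}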