[OMPI users] MPI_Comm_spawn question

2017-01-31 Thread elistratovaa
Hi,

I am trying to write a trivial master-slave program. The master simply creates
the slaves and sends them a string; they print it out and exit. Everything works
just fine; however, when I add a delay (more than 2 sec) before calling
MPI_Init on the slave, MPI fails with MPI_ERR_SPAWN. I am pretty sure that
MPI_Comm_spawn has some kind of timeout while waiting for the slaves to call
MPI_Init, and if they fail to respond in time, it returns an error.

I believe there is a way to change this behaviour, but I wasn't able to
find any suggestions or ideas on the internet.
I would appreciate it if someone could help with this.

---
--- Terminal command I use to run the program:
mpirun -n 1 hello 2 2 // the first argument to "hello" is the number of
slaves, the second is the delay in seconds

--- Error message I get when the delay is >= 2 sec:
[host:2231] *** An error occurred in MPI_Comm_spawn
[host:2231] *** reported by process [3453419521,0]
[host:2231] *** on communicator MPI_COMM_SELF
[host:2231] *** MPI_ERR_SPAWN: could not spawn processes
[host:2231] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will
now abort,
[host:2231] *** and potentially your MPI job)

--- The program itself:
#include "stdlib.h"
#include "stdio.h"
#include "mpi.h"
#include "unistd.h"

MPI_Comm slave_comm;
MPI_Comm new_world;
#define MESSAGE_SIZE 40

void slave() {
printf("Slave initialized; ");
MPI_Comm_get_parent(&slave_comm);
MPI_Intercomm_merge(slave_comm, 1, &new_world);

int slave_rank;
MPI_Comm_rank(new_world, &slave_rank);

char message[MESSAGE_SIZE];
MPI_Bcast(message, MESSAGE_SIZE, MPI_CHAR, 0, new_world);

printf("Slave %d received message from master: %s\n", slave_rank, 
message);
}

void master(int slave_count, char* executable, char* delay) {
char* slave_argv[] = { delay, NULL };
MPI_Comm_spawn( executable,
slave_argv,
slave_count,
MPI_INFO_NULL,
0,
MPI_COMM_SELF,
&slave_comm,
MPI_ERRCODES_IGNORE);
MPI_Intercomm_merge(slave_comm, 0, &new_world);
char* helloWorld = "Hello New World!\0";
MPI_Bcast(helloWorld, MESSAGE_SIZE, MPI_CHAR, 0, new_world);
printf("Processes spawned!\n");
}

int main(int argc, char* argv[]) {
if (argc > 2) {
MPI_Init(&argc, &argv);
master(atoi(argv[1]), argv[0], argv[2]);
} else {
sleep(atoi(argv[1])); /// delay
MPI_Init(&argc, &argv);
slave();
}
MPI_Comm_free(&new_world);
MPI_Comm_free(&slave_comm);
MPI_Finalize();
}


Thank you,

Andrew Elistratov
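
A note for readers hitting the same abort: the job dies only because
MPI_ERRORS_ARE_FATAL is the default error handler on communicators. Below is a
minimal sketch, using only standard MPI calls and a hypothetical executable
name ("hello"), of how the master side could receive a failed or timed-out
spawn as an error code instead of aborting:

/* Sketch only: error handlers and the errcodes array are standard MPI;
 * how a timeout is reported (overall return code vs. per-process codes)
 * depends on the implementation. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    /* Return errors from calls on MPI_COMM_SELF instead of aborting. */
    MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);

    int slave_count = 2;
    int *errcodes = malloc(slave_count * sizeof(int));
    MPI_Comm slave_comm;

    int rc = MPI_Comm_spawn("hello", MPI_ARGV_NULL, slave_count,
                            MPI_INFO_NULL, 0, MPI_COMM_SELF,
                            &slave_comm, errcodes);
    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "MPI_Comm_spawn failed: %s\n", msg);
        /* retry, fall back, or shut down cleanly here */
    } else {
        /* ... talk to the slaves, then ... */
        MPI_Comm_disconnect(&slave_comm);
    }

    free(errcodes);
    MPI_Finalize();
    return 0;
}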




[OMPI users] mpi_comm_spawn question

2014-07-03 Thread Milan Hodoscek
Hi,

I am trying to run the following setup in Fortran, without much
success:

I have an MPI program that uses mpi_comm_spawn to spawn an
interface program, which communicates with the one that spawned it. The
spawned program then prepares some data and uses Fortran's call system()
statement. If the program called from system() is not itself an MPI
program, everything runs fine. But I want to run that program with
something like mpirun -n X ..., and then it is a no-go.

Different versions of Open MPI give different messages before they
either die or hang. I googled all the messages, but all I get is
links to some Open MPI sources, so I would appreciate it if someone could
explain how to run the above setup. Given how many MCA options there are, I
hope there is one that can make this setup work.

The message for 1.6 is the following:
... routed:binomial: connection to lifeline lost (+ PIDs and port numbers)

The message for 1.8.1 is:
... FORKING HNP: orted --hnp --set-sid --report-uri 18 --singleton-died-pipe 19 
-mca state_novm_select 1 -mca ess_base_jobid 3378249728


If this is not a trivial problem to solve, I can provide simple test
programs (we need three) that show all of this.

Thanks,


Milan Hodoscek
--
National Institute of Chemistry    tel: +386-1-476-0278
Hajdrihova 19                      fax: +386-1-476-0300
SI-1000 Ljubljana                  e-mail: mi...@cmm.ki.si
Slovenia                           web: http://a.cmm.ki.si
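
For readers trying to picture the three programs involved, here is a minimal
C sketch of the described setup (program and file names are hypothetical, and
the actual code in question is Fortran; the structure is the same):

/* parent: an MPI program that spawns the interface program */
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Comm child;
    MPI_Init(&argc, &argv);
    MPI_Comm_spawn("./interface", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
    /* ... exchange data with the interface program over 'child' ... */
    MPI_Finalize();
    return 0;
}

/* interface (separate executable): the spawned program; it prepares data and
 * then tries to launch an MPI worker through system() -- this nested mpirun
 * is the step that dies or hangs under Open MPI */
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    /* ... prepare input for the worker ... */
    int rc = system("mpirun -n 4 ./worker");
    MPI_Finalize();
    return rc;
}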


Re: [OMPI users] MPI_Comm_spawn question

2017-01-31 Thread r...@open-mpi.org
What version of OMPI are you using?

> On Jan 31, 2017, at 7:33 AM, elistrato...@info.sgu.ru wrote:
> 
> Hi,
> 
> I am trying to write trivial master-slave program. Master simply creates
> slaves, sends them a string, they print it out and exit. Everything works
> just fine, however, when I add a delay (more than 2 sec) before calling
> MPI_Init on slave, MPI fails with MPI_ERR_SPAWN. I am pretty sure that
> MPI_Comm_spawn has some kind of timeout on waiting for slaves to call
> MPI_Init, and if they fail to respond in time, it returns an error.
> 
> I believe there is a way to change this behaviour, but I wasn't able to
> find any suggestions/ideas in the internet.
> I would appreciate if someone could help with this.
> 
> ---
> --- terminal command i use to run program:
> mpirun -n 1 hello 2 2 // the first argument to "hello" is number of
> slaves, the second is delay in seconds
> 
> --- Error message I get when delay is >=2 sec:
> [host:2231] *** An error occurred in MPI_Comm_spawn
> [host:2231] *** reported by process [3453419521,0]
> [host:2231] *** on communicator MPI_COMM_SELF
> [host:2231] *** MPI_ERR_SPAWN: could not spawn processes
> [host:2231] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will
> now abort,
> [host:2231] ***and potentially your MPI job)
> 
> --- The program itself:
> #include "stdlib.h"
> #include "stdio.h"
> #include "mpi.h"
> #include "unistd.h"
> 
> MPI_Comm slave_comm;
> MPI_Comm new_world;
> #define MESSAGE_SIZE 40
> 
> void slave() {
>   printf("Slave initialized; ");
>   MPI_Comm_get_parent(&slave_comm);
>   MPI_Intercomm_merge(slave_comm, 1, &new_world);
> 
>   int slave_rank;
>   MPI_Comm_rank(new_world, &slave_rank);
> 
>   char message[MESSAGE_SIZE];
>   MPI_Bcast(message, MESSAGE_SIZE, MPI_CHAR, 0, new_world);
> 
>   printf("Slave %d received message from master: %s\n", slave_rank, 
> message);
> }
> 
> void master(int slave_count, char* executable, char* delay) {
>   char* slave_argv[] = { delay, NULL };
>   MPI_Comm_spawn( executable,
>   slave_argv,
>   slave_count,
>   MPI_INFO_NULL,
>   0,
>   MPI_COMM_SELF,
>   &slave_comm,
>   MPI_ERRCODES_IGNORE);
>   MPI_Intercomm_merge(slave_comm, 0, &new_world);
>   char* helloWorld = "Hello New World!\0";
>   MPI_Bcast(helloWorld, MESSAGE_SIZE, MPI_CHAR, 0, new_world);
>   printf("Processes spawned!\n");
> }
> 
> int main(int argc, char* argv[]) {
>   if (argc > 2) {
>   MPI_Init(&argc, &argv);
>   master(atoi(argv[1]), argv[0], argv[2]);
>   } else {
>   sleep(atoi(argv[1])); /// delay
>   MPI_Init(&argc, &argv);
>   slave();
>   }
>   MPI_Comm_free(&new_world);
>   MPI_Comm_free(&slave_comm);
>   MPI_Finalize();
> }
> 
> 
> Thank you,
> 
> Andrew Elistratov
> 
> 


Re: [OMPI users] MPI_Comm_spawn question

2017-02-01 Thread elistratovaa
I am using Open MPI version 2.0.1.


Re: [OMPI users] MPI_Comm_spawn question

2017-02-03 Thread r...@open-mpi.org
We know v2.0.1 has problems with comm_spawn, and so you may be encountering one 
of those. Regardless, there is indeed a timeout mechanism in there. It was 
added because people would execute a comm_spawn, and then would hang and eat up 
their entire allocation time for nothing.

In v2.0.2, I see it is still hardwired at 60 seconds. I believe we eventually 
realized we needed to make that a variable, but it didn’t get into the 2.0.2 
release.
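
Until the timeout is tunable, one application-side workaround is to make sure
the spawned processes call MPI_Init before doing any slow setup, so the spawn
handshake completes well inside the window. A minimal sketch of the slave side
(the sleep simply stands in for whatever the slow initialization is):

#include <unistd.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);      /* handshake with the parent happens here */

    sleep(2);                    /* slow setup moved to after MPI_Init */

    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);
    /* ... merge, receive the broadcast, etc., as in the original program ... */

    MPI_Finalize();
    return 0;
}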


> On Feb 1, 2017, at 1:00 AM, elistrato...@info.sgu.ru wrote:
> 
> I am using Open MPI version 2.0.1.

Re: [OMPI users] MPI_Comm_spawn question

2017-02-04 Thread Gilles Gouaillardet
Andrew,

the 2-second timeout is very likely a bug that has since been fixed, so I
strongly suggest you try the latest 2.0.2, which was released earlier this
week.

Ralph is referring to a different timeout, which is hard-coded (FWIW, the MPI
standard says nothing about timeouts, so we hard-coded one to prevent jobs
from hanging forever) to 600 seconds in master but is still 60 seconds in
the v2.0.x branch.
IIRC, the hard-coded timeout is in MPI_Comm_{accept,connect}, and I do not
know whether it is somehow involved in MPI_Comm_spawn.

Cheers,

Gilles
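
For context, these are the calls Gilles means: a minimal accept/connect pair
in C (the port exchange via stdout is just for illustration). Per his
recollection above, the hard-coded timeout applies to this handshake:

/* "server" side */
#include <stdio.h>
#include <mpi.h>

void run_server(void) {
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Open_port(MPI_INFO_NULL, port);       /* obtain a connectable port name */
    printf("port: %s\n", port);               /* hand it to the client somehow  */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0,   /* blocks until a client connects */
                    MPI_COMM_SELF, &client);
    MPI_Close_port(port);
}

/* "client" side, given the published port name */
void run_client(const char *port) {
    MPI_Comm server;
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
}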

On Saturday, February 4, 2017, r...@open-mpi.org  wrote:

> We know v2.0.1 has problems with comm_spawn, and so you may be
> encountering one of those. Regardless, there is indeed a timeout mechanism
> in there. It was added because people would execute a comm_spawn, and then
> would hang and eat up their entire allocation time for nothing.
>
> In v2.0.2, I see it is still hardwired at 60 seconds. I believe we
> eventually realized we needed to make that a variable, but it didn’t get
> into the 2.0.2 release.
>
>
> > On Feb 1, 2017, at 1:00 AM, elistrato...@info.sgu.ru wrote:
> >
> > I am using Open MPI version 2.0.1.

Re: [OMPI users] mpi_comm_spawn question

2014-07-03 Thread Ralph Castain
Unfortunately, that has never been supported. The problem is that the embedded 
mpirun picks up all those MCA params that were provided to the original 
application process, and gets hopelessly confused. We have tried in the past to 
figure out a solution, but it has proved difficult to separate those params 
that were set during launch of the original child from ones you are trying to 
provide to the embedded mpirun.

So it remains an "unsupported" operation.


On Jul 3, 2014, at 7:34 AM, Milan Hodoscek  wrote:

> Hi,
> 
> I am trying to run the following setup in fortran without much
> success:
> 
> I have an MPI program, that uses mpi_comm_spawn which spawns some
> interface program that communicates with the one that spawned it. This
> spawned program then prepares some data and uses call system()
> statement in fortran. Now if the program that is called from system is
> not mpi program itself everything is running OK. But I want to run the
> program with something like mpirun -n X ... and then this is a no go.
> 
> Different versions of open mpi give different messages before they
> either die or hang. I googled all the messages but all I get is just
> links to some openmpi sources, so I would appreciate if someone can
> help me explain how to run above setup. Given so many MCA options I
> hope there is one which can run the above setup ??
> 
> The message for 1.6 is the following:
> ... routed:binomial: connection to lifeline lost (+ PIDs and port numbers)
> 
> The message for 1.8.1 is:
> ... FORKING HNP: orted --hnp --set-sid --report-uri 18 --singleton-died-pipe 
> 19 -mca state_novm_select 1 -mca ess_base_jobid 3378249728
> 
> 
> If this is not trivial to solve problem I can provide a simple test
> programs (we need 3) that show all of this.
> 
> Thanks,
> 
> 
> Milan Hodoscek  
> --
> National Institute of Chemistry  tel:+386-1-476-0278
> Hajdrihova 19fax:+386-1-476-0300
> SI-1000 Ljubljanae-mail: mi...@cmm.ki.si  
> Slovenia web: http://a.cmm.ki.si



Re: [OMPI users] mpi_comm_spawn question

2014-07-03 Thread George Bosilca
Why are you using system() the second time? Since you want to spawn an MPI
application, calling MPI_Comm_spawn would make everything simpler.

George
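
A minimal sketch of that suggestion (hypothetical names; the C binding is
shown, and the Fortran binding is analogous): instead of
call system("mpirun -n 4 ./worker"), the interface program spawns the workers
itself and talks to them over the resulting intercommunicator.

#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Comm workers;
    MPI_Init(&argc, &argv);
    /* ... prepare data ... */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &workers, MPI_ERRCODES_IGNORE);
    /* ... exchange data with the workers over 'workers', then ... */
    MPI_Comm_disconnect(&workers);
    MPI_Finalize();
    return 0;
}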

On Jul 3, 2014 4:34 PM, "Milan Hodoscek"  wrote:
>
> Hi,
>
> I am trying to run the following setup in fortran without much
> success:
>
> I have an MPI program, that uses mpi_comm_spawn which spawns some
> interface program that communicates with the one that spawned it. This
> spawned program then prepares some data and uses call system()
> statement in fortran. Now if the program that is called from system is
> not mpi program itself everything is running OK. But I want to run the
> program with something like mpirun -n X ... and then this is a no go.
>
> Different versions of open mpi give different messages before they
> either die or hang. I googled all the messages but all I get is just
> links to some openmpi sources, so I would appreciate if someone can
> help me explain how to run above setup. Given so many MCA options I
> hope there is one which can run the above setup ??
>
> The message for 1.6 is the following:
> ... routed:binomial: connection to lifeline lost (+ PIDs and port numbers)
>
> The message for 1.8.1 is:
> ... FORKING HNP: orted --hnp --set-sid --report-uri 18 --singleton-died-pipe 19 -mca state_novm_select 1 -mca ess_base_jobid 3378249728
>
>
> If this is not trivial to solve problem I can provide a simple test
> programs (we need 3) that show all of this.
>
> Thanks,
>
>
> Milan Hodoscek
> --
> National Institute of Chemistry  tel:+386-1-476-0278
> Hajdrihova 19fax:+386-1-476-0300
> SI-1000 Ljubljanae-mail: mi...@cmm.ki.si
> Slovenia web: http://a.cmm.ki.si


Re: [OMPI users] mpi_comm_spawn question

2014-07-03 Thread Milan Hodoscek
> "George" == George Bosilca  writes:

George> Why are you using system() the second time ? As you want
George> to spawn an MPI application calling MPI_Comm_spawn would
George> make everything simpler.

Yes, this works! Very good trick... The system() routine would be more
flexible, but for the method we are working on now, mpi_comm_spawn is also
OK.

Thanks -- Milan