[OMPI users] MPI_Comm_accept / MPI_Comm_connect for client/server OMPI pattern not working in 5.0.0rc3

2022-03-27 Thread Luca Repetti via users
Hello,

I am trying to replicate a simple client/server MPI application using 
MPI_Comm_accept and MPI_Comm_connect. Before version 5.0.x, I used the 
ompi-server command to enable communication between the two processes, but this 
command no longer exists in the new 5.0.x release. Without a running 
ompi-server, I can no longer publish the port on which the server accepts 
connections; a minimal example is below.

Moreover, even if I communicate the server port to the client by other means 
(such as writing it to a file), the two processes hang; in previous versions, I 
would get an error asking me to run ompi-server and pass its address to the 
environment.
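
For reference, the pre-5.0 workflow being described looked roughly like this 
(flag spellings as in the 4.x man pages; treat the exact syntax as 
version-dependent):

    $ ompi-server --report-uri /tmp/ompi-server.uri
    $ mpirun -n 1 --ompi-server file:/tmp/ompi-server.uri ./server
    $ mpirun -n 1 --ompi-server file:/tmp/ompi-server.uri ./client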

server.c


#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Comm client;
    char port_name[MPI_MAX_PORT_NAME];
    int size;
    MPI_Info info;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Open_port(MPI_INFO_NULL, port_name);
    printf("Server available at %s\n", port_name);

    MPI_Info_create(&info);

    MPI_Publish_name("name", info, port_name);

    printf("Wait for client connection\n");
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
    printf("Client connected\n");

    MPI_Unpublish_name("name", MPI_INFO_NULL, port_name);
    MPI_Comm_free(&client);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}

client.c


#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Comm server;
    char port_name[MPI_MAX_PORT_NAME];

    MPI_Init(&argc, &argv);

    printf("Looking for server\n");
    MPI_Lookup_name("name", MPI_INFO_NULL, port_name);
    printf("server found at %s\n", port_name);

    printf("Wait for server connection\n");
    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
    printf("Server connected\n");

    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return 0;
}
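
Since the post mentions exchanging the port by writing it to a file, here is a 
minimal sketch of that out-of-band exchange (the file path is illustrative, and 
synchronization between the two sides is omitted):

Server side, after MPI_Open_port():

    /* write the port name where the client can read it */
    FILE *f = fopen("/tmp/mpi_port.txt", "w");
    fprintf(f, "%s\n", port_name);
    fclose(f);

Client side, instead of MPI_Lookup_name():

    /* read the port name published by the server; needs <string.h> */
    char port_name[MPI_MAX_PORT_NAME];
    FILE *f = fopen("/tmp/mpi_port.txt", "r");
    fgets(port_name, MPI_MAX_PORT_NAME, f);
    port_name[strcspn(port_name, "\n")] = '\0';  /* strip trailing newline */
    fclose(f);

Even with this exchange in place, the report above is that MPI_Comm_connect() 
and MPI_Comm_accept() still hang under 5.0.0rc3.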

Error message due to the lack of an ompi-server on which to publish the port name:

[parallels-Parallels-Virtual-Platform:61301] 
mca_base_component_repository_open: unable to open mca_reachable_netlink: 
libopen-pal.so.40: cannot open shared object file: No such file or directory 
(ignored)
[parallels-Parallels-Virtual-Platform:61301] 
mca_base_component_repository_open: unable to open mca_btl_openib: 
libopen-pal.so.40: cannot open shared object file: No such file or directory 
(ignored)
Looking for server
[parallels-Parallels-Virtual-Platform:0] *** An error occurred in 
MPI_Lookup_name
[parallels-Parallels-Virtual-Platform:0] *** reported by process 
[611254273,0]
[parallels-Parallels-Virtual-Platform:0] *** on communicator MPI_COMM_SELF
[parallels-Parallels-Virtual-Platform:0] *** MPI_ERR_NAME: invalid name 
argument
[parallels-Parallels-Virtual-Platform:0] *** MPI_ERRORS_ARE_FATAL 
(processes in this communicator will now abort,
[parallels-Parallels-Virtual-Platform:0] *** and MPI will try to 
terminate your MPI job as well)



Thank you in advance for any pointers or documentation I could use. For 
additional context, I'd like to use 5.0.0rc3 since it's the latest version with 
ULFM, and version 4.0.3 with ULFM is broken due to an issue with host 
recognition in ompi-server (related GitHub issue: 
https://github.com/open-mpi/ompi/issues/9396 )


Luca Repetti


Re: [OMPI users] MPI_Comm_accept()

2017-05-28 Thread Adam Sylvester
Thanks!  I've been working around this in the meantime but will look
forward to using it in 3.0.

On Sat, May 27, 2017 at 4:02 PM, r...@open-mpi.org  wrote:

> Hardly the hoped-for quick turnaround, but it has been fixed in master and
> will go into v3.0, which is planned for release in the near future
>

Re: [OMPI users] MPI_Comm_accept()

2017-05-27 Thread r...@open-mpi.org
Hardly the hoped-for quick turnaround, but it has been fixed in master and will 
go into v3.0, which is planned for release in the near future


Re: [OMPI users] MPI_Comm_accept()

2017-03-14 Thread Adam Sylvester
Excellent - I appreciate the quick turnaround.

On Tue, Mar 14, 2017 at 10:24 AM, r...@open-mpi.org  wrote:

> I don’t see an issue right away, though I know it has been brought up
> before. I hope to resolve it either this week or next - will reply to this
> thread with the PR link when ready.

Re: [OMPI users] MPI_Comm_accept()

2017-03-14 Thread r...@open-mpi.org
I don’t see an issue right away, though I know it has been brought up before. I 
hope to resolve it either this week or next - will reply to this thread with 
the PR link when ready.



Re: [OMPI users] MPI_Comm_accept()

2017-03-13 Thread Adam Sylvester
Bummer - thanks for the update.  I will revert back to 1.10.x for now
then.  Should I file a bug report for this on GitHub or elsewhere?  Or if
there's an issue for this already open, can you point me to it so I can
keep track of when it's fixed?  Any best guess calendar-wise as to when you
expect this to be fixed?

Thanks.

On Mon, Mar 13, 2017 at 10:45 AM, r...@open-mpi.org  wrote:

> You should consider it a bug for now - it won’t work in the 2.0 series,
> and I don’t think it will work in the upcoming 2.1.0 release. Probably will
> be fixed after that.

Re: [OMPI users] MPI_Comm_accept()

2017-03-13 Thread r...@open-mpi.org
You should consider it a bug for now - it won’t work in the 2.0 series, and I 
don’t think it will work in the upcoming 2.1.0 release. Probably will be fixed 
after that.



Re: [OMPI users] MPI_Comm_accept()

2017-03-13 Thread Adam Sylvester
As a follow-up, I tried this with Open MPI 1.10.4 and this worked as
expected (the port formatting looks really different):

$ mpirun -np 1 ./server
Port name is 1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://10.102.16.135::300
Accepted!

$ mpirun -np 1 ./client "1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://10.102.16.135::300"
Trying with '1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://10.102.16.135::300'
Connected!

I've found some other posts of users asking about similar things regarding
the 2.x release - is this a bug?


[OMPI users] MPI_Comm_accept()

2017-03-12 Thread Adam Sylvester
I'm using Open MPI 2.0.2 on RHEL 7.  I'm trying to use MPI_Open_port() /
MPI_Comm_accept() / MPI_Comm_connect().  My use case is that I'll have two
processes running on two machines that don't initially know about each
other (i.e. I can't do the typical mpirun with a list of IPs); eventually I
think I may need to use ompi-server to accomplish what I want but for now
I'm trying to test this out running two processes on the same machine with
some toy programs.

server.cpp creates the port, prints it, and waits for a client to accept
using it:

#include <mpi.h>
#include <iostream>

int main(int argc, char** argv)
{
    MPI_Init(NULL, NULL);

    char myport[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm;

    MPI_Open_port(MPI_INFO_NULL, myport);
    std::cout << "Port name is " << myport << std::endl;

    MPI_Comm_accept(myport, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);

    std::cout << "Accepted!" << std::endl;

    MPI_Finalize();
    return 0;
}

client.cpp takes in this port on the command line and tries to connect to
it:

#include <mpi.h>
#include <iostream>
#include <string>

int main(int argc, char** argv)
{
    MPI_Init(NULL, NULL);

    MPI_Comm intercomm;

    const std::string name(argv[1]);
    std::cout << "Trying with '" << name << "'" << std::endl;
    MPI_Comm_connect(name.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);

    std::cout << "Connected!" << std::endl;

    MPI_Finalize();
    return 0;
}

I run the server first:
$ mpirun ./server
Port name is 2720137217.0:595361386

Then a second later I run the client:
$ mpirun ./client 2720137217.0:595361386
Trying with '2720137217.0:595361386'

Both programs hang for a while and then eventually time out.  I have a
feeling I'm misunderstanding something and doing something dumb, but from
all the examples I've seen online it seems like this should work.

Thanks for the help.
-Adam
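
The name-publishing alternative hinted at above ("eventually I think I may need 
to use ompi-server") would look roughly like the sketch below (the service name 
is invented for illustration; with Open MPI this additionally requires an 
ompi-server instance reachable from both mpirun invocations):

Server side:

    /* publish the opened port under a well-known service name */
    MPI_Open_port(MPI_INFO_NULL, myport);
    MPI_Publish_name("demo-service", MPI_INFO_NULL, myport);
    MPI_Comm_accept(myport, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
    MPI_Unpublish_name("demo-service", MPI_INFO_NULL, myport);

Client side:

    /* resolve the service name instead of pasting the port string by hand */
    char theirport[MPI_MAX_PORT_NAME];
    MPI_Lookup_name("demo-service", MPI_INFO_NULL, theirport);
    MPI_Comm_connect(theirport, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);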

Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines

2015-07-14 Thread Ralph Castain
Kewl - thanks for the assist!

Yes, I’ll add it now - waiting for one other problem to be resolved (patch is 
in the oven), then expect to release tomorrow barring any other problems.





Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines

2015-07-14 Thread Audet, Martin
Yes, this patch applied over OpenMPI 1.8.6 solves my problem.

Attached are the new output files for the server and the client when started 
with "--mca oob_base_verbose 100".

Will this patch be included in 1.8.7 ?

Thanks again,

Martin Audet

From: users [users-boun...@open-mpi.org] On Behalf Of Ralph Castain 
[r...@open-mpi.org]
Sent: Tuesday, July 14, 2015 11:10 AM
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines

This seems to fix the problem when using your example on my cluster - please 
let me know if it solves things for you



server_out2.txt.bz2
Description: server_out2.txt.bz2


client_out2.txt.bz2
Description: client_out2.txt.bz2


Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines

2015-07-14 Thread Audet, Martin
I will happily test any patch you send me to fix this problem.

Thanks,

Martin

-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: July 13, 2015 22:55
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between 
two different machines

I see the problem - it's a race condition, actually. I'll try to provide a 
patch for you to test, if you don't mind.



Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines

2015-07-13 Thread Ralph Castain
I see the problem - it’s a race condition, actually. I’ll try to provide a 
patch for you to test, if you don’t mind.





Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines

2015-07-13 Thread Audet, Martin
Thanks Ralph for this quick response.

In the two attachements you will find the output I got when running the 
following commands:

[audet@fn1 mpi]$ mpiexec --mca oob_base_verbose 100 -n 1 ./simpleserver 2>&1 | 
tee server_out.txt

[audet@linux15 mpi]$ mpiexec --mca oob_base_verbose 100 -n 1 ./simpleclient 
'227264.0;tcp://172.17.15.20:56377+227265.0;tcp://172.17.15.20:34776:300'
 2>&1 | tee client_out.txt

Martin

From: users [users-boun...@open-mpi.org] On Behalf Of Ralph Castain 
[r...@open-mpi.org]
Sent: Monday, July 13, 2015 5:29 PM
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines

Try running it with “—mca oob_base_verbose 100” on both client and server - it 
will tell us why the connection was refused.


[fn1:07315] mca: base: components_register: registering oob components
[fn1:07315] mca: base: components_register: found loaded component tcp
[fn1:07315] mca: base: components_register: component tcp register function 
successful
[fn1:07315] mca: base: components_open: opening oob components
[fn1:07315] mca: base: components_open: found loaded component tcp
[fn1:07315] mca: base: components_open: component tcp open function successful
[fn1:07315] mca:oob:select: checking available component tcp
[fn1:07315] mca:oob:select: Querying component [tcp]
[fn1:07315] oob:tcp: component_available called
[fn1:07315] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[fn1:07315] [[37299,0],0] oob:tcp:init rejecting loopback interface lo
[fn1:07315] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[fn1:07315] [[37299,0],0] oob:tcp:init adding 172.17.15.20 to our list of V4 
connections
[fn1:07315] [[37299,0],0] TCP STARTUP
[fn1:07315] [[

Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines

2015-07-13 Thread Ralph Castain
Try running it with “—mca oob_base_verbose 100” on both client and server - it 
will tell us why the connection was refused.




[OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines

2015-07-13 Thread Audet, Martin
Hi OMPI_Developers,

It seems that I am unable to establish MPI communication between two 
independently started MPI programs using the simplest client/server call 
sequence I can imagine (see the two attached files) when the client and server 
processes are started on different machines. Note that I have no problems when 
the client and server programs run on the same machine.

For example if I do the following on the server machine (running on fn1):

[audet@fn1 mpi]$ mpicc -Wall simpleserver.c -o simpleserver
[audet@fn1 mpi]$ mpiexec -n 1 ./simpleserver
Server port = 
'3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'

The server prints its port (created with MPI_Open_port()) and waits for a 
connection by calling MPI_Comm_accept().

Now on the client machine (running on linux15) if I compile the client and run 
it with the above port address on the command line, I get:

[audet@linux15 mpi]$ mpicc -Wall simpleclient.c -o simpleclient
[audet@linux15 mpi]$ mpiexec -n 1 ./simpleclient 
'3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'
trying to connect...

A process or daemon was unable to complete a TCP connection
to another process:
  Local host:linux15
  Remote host:   linux15
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.

[linux15:24193] [[13075,0],0]-[[46606,0],0] mca_oob_tcp_peer_send_handler: 
invalid connection state (6) on socket 16

And then I have to stop the client program by pressing ^C (and also the server, 
which doesn't seem affected).

What's wrong?

And I am almost sure there is no firewall running on linux15.

This is not the first MPI client/server application I have developed (with both 
OpenMPI and mpich).
These simple MPI client/server programs work well with mpich (version 3.1.3).

This problem happens with both OpenMPI 1.8.3 and 1.8.6.

linux15 and fn1 both run Fedora Core 12 Linux (64 bits) and are connected by 
Gigabit Ethernet (the normal network).

And again, if client and server run on the same machine (either fn1 or linux15), 
no such problem happens.

Thanks in advance,

Martin Audet

simpleserver.c:

#include <stdio.h>
#include <stdlib.h>

#include <mpi.h>

int main(int argc, char **argv)
{
   int   comm_rank;
   char  port_name[MPI_MAX_PORT_NAME];
   MPI_Comm intercomm;
   int  ok_flag;

   MPI_Init(&argc, &argv);

   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

   ok_flag = (comm_rank != 0) || (argc == 1);
   MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);

   if (!ok_flag) {
      if (comm_rank == 0) {
         fprintf(stderr,"Usage: %s\n",argv[0]);
      }
      MPI_Abort(MPI_COMM_WORLD, 1);
   }

   MPI_Open_port(MPI_INFO_NULL, port_name);

   if (comm_rank == 0) {
      printf("Server port = '%s'\n", port_name);
   }
   MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);

   MPI_Close_port(port_name);

   if (comm_rank == 0) {
      printf("MPI_Comm_accept() successful...\n");
   }

   MPI_Comm_disconnect(&intercomm);

   MPI_Finalize();

   return EXIT_SUCCESS;
}
simpleclient.c:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <mpi.h>

int main(int argc, char **argv)
{
   int  comm_rank;
   int  ok_flag;
   MPI_Comm intercomm;

   MPI_Init(&argc, &argv);

   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

   ok_flag = (comm_rank != 0)  || ((argc == 2)  &&  argv[1]  &&  (*argv[1] != '\0'));
   MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);

   if (!ok_flag) {
      if (comm_rank == 0) {
         fprintf(stderr,"Usage: %s mpi_port\n", argv[0]);
      }
      MPI_Abort(MPI_COMM_WORLD, 1);
   }

   if (comm_rank == 0) {
      printf("trying to connect...\n");
   }
   while (MPI_Comm_connect((comm_rank == 0) ? argv[1] : 0, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm) != MPI_SUCCESS) {
      if (comm_rank == 0) {
         printf("MPI_Comm_connect() failed, sleeping and retrying...\n");
      }
      sleep(1);
   }
   if (comm_rank == 0) {
      printf("MPI_Comm_connect() successful...\n");
   }

   MPI_Comm_disconnect(&intercomm);

   MPI_Finalize();

   return EXIT_SUCCESS;
}
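
One caveat about the retry loop in simpleclient.c: the default error handler on 
MPI_COMM_WORLD is MPI_ERRORS_ARE_FATAL, so a failing MPI_Comm_connect() would 
normally abort the job rather than return a code for the while condition to 
test. For the loop to behave as written, the error handler has to be relaxed 
first, e.g.:

   /* let MPI_Comm_connect() return an error code instead of aborting */
   MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);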


Re: [OMPI users] MPI_Comm_accept randomly gives errors

2012-10-16 Thread Valentin Clement
Thanks for the information. As there is also a problem with intercommunicators 
after spawn in 1.6.x, I'll turn to another solution. 

Valentin Clément 


On Oct 15, 2012, at 10:41 PM, Ralph Castain  wrote:

> Yeah, we don't support multi-threaded operations very well at this time. I 
> think you'd have better success with the 1.7 series as it is released, but 
> very much doubt the 1.6 series could do this as you describe.
> 
> One way to solve the immediate problem would be to funnel all MPI operations 
> into a single thread - you can have that thread subsequently parcel out any 
> messages for handling. You'd have better success with it.
> 
> 
> On Oct 3, 2012, at 10:36 PM, Valentin Clement  
> wrote:
> 
>> Hi everyone, 
>> 
>> I'm currently implementing MPI-based communication in our parallel 
>> language middleware POP-C++. It was using TCP/IP sockets before, but due to a 
>> project to port the language to a supercomputer, I have to use OpenMPI for 
>> the communication. I successfully replaced the old communication with MPI 
>> communication. However, I sometimes get the following error during the 
>> execution of my program. 
>> 
>> MPI-COMBOX(client): Want to get a connection to 
>> 3461939200.0;tcp://172.19.76.219:52876;tcp://172.19.7.128:52876;tcp://172.16.162.1:52876;tcp://192.168.59.1:52876+3461939202.0;tcp://172.19.76.219:52879;tcp://172.19.7.128:52879;tcp://172.16.162.1:52879;tcp://192.168.59.1:52879:300
>> [clementon:58465] [[52825,3],0] ORTE_ERROR_LOG: Data unpack would read past 
>> end of buffer in file dpm_orte.c at line 315
>> [clementon:58465] *** An error occurred in MPI_Comm_accept
>> [clementon:58465] *** on communicator MPI_COMM_WORLD
>> [clementon:58465] *** MPI_ERR_UNKNOWN: unknown error
>> [clementon:58465] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>> 
>> Sometimes it is the MPI_Comm_connect that fails:
>> 
>> MPI-COMBOX(client): Want to get a connection to 
>> 1318912000.0;tcp://192.168.59.176:33956+1318912002.0;tcp://192.168.59.176:54394:300
>> [ubuntu:19666] [[20125,3],0] ORTE_ERROR_LOG: Data unpack would read past end 
>> of buffer in file dpm_orte.c at line 315
>> [ubuntu:19666] *** An error occurred in MPI_Comm_accept
>> [ubuntu:19666] *** on communicator MPI_COMM_WORLD
>> [ubuntu:19666] *** MPI_ERR_UNKNOWN: unknown error
>> [ubuntu:19666] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>> 
>> So basically, I have a process waiting for connections with MPI_Comm_accept 
>> (Comm.Accept, as I use the C++ bindings), and another process wants to connect 
>> to it with MPI_Comm_connect (MPI::COMM_WORLD.Connect(port_name) ... ). It works 
>> fine most of the time. I'm suspecting a problem with multiple threads. The 
>> process that receives connections has a second thread to serve requests. 
>> 
>> * Process 1 connects to process 2 
>> * Process 2 thread 1 registers the request
>> * Process 2 thread 1 waits for a new connection
>> * Process 2 thread 2 serves the pending request and might send data
>> * Another process might start a new connection to process 2
>> 
>> I'm running this code on Ubuntu 12.04 with Open MPI 1.6.2 configured with 
>> --enable-mpi-thread-multiple; I attached the ompi_info -all output. 
>> I'm also running the same code on Mac OS X 10.8.2 with Open MPI 1.6.2, also 
>> configured with --enable-mpi-thread-multiple. 
>> 
>> I don't run on multiple nodes for the moment - just one node, and I'm already 
>> experiencing this. As I said, I suspect a problem with multiple threads, but 
>> my configuration should allow multiple threads to make MPI calls. 
>> 
>> 
>> 
>> Any help much appreciated 
>> 
>> 
>> 
>> Valentin Clement
>> 
>> --
>> Valentin Clement
>> Student trainee
>> Advanced Institute for Computational Science
>> Programming environnement research team 
>> RIKEN Institute
>> Kobe, Japan
>> 
>> 
>> 
>>  
>> 
>> 

-
Valentin Clement - Student trainee at AICS - RIKEN
valentin.clem...@hefr.ch
valentin.clem...@riken.jp
Master thesis project
POP-C++ on the K Computer 
Project homepage: https://forge.tic.eia-fr.ch/projects/poponk
Project board: https://forge.tic.eia-fr.ch/projects/poponk/wiki/Wiki
-



Re: [OMPI users] MPI_Comm_accept randomly gives errors

2012-10-15 Thread Ralph Castain
Yeah, we don't support multi-threaded operations very well at this time. I 
think you'd have better success with the 1.7 series as it is released, but very 
much doubt the 1.6 series could do this as you describe.

One way to solve the immediate problem would be to funnel all MPI operations 
into a single thread - you can have that thread subsequently parcel out any 
messages for handling. You'd have better success with it.
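
The funnelling Ralph describes can be sketched roughly as below. This is an illustrative pattern, not code from POP-C++ or Open MPI, and the names (mpi_task, enqueue_send, mpi_funnel_thread) are invented for the example: one dedicated thread owns every MPI call (matching MPI_THREAD_FUNNELED semantics), and the other threads only touch a mutex-protected queue.

#include <stdlib.h>
#include <pthread.h>
#include <mpi.h>

/* One unit of MPI work handed to the funnel thread. */
typedef struct mpi_task {
    void *buf;
    int count;
    MPI_Datatype type;
    int peer, tag;
    struct mpi_task *next;
} mpi_task;

static mpi_task *queue_head = NULL;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_cv   = PTHREAD_COND_INITIALIZER;
static int shutting_down = 0;

/* Any thread may call this; it makes no MPI calls itself. */
void enqueue_send(void *buf, int count, MPI_Datatype type, int peer, int tag)
{
    mpi_task *t = malloc(sizeof *t);
    t->buf = buf; t->count = count; t->type = type;
    t->peer = peer; t->tag = tag;
    pthread_mutex_lock(&queue_lock);
    t->next = queue_head;            /* LIFO for brevity */
    queue_head = t;
    pthread_cond_signal(&queue_cv);
    pthread_mutex_unlock(&queue_lock);
}

/* The only thread that ever touches MPI after initialization. */
void *mpi_funnel_thread(void *comm_arg)
{
    MPI_Comm comm = *(MPI_Comm *)comm_arg;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        while (queue_head == NULL && !shutting_down)
            pthread_cond_wait(&queue_cv, &queue_lock);
        if (queue_head == NULL) {    /* shutting down and queue drained */
            pthread_mutex_unlock(&queue_lock);
            break;
        }
        mpi_task *t = queue_head;
        queue_head = t->next;
        pthread_mutex_unlock(&queue_lock);
        MPI_Send(t->buf, t->count, t->type, t->peer, t->tag, comm);
        free(t);
    }
    return NULL;
}

With this structure the accept/connect and send/receive traffic is serialized on one thread, so the application only needs MPI_THREAD_FUNNELED from MPI_Init_thread rather than the (then poorly supported) MPI_THREAD_MULTIPLE.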


On Oct 3, 2012, at 10:36 PM, Valentin Clement  wrote:

> Hi everyone, 
> 
> I'm currently implementing communication based on MPI in our parallel 
> language middleware POP-C++. It was using TCP/IP sockets before, but due to a 
> project to port the language to a supercomputer, I have to use Open MPI for 
> the communication. I successfully changed the old communication to MPI 
> communication. Anyway, I am having the following error sometimes during the 
> execution of my program. 
> 
> MPI-COMBOX(client): Want to get a connection to 
> 3461939200.0;tcp://172.19.76.219:52876;tcp://172.19.7.128:52876;tcp://172.16.162.1:52876;tcp://192.168.59.1:52876+3461939202.0;tcp://172.19.76.219:52879;tcp://172.19.7.128:52879;tcp://172.16.162.1:52879;tcp://192.168.59.1:52879:300
> [clementon:58465] [[52825,3],0] ORTE_ERROR_LOG: Data unpack would read past 
> end of buffer in file dpm_orte.c at line 315
> [clementon:58465] *** An error occurred in MPI_Comm_accept
> [clementon:58465] *** on communicator MPI_COMM_WORLD
> [clementon:58465] *** MPI_ERR_UNKNOWN: unknown error
> [clementon:58465] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> 
> Sometimes I have an MPI_Comm_connect that fails:
> 
> MPI-COMBOX(client): Want to get a connection to 
> 1318912000.0;tcp://192.168.59.176:33956+1318912002.0;tcp://192.168.59.176:54394:300
> [ubuntu:19666] [[20125,3],0] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file dpm_orte.c at line 315
> [ubuntu:19666] *** An error occurred in MPI_Comm_accept
> [ubuntu:19666] *** on communicator MPI_COMM_WORLD
> [ubuntu:19666] *** MPI_ERR_UNKNOWN: unknown error
> [ubuntu:19666] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> 
> So basically, I have a process waiting for connections with MPI_Comm_accept 
> (Comm.Accept, as I use the C++ bindings), and another process wants to connect 
> to it with MPI_Comm_connect (MPI::COMM_WORLD.Connect(port_name) ... ). It works 
> fine most of the time. I'm suspecting a problem with multiple threads. The 
> process that receives connections has a second thread to serve requests. 
> 
> * Process 1 connects to process 2 
> * Process 2 thread 1 registers the request
> * Process 2 thread 1 waits for a new connection
> * Process 2 thread 2 serves the pending request and might send data
> * Another process might start a new connection to process 2
> 
> I'm running this code on Ubuntu 12.04 with Open MPI 1.6.2 configured with 
> --enable-mpi-thread-multiple; I attached the ompi_info -all output. 
> I'm also running the same code on Mac OS X 10.8.2 with Open MPI 1.6.2, also 
> configured with --enable-mpi-thread-multiple. 
> 
> I don't run on multiple nodes for the moment - just one node, and I'm already 
> experiencing this. As I said, I suspect a problem with multiple threads, but 
> my configuration should allow multiple threads to make MPI calls. 
> 
> 
> 
> Any help much appreciated 
> 
> 
> 
> Valentin Clement
> 
> --
> Valentin Clement
> Student trainee
> Advanced Institute for Computational Science
> Programming environnement research team 
> RIKEN Institute
> Kobe, Japan
> 
> 
> 
>  
> 
> 



[OMPI users] MPI_Comm_accept randomly gives errors

2012-10-04 Thread Valentin Clement
Hi everyone,

I'm currently implementing communication based on MPI in our parallel language middleware POP-C++. It was using TCP/IP sockets before, but due to a project to port the language to a supercomputer, I have to use Open MPI for the communication. I successfully changed the old communication to MPI communication. Anyway, I am having the following error sometimes during the execution of my program.

MPI-COMBOX(client): Want to get a connection to 3461939200.0;tcp://172.19.76.219:52876;tcp://172.19.7.128:52876;tcp://172.16.162.1:52876;tcp://192.168.59.1:52876+3461939202.0;tcp://172.19.76.219:52879;tcp://172.19.7.128:52879;tcp://172.16.162.1:52879;tcp://192.168.59.1:52879:300
[clementon:58465] [[52825,3],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
[clementon:58465] *** An error occurred in MPI_Comm_accept
[clementon:58465] *** on communicator MPI_COMM_WORLD
[clementon:58465] *** MPI_ERR_UNKNOWN: unknown error
[clementon:58465] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

Sometimes I have an MPI_Comm_connect that fails:

MPI-COMBOX(client): Want to get a connection to 1318912000.0;tcp://192.168.59.176:33956+1318912002.0;tcp://192.168.59.176:54394:300
[ubuntu:19666] [[20125,3],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
[ubuntu:19666] *** An error occurred in MPI_Comm_accept
[ubuntu:19666] *** on communicator MPI_COMM_WORLD
[ubuntu:19666] *** MPI_ERR_UNKNOWN: unknown error
[ubuntu:19666] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

So basically, I have a process waiting for connections with MPI_Comm_accept (Comm.Accept, as I use the C++ bindings), and another process wants to connect to it with MPI_Comm_connect (MPI::COMM_WORLD.Connect(port_name) ... ). It works fine most of the time. I'm suspecting a problem with multiple threads. The process that receives connections has a second thread to serve requests.

* Process 1 connects to process 2
* Process 2 thread 1 registers the request
* Process 2 thread 1 waits for a new connection
* Process 2 thread 2 serves the pending request and might send data
* Another process might start a new connection to process 2

I'm running this code on Ubuntu 12.04 with Open MPI 1.6.2 configured with --enable-mpi-thread-multiple; I attached the ompi_info -all output. I'm also running the same code on Mac OS X 10.8.2 with Open MPI 1.6.2, also configured with --enable-mpi-thread-multiple.

I don't run on multiple nodes for the moment - just one node, and I'm already experiencing this. As I said, I suspect a problem with multiple threads, but my configuration should allow multiple threads to make MPI calls.

Any help much appreciated

Valentin Clement

--
Valentin Clement
Student trainee
Advanced Institute for Computational Science
Programming environment research team
RIKEN Institute
Kobe, Japan
ompi-output.tar.bz2
Description: BZip2 compressed data

Re: [OMPI users] MPI_Comm_accept - Busy wait

2011-10-14 Thread Thatyene Louise Alves de Souza Ramos
Thank you for the explanation! I use "-mca mpi_yield_when_idle 1" already!

Thank you again!
---
Thatyene Ramos

On Fri, Oct 14, 2011 at 3:43 PM, Ralph Castain  wrote:

> Sorry - been occupied. This is normal behavior. As has been discussed on
> this list before, OMPI made a design decision to minimize latency. This
> means we aggressively poll for connections. Only thing you can do is tell it
> to yield the processor when idle so, if something else is trying to run, we
> will let it get in there a little earlier. Use -mca mpi_yield_when_idle 1
>
> However, we have seen that if no other user processes are trying to run,
> then the scheduler hands the processor right back to you - and you'll still
> see that 100% number. It doesn't mean we are being hogs - it just means that
> nothing else wants to run, so we happily accept the time.
>
>
> On Oct 14, 2011, at 12:21 PM, Thatyene Louise Alves de Souza Ramos wrote:
>
> Does anyone have any idea?
>
> ---
> Thatyene Ramos
>
> On Fri, Oct 7, 2011 at 12:01 PM, Thatyene Louise Alves de Souza Ramos <
> thaty...@gmail.com> wrote:
>
>> Hi there!
>>
>> In my code I use MPI_Comm_accept in a server-client communication. I
>> noticed that the server remains in a busy wait while waiting for client
>> connections, using 100% of the CPU if there are no other processes running.
>>
>> I wonder if there is any way to prevent this from happening.
>>
>> Thanks in advance.
>>
>> Thatyene Ramos
>>
>


Re: [OMPI users] MPI_Comm_accept - Busy wait

2011-10-14 Thread Ralph Castain
Sorry - been occupied. This is normal behavior. As has been discussed on this 
list before, OMPI made a design decision to minimize latency. This means we 
aggressively poll for connections. Only thing you can do is tell it to yield 
the processor when idle so, if something else is trying to run, we will let it 
get in there a little earlier. Use -mca mpi_yield_when_idle 1

However, we have seen that if no other user processes are trying to run, then 
the scheduler hands the processor right back to you - and you'll still see that 
100% number. It doesn't mean we are being hogs - it just means that nothing 
else wants to run, so we happily accept the time.
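
For reference, the flag goes on the mpirun command line, e.g. "mpirun -mca mpi_yield_when_idle 1 -np 1 ./server" (the executable name here is just a placeholder). The same parameter can also be set through the OMPI_MCA_mpi_yield_when_idle environment variable or an MCA parameter file.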


On Oct 14, 2011, at 12:21 PM, Thatyene Louise Alves de Souza Ramos wrote:

> Does anyone have any idea?
> 
> ---
> Thatyene Ramos
> 
> On Fri, Oct 7, 2011 at 12:01 PM, Thatyene Louise Alves de Souza Ramos 
>  wrote:
> Hi there!
> 
> In my code I use MPI_Comm_accept in a server-client communication. I noticed 
> that the server remains in a busy wait while waiting for client connections, 
> using 100% of the CPU if there are no other processes running.
> 
> I wonder if there is any way to prevent this from happening.
> 
> Thanks in advance.
> 
> Thatyene Ramos
> 



Re: [OMPI users] MPI_Comm_accept - Busy wait

2011-10-14 Thread Thatyene Louise Alves de Souza Ramos
Does anyone have any idea?

---
Thatyene Ramos

On Fri, Oct 7, 2011 at 12:01 PM, Thatyene Louise Alves de Souza Ramos <
thaty...@gmail.com> wrote:

> Hi there!
>
> In my code I use MPI_Comm_accept in a server-client communication. I
> noticed that the server remains in a busy wait while waiting for client
> connections, using 100% of the CPU if there are no other processes running.
>
> I wonder if there is any way to prevent this from happening.
>
> Thanks in advance.
>
> Thatyene Ramos
>


[OMPI users] MPI_Comm_accept - Busy wait

2011-10-07 Thread Thatyene Louise Alves de Souza Ramos
Hi there!

In my code I use MPI_Comm_accept in a server-client communication. I noticed
that the server remains in a busy wait while waiting for client
connections, using 100% of the CPU if there are no other processes running.

I wonder if there is any way to prevent this from happening.

Thanks in advance.

Thatyene Ramos


Re: [OMPI users] MPI_Comm_accept and MPI_Comm_connect both use 100% one cpu core. Is it a bug?

2010-09-01 Thread Ralph Castain
It's not a bug - that is normal behavior. The processes are polling hard to 
establish the connections as quickly as possible.


On Sep 1, 2010, at 7:24 PM, lyb wrote:

> Hi, All,
> 
> I tested two sample applications on Windows 2003 Server, one using 
> MPI_Comm_accept and the other using MPI_Comm_connect. 
> When they run into MPI_Comm_accept or MPI_Comm_connect, the application uses 
> 100% of one CPU core.  Is it a bug, or is something wrong?
> 
> I tested with three versions: Version 1.4 (stable), Version 1.5 
> (prerelease) and the trunk at r23706.
> 
> ...
> MPI_Open_port(MPI_INFO_NULL, port);
> MPI_Comm_accept( port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client );
> ...
> 
> ...
> MPI_Comm_connect( port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server );
> ...
> 
> thanks a lot.
> 
> lyb
> 
> 
> 
> 
> 



[OMPI users] MPI_Comm_accept and MPI_Comm_connect both use 100% one cpu core. Is it a bug?

2010-09-01 Thread lyb

Hi, All,

I tested two sample applications on Windows 2003 Server, one using 
MPI_Comm_accept and the other using MPI_Comm_connect.
When they run into MPI_Comm_accept or MPI_Comm_connect, the application uses 
100% of one CPU core.  Is it a bug, or is something wrong?


I tested with three versions: Version 1.4 (stable), Version 1.5 (prerelease) 
and the trunk at r23706.


...
MPI_Open_port(MPI_INFO_NULL, port);
MPI_Comm_accept( port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client );
...

...
MPI_Comm_connect( port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server );
...

thanks a lot.

lyb







Re: [OMPI users] MPI_Comm_accept() busy waiting?

2010-03-09 Thread Douglas Guptill
On Tue, Mar 09, 2010 at 05:43:02PM +0100, Ramon wrote:
> Am I the only one experiencing such a problem?  Is there any solution?

No, you are not the only one.  Several others have mentioned the "busy
wait" problem.

The response of the Open MPI developers, as I understand it, is that
the MPI job should be the only one running, so a 100% busy wait is not
a problem.  I hope the Open MPI developers will correct me if I have
mis-stated their position.

I posted my cure for the problem some time ago.  I have attached it
again to this message.

Hope that helps,
Douglas.


> Ramon wrote:
>> Hi,
>>
>> I've recently been trying to develop a client-server distributed file  
>> system (for my thesis) using MPI.  The communication between the  
>> machines is working great; however, whenever the MPI_Comm_accept()  
>> function is called, the server starts consuming 100% of the CPU.
>>
>> One interesting thing is that I tried to compile the same code using  
>> the LAM/MPI library and the mentioned behaviour could not be observed.
>>
>> Is this a bug?
>>
>> On a side note, I'm using Ubuntu 9.10's default OpenMPI deb package.   
>> Its version is 1.3.2.
>>
>> Regards
>>
>> Ramon.

-- 
  Douglas Guptill   voice: 902-461-9749
  Research Assistant, LSC 4640  email: douglas.gupt...@dal.ca
  Oceanography Department   fax:   902-494-3877
  Dalhousie University
  Halifax, NS, B3H 4J1, Canada

/*
 * Intercept MPI_Recv, and
 * call PMPI_Irecv, loop over PMPI_Request_get_status and sleep, until done
 *
 * Revision History:
 *  2008-12-17: copied from MPI_Send.c
 *  2008-12-18: tweaking.
 *
 * See MPI_Send.c for additional comments, 
 *  especially w.r.t. PMPI_Request_get_status.
 **/

#include "mpi.h"
#define _POSIX_C_SOURCE 199309L
#include <time.h>

int MPI_Recv(void *buff, int count, MPI_Datatype datatype, 
	  int from, int tag, MPI_Comm comm, MPI_Status *status) {

  /* exponential back-off: start at 1000 ns; the cap value is assumed,
     as it was garbled in the archived post */
  int flag, nsec_start=1000, nsec_max=100000;
  struct timespec ts;
  MPI_Request req;

  ts.tv_sec = 0;
  ts.tv_nsec = nsec_start;

  PMPI_Irecv(buff, count, datatype, from, tag, comm, &req);
  do {
    nanosleep(&ts, NULL);
    ts.tv_nsec *= 2;
    ts.tv_nsec = (ts.tv_nsec > nsec_max) ? nsec_max : ts.tv_nsec;
    PMPI_Request_get_status(req, &flag, status);
  } while (!flag);

  return (*status).MPI_ERROR;
}
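
One caveat with the interceptor above: if the caller passes MPI_STATUS_IGNORE, both the PMPI_Request_get_status() call and the final dereference of status are invalid. A sketch of a safer tail for the function, keeping the same back-off loop:

  MPI_Status local;
  do {
    nanosleep(&ts, NULL);
    ts.tv_nsec *= 2;
    ts.tv_nsec = (ts.tv_nsec > nsec_max) ? nsec_max : ts.tv_nsec;
    PMPI_Request_get_status(req, &flag, &local);
  } while (!flag);
  if (status != MPI_STATUS_IGNORE) *status = local;
  return local.MPI_ERROR;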
/*
 * Intercept MPI_Send, and
 * call PMPI_Isend, loop over PMPI_Request_get_status and sleep, until done
 *
 * Revision History:
 *  2008-12-12: skeleton by Jeff Squyres 
 *  2008-12-16->18: adding parameters, variable wait, 
 * change MPI_Test to MPI_Request_get_status
 *  Douglas Guptill 
 **/

/* When we use this:
 *   PMPI_Test(&req, &flag, &status); 
 * we get:
 * dguptill@DOME:$ mpirun -np 2 mpi_send_recv_test_mine
 * This is process 0 of 2.
 * This is process 1 of 2.
 * error: proc 0, mpi_send returned -1208109376
 * error: proc 1, mpi_send returned -1208310080
 * 1 changed to 3
 *
 * Using MPI_Request_get_status cures the problem.
 *
 * A read of mpi21-report.pdf confirms that MPI_Request_get_status
 * is the appropriate choice, since there seems to be something
 * between the call to MPI_SEND (MPI_RECV) in my FORTRAN program
 * and MPI_Send.c (MPI_Recv.c)
 **/


#include "mpi.h"
#define _POSIX_C_SOURCE 199309L
#include <time.h>

int MPI_Send(void *buff, int count, MPI_Datatype datatype, 
	  int dest, int tag, MPI_Comm comm) {

  /* exponential back-off: start at 1000 ns; the cap value is assumed,
     as it was garbled in the archived post */
  int flag, nsec_start=1000, nsec_max=100000;
  struct timespec ts;
  MPI_Request req;
  MPI_Status status;

  ts.tv_sec = 0;
  ts.tv_nsec = nsec_start;

  PMPI_Isend(buff, count, datatype, dest, tag, comm, &req);
  do {
    nanosleep(&ts, NULL);
    ts.tv_nsec *= 2;
    ts.tv_nsec = (ts.tv_nsec > nsec_max) ? nsec_max : ts.tv_nsec;
    PMPI_Request_get_status(req, &flag, &status);
  } while (!flag);

  return status.MPI_ERROR;
}
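
For anyone wanting to try Douglas's workaround: these two files define MPI_Send/MPI_Recv and call the PMPI_ entry points, so they are presumably compiled and linked ahead of the MPI library, e.g. "mpicc -c MPI_Send.c MPI_Recv.c" followed by "mpicc my_app.c MPI_Send.o MPI_Recv.o -o my_app" (file and program names here are placeholders). The same pair could also be built into a shared library and LD_PRELOADed, which avoids relinking the application.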


Re: [OMPI users] MPI_Comm_accept() busy waiting?

2010-03-09 Thread Ramon
Am I the only one experiencing such a problem?  Is there any solution?  Or 
shall I downgrade to LAM/MPI?


Regards

Ramon.


Ramon wrote:

Hi,

I've recently been trying to develop a client-server distributed file 
system (for my thesis) using MPI.  The communication between the 
machines is working great; however, whenever the MPI_Comm_accept() 
function is called, the server starts consuming 100% of the CPU.


One interesting thing is that I tried to compile the same code using 
the LAM/MPI library and the mentioned behaviour could not be observed.


Is this a bug?

On a side note, I'm using Ubuntu 9.10's default OpenMPI deb package.  
Its version is 1.3.2.


Regards

Ramon.


[OMPI users] MPI_Comm_accept() busy waiting?

2010-03-02 Thread Ramon

Hi,

I've recently been trying to develop a client-server distributed file 
system (for my thesis) using MPI.  The communication between the 
machines is working great; however, whenever the MPI_Comm_accept() 
function is called, the server starts consuming 100% of the CPU.


One interesting thing is that I tried to compile the same code using the 
LAM/MPI library and the mentioned behaviour could not be observed.


Is this a bug?

On a side note, I'm using Ubuntu 9.10's default OpenMPI deb package.  
Its version is 1.3.2.


Regards

Ramon.


Re: [OMPI users] MPI_Comm_accept()/connect() errors

2009-10-08 Thread Blesson Varghese
The PATH variable contains
/home/hx019035/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/
bin:/usr/games:/usr/local/maui/bin/:

/home/hx019035/bin contains the local installation of OMPI 1.3.3



The LD_LIBRARY_PATH variable contains /home/hx019035/lib:



These variables are being set in the .profile file on the hpcc00 node. 



Would there be a change anywhere else?





From: Ralph Castain [mailto:rhc.open...@gmail.com] On Behalf Of Ralph
Castain
Sent: 07 October 2009 13:32
To: Blesson Varghese
Subject: Re: [OMPI users] MPI_Comm_accept()/connect() errors



Yes, it does. But the error message indicates a 1.2 version is running on
hpcc00.



On Oct 7, 2009, at 5:46 AM, Blesson Varghese wrote:



Just a quick question. Would mpirun -version give me the version of the
mpirun being executed? I am getting the result of that as 1.3.3.



From: Ralph Castain [mailto:rhc.open...@gmail.com] On Behalf Of Ralph
Castain
Sent: 07 October 2009 11:58
To: Blesson Varghese
Subject: Re: [OMPI users] MPI_Comm_accept()/connect() errors



Hate to tell you this, but your output clearly indicates you are NOT running
1.3.3 - that is an output from a 1.2.x version of OMPI.



Check your PATH and LD_LIBRARY_PATH - you're still picking up the 1.2.5
version somewhere.
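
One way to confirm which installation wins (a suggested check, not from the original thread): run "which mpirun" and "mpirun -version" on hpcc00, and "ldd ./server | grep -i mpi" to see which libmpi the binary actually resolves; all three should point into the 1.3.3 installation under /home/hx019035.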





On Oct 7, 2009, at 4:05 AM, Blesson Varghese wrote:





Hi,



Please refer to the emails below.



I have made an upgrade to Open MPI 1.3.3 as suggested. The necessary
environment variables have all been set. Attaching the output of ompi_info
-all. However, the errors continue to persist.



[hpcc00:31864] [0,0,0] ORTE_ERROR_LOG: Not found in file dss/dss_unpack.c at
line 209

[hpcc00:31864] [0,0,0] ORTE_ERROR_LOG: Not found in file
communicator/comm_dyn.c at line 186

[hpcc00:31864] *** An error occurred in MPI_Comm_connect

[hpcc00:31864] *** on communicator MPI_COMM_WORLD

[hpcc00:31864] *** MPI_ERR_INTERN: internal error

[hpcc00:31864] *** MPI_ERRORS_ARE_FATAL (goodbye)





The server program is as follows:



#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main( int argc, char **argv )
{
  MPI_Comm client;
  MPI_Status status;
  char port_name[MPI_MAX_PORT_NAME];
  int buf;
  int size, again;
  MPI_Info portInfo;

  MPI_Init( &argc, &argv );

  MPI_Comm_size(MPI_COMM_WORLD, &size);

  MPI_Open_port(MPI_INFO_NULL, port_name);

  printf("server available at %s\n", port_name);

  MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);

  MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, client, &status);

  MPI_Comm_disconnect( &client );

  MPI_Finalize();

  return 0;
}



The client program is as follows:



#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int main( int argc, char **argv )
{
    MPI_Comm server;
    int buf = 8;
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Info portInfo;

    MPI_Init( &argc, &argv );

    strcpy(port_name, "0.0.0:2000"); /* The port name is hardcoded since
                                        0.0.0:2000 is generated by the server program */

    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server );

    MPI_Send(&buf, 1, MPI_INT, 0, 1, server );

    MPI_Comm_disconnect( &server );

    MPI_Finalize();

    return 0;
}
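
Independent of the version mismatch Ralph points out below, hardcoding the port is itself a likely problem: the string printed by MPI_Open_port() is only valid for the server run that produced it, so a value copied from an earlier run will not connect. A sketch of taking it from the command line instead (the argv handling is illustrative):

    if (argc < 2) {
        fprintf(stderr, "usage: %s <port_name printed by the server>\n", argv[0]);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    strncpy(port_name, argv[1], MPI_MAX_PORT_NAME - 1);
    port_name[MPI_MAX_PORT_NAME - 1] = '\0';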



Would you please advise?



Regards,

Blesson.





-Original Message-
From: Blesson Varghese [mailto:hx019...@reading.ac.uk] 
Sent: 03 October 2009 12:20
To: 'Jeff Squyres'
Subject: RE: [OMPI users] MPI_Comm_accept()/connect() errors



Thank you. I shall try the upgrade very soon.



-Original Message-

From: Jeff Squyres [mailto:jsquy...@cisco.com]

Sent: 03 October 2009 12:18

To: Blesson Varghese

Subject: Re: [OMPI users] MPI_Comm_accept()/connect() errors



On Oct 3, 2009, at 7:14 AM, Blesson Varghese wrote:



> Thanks for your reply Jeff. Since, it is a teaching cluster of the

> University,  I am quite unsure if I would be able to upgrade it very 

> soon.

> 

> Do you reckon that the error is due to the Open MPI version?

> 



You can always install your own version of Open MPI under your $HOME 

or somesuch -- there is no requirement that Open MPI is installed by 

root in a central location.



That being said, you might want to check with your administrator to 

ensure that this is ok with local policies -- see if they did any 

special setup for Open MPI, etc.



But yes, we made a bunch of COMM_SPAWN improvements since the 1.2 

series.



--

Jeff Squyres

jsquy...@cisco.com





From: Blesson Varghese [mailto:hx019...@reading.ac.uk] 
Sent: 01 October 2009 12:01
To: 'Open MPI Users'; 'Ralph Castain'
Subject: RE: [OMPI users] MPI_Comm_accept()/connect() errors



The following is the information regarding the error. I am running Open MPI
1.2.5 on Ubuntu 4.2.4, kernel version 2.6.24



I ran the server program as mpirun -np 1 server. This program gave me the
output port as 0.1.0:2000. I used this port name value as the command line
argument for the client program: mpirun -np 1 client 0.1.1:2000.



- The output of the "ompi_info --all" is attached with the email

Re: [OMPI users] MPI_Comm_accept()/connect() errors

2009-10-03 Thread Jeff Squyres

On Oct 1, 2009, at 7:00 AM, Blesson Varghese wrote:

The following is the information regarding the error. I am running  
Open MPI 1.2.5 on Ubuntu 4.2.4, kernel version 2.6.24


Is there any chance that you can upgrade to the Open MPI v1.3 series?

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] MPI_Comm_accept()/connect() errors

2009-10-01 Thread Blesson Varghese
The following is the information regarding the error. I am running Open MPI
1.2.5 on Ubuntu 4.2.4, kernel version 2.6.24



I ran the server program as mpirun -np 1 server. This program gave me the
output port as 0.1.0:2000. I used this port name value as the command line
argument for the client program: mpirun -np 1 client 0.1.1:2000.



- The output of the "ompi_info --all" is attached with the email

- PATH Variable:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr
/local/maui/bin/:

- LD_LIBRARY_PATH variable was empty

- The following is the output of ifconfig on hpcc00 from where the error has
been generated:

eth0  Link encap:Ethernet  HWaddr 00:12:3f:4c:2d:78

  inet addr:134.225.200.100  Bcast:134.225.200.255
Mask:255.255.255.0

  inet6 addr: fe80::212:3fff:fe4c:2d78/64 Scope:Link

  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

  RX packets:15912728 errors:0 dropped:0 overruns:0 frame:0

  TX packets:15312376 errors:0 dropped:0 overruns:0 carrier:0

  collisions:0 txqueuelen:1000

  RX bytes:2951880321 (2.7 GB)  TX bytes:2788249498 (2.5 GB)

  Interrupt:16



loLink encap:Local Loopback

  inet addr:127.0.0.1  Mask:255.0.0.0

  inet6 addr: ::1/128 Scope:Host

  UP LOOPBACK RUNNING  MTU:16436  Metric:1

  RX packets:3507489 errors:0 dropped:0 overruns:0 frame:0

  TX packets:3507489 errors:0 dropped:0 overruns:0 carrier:0

  collisions:0 txqueuelen:0

  RX bytes:1794266658 (1.6 GB)  TX bytes:1794266658 (1.6 GB)



Regards,

Blesson.



From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Ralph Castain
Sent: 29 September 2009 23:59
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Comm_accept()/connect() errors



I will ask the obvious - what version of Open MPI are you running? In what
environment? What was your command line?



:-)



On Sep 29, 2009, at 3:50 PM, Blesson Varghese wrote:



Hi,



I have been trying to execute the server.c and client.c program provided in
http://www.mpi-forum.org/docs/mpi21-report/node213.htm#Node213, using
accept() and connect() function in MPI. However, the following errors are
generated.



[hpcc00:16522] *** An error occurred in MPI_Comm_connect

[hpcc00:16522] *** on communicator MPI_COMM_WORLD

[hpcc00:16522] *** MPI_ERR_INTERN: internal error

[hpcc00:16522] *** MPI_ERRORS_ARE_FATAL (goodbye)



Could anybody please help me?



Many thanks,
Blesson.




Open MPI: 1.2.5
   Open MPI SVN revision: r16989
Open RTE: 1.2.5
   Open RTE SVN revision: r16989
OPAL: 1.2.5
   OPAL SVN revision: r16989
   MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.5)
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.5)
   MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.5)
   MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.5)
   MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.5)
 MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.5)
 MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.5)
   MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
   MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.5)
MCA coll: self (MCA v1.0, API v1.0, Component v1.2.5)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.5)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.5)
  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.5)
   MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.5)
   MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.5)
 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.5)
 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.5)
 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.5)
  MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.5)
 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.5)
 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.5)
 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.5)
 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.5)
  MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.5)
  MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.5)
  MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.5)
 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.5)
 MCA gpr: proxy (MCA v1.0, API v1.0, Compon

Re: [OMPI users] MPI_Comm_accept()/connect() errors

2009-09-30 Thread Blesson Varghese
Thanks for your reply. 



I am running MPI 2.0 on Ubuntu 4.2.4, kernel version 2.6.24.



I ran the server program as mpirun -np 1 server. This program gave me the
output port as 0.1.0:2000. I used this port name value as the command line
argument for the client program: mpirun -np 1 client 0.1.1:2000



Regards,

Blesson. 



From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Ralph Castain
Sent: 29 September 2009 23:59
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Comm_accept()/connect() errors



I will ask the obvious - what version of Open MPI are you running? In what
environment? What was your command line?



:-)



On Sep 29, 2009, at 3:50 PM, Blesson Varghese wrote:





Hi,



I have been trying to execute the server.c and client.c program provided in
http://www.mpi-forum.org/docs/mpi21-report/node213.htm#Node213, using
accept() and connect() function in MPI. However, the following errors are
generated.



[hpcc00:16522] *** An error occurred in MPI_Comm_connect

[hpcc00:16522] *** on communicator MPI_COMM_WORLD

[hpcc00:16522] *** MPI_ERR_INTERN: internal error

[hpcc00:16522] *** MPI_ERRORS_ARE_FATAL (goodbye)



Could anybody please help me?



Many thanks,
Blesson.






Re: [OMPI users] MPI_Comm_accept()/connect() errors

2009-09-29 Thread Ralph Castain
I will ask the obvious - what version of Open MPI are you running? In  
what environment? What was your command line?


:-)

On Sep 29, 2009, at 3:50 PM, Blesson Varghese wrote:


Hi,

I have been trying to execute the server.c and client.c program  
provided in http://www.mpi-forum.org/docs/mpi21-report/node213.htm#Node213 
, using accept() and connect() function in MPI. However, the  
following errors are generated.


[hpcc00:16522] *** An error occurred in MPI_Comm_connect
[hpcc00:16522] *** on communicator MPI_COMM_WORLD
[hpcc00:16522] *** MPI_ERR_INTERN: internal error
[hpcc00:16522] *** MPI_ERRORS_ARE_FATAL (goodbye)

Could anybody please help me?

Many thanks,
Blesson.




[OMPI users] MPI_Comm_accept()/connect() errors

2009-09-29 Thread Blesson Varghese
Hi,



I have been trying to execute the server.c and client.c program provided in
http://www.mpi-forum.org/docs/mpi21-report/node213.htm#Node213, using
accept() and connect() function in MPI. However, the following errors are
generated.



[hpcc00:16522] *** An error occurred in MPI_Comm_connect

[hpcc00:16522] *** on communicator MPI_COMM_WORLD

[hpcc00:16522] *** MPI_ERR_INTERN: internal error

[hpcc00:16522] *** MPI_ERRORS_ARE_FATAL (goodbye)



Could anybody please help me?



Many thanks,
Blesson.



Re: [OMPI users] MPI_Comm_Accept / MPI::Comm::Accept problem.

2007-04-28 Thread Nuno Sucena Almeida
Hi Jeff,

	thanks for taking the time to answer this. I actually reached that 
conclusion after trying a simple MPI::Barrier() with both OpenMPI and
LAM/MPI, where both had the same active-wait kind of behaviour.
What I'm trying to achieve is to have some kind of calculation
server, where the clients can connect through MPI::Intercomm to the
server process with rank 0, and transfer data so that it can perform
computation, but it seems wasteful to have a server group of processes
running at 100% while waiting for the clients.
	It would be nice to be able to specify the behaviour in this
case, or do you suggest another approach?

Cheers, 

Nuno

On Fri, Apr 27, 2007 at 07:49:04PM -0400, Jeff Squyres wrote:
| This is actually expected behavior.  We make the assumption that MPI  
| processes are meant to exhibit as low latency as possible, and  
| therefore use active polling for most message passing.


Re: [OMPI users] MPI_Comm_Accept / MPI::Comm::Accept problem.

2007-04-28 Thread Jeff Squyres
This is actually expected behavior.  We make the assumption that MPI  
processes are meant to exhibit as low latency as possible, and  
therefore use active polling for most message passing.


Additionally, it may be possible that connections could come across  
multiple devices, so we need to poll them all to check for progress/ 
connections.  We've talked internally about getting better at  
recognizing single-device scenarios (and therefore allowing  
blocking), but haven't really done much about it.  Our internal  
interfaces were designed to be non-blocking for polling for maximum  
performance (i.e., lowest latency / highest bandwidth).
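
A partial mitigation that comes up elsewhere on this list is the mpi_yield_when_idle MCA parameter, which makes the polling loop yield the processor; it does not make the wait truly blocking, but it lets other runnable processes get the CPU sooner.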



On Apr 26, 2007, at 3:48 PM, Nuno Sucena Almeida wrote:


Hello,

I'm having a weird problem while using the MPI_Comm_Accept (C) or the
MPI::Comm::Accept (C++ bindings).
	My "server" runs until the call to this function, but if there's no client
connecting, it sits there eating all CPU (100%). If a client connects,
the loop works fine, but when the client disconnects we are back to the
same high CPU usage.
	I tried using OpenMPI versions 1.1.2 and 1.2. The machine architectures are
AMD Opteron and Intel Itanium2 respectively, the former compiled with gcc
4.1.1 and the latter with gcc 3.2.3.

The C++ code is here:

http://compel.bu.edu/~nuno/openmpi/

along with the logs for orted and the 'server' output.

I started orted with:

orted --persistent --seed --scope public  --universe foo

and the 'server' with

mpirun --universe foo -np 1 ./server

	The code is a C++ conversion from the basic C one posted at the mpi-forum
website:

http://www.mpi-forum.org/docs/mpi-20-html/node106.htm#Node109

	Is there an easy fix for this? I tried also the C version, having the same
problem...

Regards,

Nuno
--
http://aeminium.org/slug/



--
Jeff Squyres
Cisco Systems