Take a gander at ompi/tools/ompi-server - I believe I put a man page in
there. You might just try "man ompi-server" and see if it shows up.

Holler if you have a question - I'm not sure I documented it very
thoroughly at the time.


On 4/3/08 3:10 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:

> Ralph,
> 
> 
> I am using the trunk. Is there any documentation for ompi-server? It
> sounds exactly like what I need to fix point 1.
> 
> Aurelien
> 
> On Apr 3, 2008, at 17:06, Ralph Castain wrote:
>> I guess I'll have to ask the basic question: what version are you
>> using?
>> 
>> If you are talking about the trunk, there no longer is a "universe"
>> concept anywhere in the code. Two mpiruns can connect/accept to each
>> other as long as they can make contact. To facilitate that, we created
>> an "ompi-server" tool that is supposed to be run by the sys-admin (or a
>> user, doesn't matter which) on the head node - there are various ways
>> to tell mpirun how to contact the server, or it can self-discover it.
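>> 
>> For example, something along these lines should work (option names from
>> memory and may differ on the current trunk - check "ompi-server --help"
>> and the mpirun man page; the URI file path and program names here are
>> arbitrary):
>> 
>>   ompi-server --report-uri /tmp/ompi-server.uri
>>   mpirun --ompi-server file:/tmp/ompi-server.uri -np 1 ./server
>>   mpirun --ompi-server file:/tmp/ompi-server.uri -np 1 ./client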
>> 
>> I have tested publish/lookup pretty thoroughly and it seems to work. I
>> haven't spent much time testing connect/accept except via comm_spawn,
>> which seems to be working. Since that uses the same mechanism, I would
>> have expected connect/accept to work as well.
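>> 
>> As a rough illustration (sketch only, not tested as written; the service
>> name "ompi_test_service" is arbitrary), the publish/lookup pattern looks
>> like this:
>> 
>> #include <string.h>
>> #include <mpi.h>
>> 
>> /* Run one copy with "server" as argv[1] and one copy as the client. */
>> int main(int argc, char *argv[])
>> {
>>     char port[MPI_MAX_PORT_NAME];
>>     MPI_Comm inter;
>> 
>>     MPI_Init(&argc, &argv);
>>     if (argc > 1 && 0 == strcmp(argv[1], "server")) {
>>         /* server: open a port, publish it, wait for one client */
>>         MPI_Open_port(MPI_INFO_NULL, port);
>>         MPI_Publish_name("ompi_test_service", MPI_INFO_NULL, port);
>>         MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
>>         MPI_Unpublish_name("ompi_test_service", MPI_INFO_NULL, port);
>>         MPI_Close_port(port);
>>     } else {
>>         /* client: look up the published port and connect to it */
>>         MPI_Lookup_name("ompi_test_service", MPI_INFO_NULL, port);
>>         MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
>>     }
>>     MPI_Comm_disconnect(&inter);
>>     MPI_Finalize();
>>     return 0;
>> }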
>> 
>> If you are talking about 1.2.x, then the story is totally different.
>> 
>> Ralph
>> 
>> 
>> 
>> On 4/3/08 2:29 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu>
>> wrote:
>> 
>>> Hi everyone,
>>> 
>>> I'm trying to figure out how complete the implementation of
>>> Comm_connect/Accept is. I found two problematic cases.
>>> 
>>> 1) Two different programs are started under two different mpiruns. One
>>> calls accept, the second one calls connect. I would not expect
>>> MPI_Publish_name/Lookup_name to work because they do not share the
>>> HNP. Still, I would expect to be able to connect by copying (with
>>> printf/scanf) the port_name string generated by Open_port, especially
>>> considering that in Open MPI the port_name is a string containing the
>>> TCP address and port of rank 0 in the server communicator. However,
>>> doing so results in "no route to host" and the connecting application
>>> aborts. Is the problem related to an explicit check of the universes
>>> on the accepting HNP? Am I expecting too much from the MPI standard?
>>> Is it because my two applications do not share the same universe?
>>> Should we (re)add the ability to use the same universe for several
>>> mpiruns?
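>>> 
>>> To make the scenario concrete, here is roughly what I have in mind (a
>>> minimal sketch; the port string is passed to the client on the command
>>> line instead of via scanf, and has to be quoted in the shell):
>>> 
>>> /* server.c -- started under its own mpirun */
>>> #include <stdio.h>
>>> #include <mpi.h>
>>> int main(int argc, char *argv[])
>>> {
>>>     char port[MPI_MAX_PORT_NAME];
>>>     MPI_Comm client;
>>>     int data;
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Open_port(MPI_INFO_NULL, port);
>>>     printf("%s\n", port);   /* copy this string to the client */
>>>     fflush(stdout);
>>>     MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
>>>     MPI_Recv(&data, 1, MPI_INT, 0, 0, client, MPI_STATUS_IGNORE);
>>>     MPI_Comm_disconnect(&client);
>>>     MPI_Close_port(port);
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>> 
>>> /* client.c -- started under a second, unrelated mpirun */
>>> #include <mpi.h>
>>> int main(int argc, char *argv[])
>>> {
>>>     MPI_Comm server;
>>>     int data = 42;
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
>>>     MPI_Send(&data, 1, MPI_INT, 0, 0, server);
>>>     MPI_Comm_disconnect(&server);
>>>     MPI_Finalize();
>>>     return 0;
>>> }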
>>> 
>>> 2) The second issue is when the program sets up a port and then
>>> accepts multiple clients on this port. Everything works fine for the
>>> first client, and then accept stalls forever while waiting for the
>>> second one. My understanding of the standard is that this should work:
>>> section 5.4.2 states "it must call MPI_Open_port to establish a port
>>> [...] it must call MPI_Comm_accept to accept connections from
>>> clients". I understand that with one MPI_Open_port I should be able to
>>> manage several MPI clients. Am I understanding the standard correctly
>>> here, and should we fix this?
>>> 
>>> Here is a copy of the non-working code for reference.
>>> 
>>> /*
>>>  * Copyright (c) 2004-2007 The Trustees of the University of Tennessee.
>>>  *                         All rights reserved.
>>>  * $COPYRIGHT$
>>>  *
>>>  * Additional copyrights may follow
>>>  *
>>>  * $HEADER$
>>>  */
>>> #include <stdlib.h>
>>> #include <stdio.h>
>>> #include <mpi.h>
>>> 
>>> int main(int argc, char *argv[])
>>> {
>>>     char port[MPI_MAX_PORT_NAME];
>>>     int rank;
>>>     int np;
>>> 
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &np);
>>> 
>>>     if(rank)
>>>     {
>>>         MPI_Comm comm;
>>>         /* client: receive the port string from rank 0, connect, send our rank */
>>>         MPI_Recv(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>         printf("Read port: %s\n", port);
>>>         MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);
>>> 
>>>         MPI_Send(&rank, 1, MPI_INT, 0, 1, comm);
>>>         MPI_Comm_disconnect(&comm);
>>>     }
>>>     else
>>>     {
>>>         /* server: open one port and accept every other rank on it */
>>>         int nc = np - 1;
>>>         MPI_Comm *comm_nodes = (MPI_Comm *) calloc(nc, sizeof(MPI_Comm));
>>>         MPI_Request *reqs = (MPI_Request *) calloc(nc, sizeof(MPI_Request));
>>>         int *event = (int *) calloc(nc, sizeof(int));
>>>         int i;
>>> 
>>>         MPI_Open_port(MPI_INFO_NULL, port);
>>>         /* MPI_Publish_name("test_service_el", MPI_INFO_NULL, port); */
>>>         printf("Port name: %s\n", port);
>>>         for(i = 1; i < np; i++)
>>>             MPI_Send(port, MPI_MAX_PORT_NAME, MPI_CHAR, i, 0, MPI_COMM_WORLD);
>>> 
>>>         for(i = 0; i < nc; i++)
>>>         {
>>>             MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm_nodes[i]);
>>>             printf("Accept %d\n", i);
>>>             MPI_Irecv(&event[i], 1, MPI_INT, 0, 1, comm_nodes[i], &reqs[i]);
>>>             printf("IRecv %d\n", i);
>>>         }
>>>         MPI_Close_port(port);
>>>         MPI_Waitall(nc, reqs, MPI_STATUSES_IGNORE);
>>>         for(i = 0; i < nc; i++)
>>>         {
>>>             printf("event[%d] = %d\n", i, event[i]);
>>>             MPI_Comm_disconnect(&comm_nodes[i]);
>>>             printf("Disconnect %d\n", i);
>>>         }
>>>     }
>>> 
>>>     MPI_Finalize();
>>>     return EXIT_SUCCESS;
>>> }
>>> 
>>> 
>>> 
>>> 
>>> --
>>> * Dr. Aurélien Bouteiller
>>> * Sr. Research Associate at Innovative Computing Laboratory
>>> * University of Tennessee
>>> * 1122 Volunteer Boulevard, suite 350
>>> * Knoxville, TN 37996
>>> * 865 974 6321
>>> 
>>> 
>>> 
>>> 
>>> 
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


