I guess I'll have to ask the basic question: what version are you using?

If you are talking about the trunk, there is no longer a "universe" concept
anywhere in the code. Two mpiruns can connect/accept to each other as long
as they can make contact. To facilitate that, we created an "ompi-server"
tool that is meant to be run on the head node by the sys-admin (or a user,
it doesn't matter which) - there are various ways to tell mpirun how to
contact the server, or it can self-discover it.
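
For example, the intended usage is roughly the following sketch (the exact
option spellings are from my memory of the trunk, so treat them as an
assumption and check "ompi-server --help" and the mpirun man page):

  # on the head node: start the server and have it report its contact URI
  ompi-server --report-uri /tmp/ompi-server.uri

  # point each mpirun at that same server so their jobs can find each other
  mpirun --ompi-server file:/tmp/ompi-server.uri -np 1 ./server_app
  mpirun --ompi-server file:/tmp/ompi-server.uri -np 1 ./client_app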

I have tested publish/lookup pretty thoroughly and it seems to work. I
haven't spent much time testing connect/accept except via comm_spawn, which
seems to be working. Since comm_spawn uses the same mechanism, I would have
expected connect/accept to work as well.
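
In case it helps, the publish/lookup sequence I have in mind is just the
standard one - a minimal sketch (the service name is arbitrary and error
checks are omitted):

  #include <mpi.h>

  /* server side: open a port, publish it under a well-known name,
   * then accept one client over it */
  void server_side(void)
  {
      char port[MPI_MAX_PORT_NAME];
      MPI_Comm client;
      MPI_Open_port(MPI_INFO_NULL, port);
      MPI_Publish_name("my_service", MPI_INFO_NULL, port);
      MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
      /* ...communicate over 'client'... */
      MPI_Unpublish_name("my_service", MPI_INFO_NULL, port);
      MPI_Close_port(port);
  }

  /* client side, started under a separate mpirun: resolve the name
   * through the ompi-server and connect */
  void client_side(void)
  {
      char port[MPI_MAX_PORT_NAME];
      MPI_Comm server;
      MPI_Lookup_name("my_service", MPI_INFO_NULL, port);
      MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
      /* ...communicate over 'server'... */
  }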

If you are talking about 1.2.x, then the story is totally different.

Ralph



On 4/3/08 2:29 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:

> Hi everyone,
> 
> I'm trying to figure out how complete the implementation of
> Comm_connect/Accept is. I found two problematic cases.
> 
> 1) Two different programs are started under two different mpiruns. One
> calls accept, the other calls connect. I would not expect
> MPI_Publish_name/Lookup_name to work, because they do not share the
> HNP. Still, I would expect to be able to connect by copying (with
> printf/scanf) the port_name string generated by Open_port, especially
> considering that in Open MPI the port_name is a string containing the
> TCP address and port of rank 0 in the server communicator.
> However, doing so results in "no route to host" and the connecting
> application aborts. Is the problem related to an explicit check of the
> universes on the accepting HNP? Do I expect too much from the MPI
> standard? Is it because my two applications do not share the same
> universe? Should we (re)add the ability to use the same universe for
> several mpiruns?
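> 
> For clarity, the copy-by-hand experiment looks roughly like the sketch
> below (the stdin parsing is simplistic: start one mpirun with "server"
> as argv[1] and paste the printed port string into the other's stdin):
> 
>     #include <stdio.h>
>     #include <string.h>
>     #include <mpi.h>
> 
>     int main(int argc, char *argv[])
>     {
>         char port[MPI_MAX_PORT_NAME];
>         MPI_Comm inter;
> 
>         MPI_Init(&argc, &argv);
>         if(argc > 1 && 0 == strcmp(argv[1], "server")) {
>             MPI_Open_port(MPI_INFO_NULL, port);
>             printf("%s\n", port);    /* copy this string by hand */
>             fflush(stdout);
>             MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
>         } else {
>             fgets(port, MPI_MAX_PORT_NAME, stdin);
>             port[strcspn(port, "\n")] = '\0';  /* strip the newline */
>             MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
>         }
>         MPI_Comm_disconnect(&inter);
>         MPI_Finalize();
>         return 0;
>     }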
> 
> 2) The second issue is when the program sets up a port and then accepts
> multiple clients on that port. Everything works fine for the first
> client, and then accept stalls forever while waiting for the second
> one. My understanding of the standard is that this should work: 5.4.2
> states "it must call MPI_Open_port to establish a port [...] it must
> call MPI_Comm_accept to accept connections from clients". I understand
> that with one MPI_Open_port I should be able to manage several MPI
> clients. Am I understanding the standard correctly here, and should we
> fix this?
> 
> Here is a copy of the non-working code for reference.
> 
> /*
>  * Copyright (c) 2004-2007 The Trustees of the University of Tennessee.
>  *                         All rights reserved.
>  * $COPYRIGHT$
>  *
>  * Additional copyrights may follow
>  *
>  * $HEADER$
>  */
> #include <stdlib.h>
> #include <stdio.h>
> #include <mpi.h>
> 
> int main(int argc, char *argv[])
> {
>     char port[MPI_MAX_PORT_NAME];
>     int rank;
>     int np;
> 
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &np);
> 
>     if(rank)
>     {
>         MPI_Comm comm;
>         /* client */
>         MPI_Recv(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0,
>                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>         printf("Read port: %s\n", port);
>         MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);
> 
>         MPI_Send(&rank, 1, MPI_INT, 0, 1, comm);
>         MPI_Comm_disconnect(&comm);
>     }
>     else
>     {
>         int nc = np - 1;
>         MPI_Comm *comm_nodes = (MPI_Comm *) calloc(nc, sizeof(MPI_Comm));
>         MPI_Request *reqs = (MPI_Request *) calloc(nc, sizeof(MPI_Request));
>         int *event = (int *) calloc(nc, sizeof(int));
>         int i;
> 
>         MPI_Open_port(MPI_INFO_NULL, port);
>         /* MPI_Publish_name("test_service_el", MPI_INFO_NULL, port); */
>         printf("Port name: %s\n", port);
>         for(i = 1; i < np; i++)
>             MPI_Send(port, MPI_MAX_PORT_NAME, MPI_CHAR, i, 0,
>                      MPI_COMM_WORLD);
> 
>         for(i = 0; i < nc; i++)
>         {
>             MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF,
>                             &comm_nodes[i]);
>             printf("Accept %d\n", i);
>             MPI_Irecv(&event[i], 1, MPI_INT, 0, 1, comm_nodes[i], &reqs[i]);
>             printf("IRecv %d\n", i);
>         }
>         MPI_Close_port(port);
>         MPI_Waitall(nc, reqs, MPI_STATUSES_IGNORE);
>         for(i = 0; i < nc; i++)
>         {
>             printf("event[%d] = %d\n", i, event[i]);
>             MPI_Comm_disconnect(&comm_nodes[i]);
>             printf("Disconnect %d\n", i);
>         }
>     }
> 
>     MPI_Finalize();
>     return EXIT_SUCCESS;
> }
> 
> 
> 
> 
> --
> * Dr. Aurélien Bouteiller
> * Sr. Research Associate at Innovative Computing Laboratory
> * University of Tennessee
> * 1122 Volunteer Boulevard, suite 350
> * Knoxville, TN 37996
> * 865 974 6321