Take a gander at ompi/tools/ompi-server - I believe I put a man page in there. You might just try "man ompi-server" and see if it shows up.

Holler if you have a question - not sure I documented it very thoroughly at the time.
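In the meantime, here is roughly how I expect it to be used - take the exact option spellings with a grain of salt and check "ompi-server --help" and "mpirun --help"; the uri file name below is just an example. Start the server once on the head node and have it write out its contact info, e.g. "ompi-server --report-uri /tmp/ompi-server.uri", then point each mpirun at that same server with "mpirun --ompi-server file:/tmp/ompi-server.uri ...". Once both jobs are talking to the same ompi-server, a name published with MPI_Publish_name in one of them should be visible to MPI_Lookup_name in the other, and connect/accept between the two mpiruns should have the contact info it needs.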
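For your point 1, the bare-bones test I would run is two separate programs under two separate mpiruns, with the port name copied by hand - an untested sketch (file names are mine, error checking omitted; quote the port string on the client command line, since it can contain characters the shell will mangle):

server.c:

    /* Open a port, print it, accept a single client, receive one int. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        char port[MPI_MAX_PORT_NAME];
        int peer = -1;
        MPI_Comm client;

        MPI_Init(&argc, &argv);
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("port: %s\n", port);   /* copy this string to the client */
        fflush(stdout);

        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
        MPI_Recv(&peer, 1, MPI_INT, 0, 1, client, MPI_STATUS_IGNORE);
        printf("got %d from the client\n", peer);

        MPI_Comm_disconnect(&client);
        MPI_Close_port(port);
        MPI_Finalize();
        return 0;
    }

client.c:

    /* Take the port string printed by the server as argv[1] and connect. */
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank;
        MPI_Comm server;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
        MPI_Send(&rank, 1, MPI_INT, 0, 1, server);

        MPI_Comm_disconnect(&server);
        MPI_Finalize();
        return 0;
    }

Run each under its own mpirun pointed at the same ompi-server; if pasting the port string by hand still gives "no route to host", that would help narrow down where the connection is being refused.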
On 4/3/08 3:10 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:

> Ralph,
>
> I am using the trunk. Is there any documentation for ompi-server? It
> sounds exactly like what I need to fix point 1.
>
> Aurelien
>
> On Apr 3, 2008, at 17:06, Ralph Castain wrote:
>
>> I guess I'll have to ask the basic question: what version are you
>> using?
>>
>> If you are talking about the trunk, there no longer is a "universe"
>> concept anywhere in the code. Two mpiruns can connect/accept to each
>> other as long as they can make contact. To facilitate that, we created
>> an "ompi-server" tool that is supposed to be run by the sys-admin (or a
>> user, doesn't matter which) on the head node - there are various ways
>> to tell mpirun how to contact the server, or it can self-discover it.
>>
>> I have tested publish/lookup pretty thoroughly and it seems to work. I
>> haven't spent much time testing connect/accept except via comm_spawn,
>> which seems to be working. Since that uses the same mechanism, I would
>> have expected connect/accept to work as well.
>>
>> If you are talking about 1.2.x, then the story is totally different.
>>
>> Ralph
>>
>>
>> On 4/3/08 2:29 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:
>>
>>> Hi everyone,
>>>
>>> I'm trying to figure out how complete the implementation of
>>> Comm_connect/Accept is. I found two problematic cases.
>>>
>>> 1) Two different programs are started by two different mpiruns. One
>>> calls accept, the second one calls connect. I would not expect
>>> MPI_Publish_name/Lookup_name to work, because they do not share the
>>> HNP. Still, I would expect to be able to connect by copying (with
>>> printf-scanf) the port_name string generated by Open_port, especially
>>> considering that in Open MPI the port_name is a string containing the
>>> TCP address and port of rank 0 in the server communicator. However,
>>> doing so results in "no route to host" and the connecting application
>>> aborts. Is the problem related to an explicit check of the universes
>>> on the accepting HNP? Do I expect too much from the MPI standard? Is
>>> it because my two applications do not share the same universe? Should
>>> we (re)add the ability to use the same universe for several mpiruns?
>>>
>>> 2) The second issue is when the program sets up a port and then
>>> accepts multiple clients on that port. Everything works fine for the
>>> first client, and then accept stalls forever while waiting for the
>>> second one. My understanding of the standard is that it should work:
>>> 5.4.2 states "it must call MPI_Open_port to establish a port [...] it
>>> must call MPI_Comm_accept to accept connections from clients". I
>>> understand that with one MPI_Open_port I should be able to manage
>>> several MPI clients. Am I understanding the standard correctly here,
>>> and should we fix this?
>>>
>>> Here is a copy of the non-working code for reference.
>>>
>>> /*
>>>  * Copyright (c) 2004-2007 The Trustees of the University of Tennessee.
>>>  *                         All rights reserved.
>>>  * $COPYRIGHT$
>>>  *
>>>  * Additional copyrights may follow
>>>  *
>>>  * $HEADER$
>>>  */
>>> #include <stdlib.h>
>>> #include <stdio.h>
>>> #include <mpi.h>
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>     char port[MPI_MAX_PORT_NAME];
>>>     int rank;
>>>     int np;
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &np);
>>>
>>>     if(rank)
>>>     {
>>>         MPI_Comm comm;
>>>         /* client */
>>>         MPI_Recv(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0,
>>>                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>         printf("Read port: %s\n", port);
>>>         MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);
>>>
>>>         MPI_Send(&rank, 1, MPI_INT, 0, 1, comm);
>>>         MPI_Comm_disconnect(&comm);
>>>     }
>>>     else
>>>     {
>>>         int nc = np - 1;
>>>         MPI_Comm *comm_nodes = (MPI_Comm *) calloc(nc, sizeof(MPI_Comm));
>>>         MPI_Request *reqs = (MPI_Request *) calloc(nc, sizeof(MPI_Request));
>>>         int *event = (int *) calloc(nc, sizeof(int));
>>>         int i;
>>>
>>>         MPI_Open_port(MPI_INFO_NULL, port);
>>>         /* MPI_Publish_name("test_service_el", MPI_INFO_NULL, port); */
>>>         printf("Port name: %s\n", port);
>>>         for(i = 1; i < np; i++)
>>>             MPI_Send(port, MPI_MAX_PORT_NAME, MPI_CHAR, i, 0,
>>>                      MPI_COMM_WORLD);
>>>
>>>         for(i = 0; i < nc; i++)
>>>         {
>>>             MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF,
>>>                             &comm_nodes[i]);
>>>             printf("Accept %d\n", i);
>>>             MPI_Irecv(&event[i], 1, MPI_INT, 0, 1, comm_nodes[i], &reqs[i]);
>>>             printf("IRecv %d\n", i);
>>>         }
>>>         MPI_Close_port(port);
>>>         MPI_Waitall(nc, reqs, MPI_STATUSES_IGNORE);
>>>         for(i = 0; i < nc; i++)
>>>         {
>>>             printf("event[%d] = %d\n", i, event[i]);
>>>             MPI_Comm_disconnect(&comm_nodes[i]);
>>>             printf("Disconnect %d\n", i);
>>>         }
>>>     }
>>>
>>>     MPI_Finalize();
>>>     return EXIT_SUCCESS;
>>> }
>>>
>>> --
>>> * Dr. Aurélien Bouteiller
>>> * Sr. Research Associate at Innovative Computing Laboratory
>>> * University of Tennessee
>>> * 1122 Volunteer Boulevard, suite 350
>>> * Knoxville, TN 37996
>>> * 865 974 6321