Yeah, you didn't specify the file correctly... plus I found a bug in the code when I looked (orterun is a little out-of-date).

I am updating orterun (commit soon) and will include a better help message about the proper format of the orterun cmd-line option. The syntax is:

  -ompi-server uri
or
  -ompi-server file:filename-where-uri-exists

The problem here is that you gave it a uri of "test", which means nothing. ;-)

Should have it up-and-going soon.

Ralph
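In other words, once that fix is in, your original sequence should work with just the file: prefix added on the mpirun side (reusing your filename "test" here - the filename itself is arbitrary; alternatively you can paste the uri string itself directly after -ompi-server):

xterm1$ ompi-server --debug-devel -d --report-uri test

xterm2$ mpirun -ompi-server file:test -np 1 mpi_accept_test

xterm3$ mpirun -ompi-server file:test -np 1 simple_connect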
On 4/4/08 12:02 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:

> Ralph,
>
> I've not been very successful at using ompi-server. I tried this:
>
> xterm1$ ompi-server --debug-devel -d --report-uri test
> [grosse-pomme.local:01097] proc_info: hnp_uri NULL
>     daemon uri NULL
> [grosse-pomme.local:01097] [[34900,0],0] ompi-server: up and running!
>
> xterm2$ mpirun -ompi-server test -np 1 mpi_accept_test
> Port name: 2285895681.0;tcp://192.168.0.101:50065;tcp://192.168.0.150:50065:300
>
> xterm3$ mpirun -ompi-server test -np 1 simple_connect
> --------------------------------------------------------------------------
> Process rank 0 attempted to lookup from a global ompi_server that
> could not be contacted. This is typically caused by either not
> specifying the contact info for the server, or by the server not
> currently executing. If you did specify the contact info for a
> server, please check to see that the server is running and start
> it again (or have your sys admin start it) if it isn't.
> --------------------------------------------------------------------------
> [grosse-pomme.local:01122] *** An error occurred in MPI_Lookup_name
> [grosse-pomme.local:01122] *** on communicator MPI_COMM_WORLD
> [grosse-pomme.local:01122] *** MPI_ERR_NAME: invalid name argument
> [grosse-pomme.local:01122] *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------------------
>
> The server code calls Open_port and then PublishName. It looks like the
> LookupName call cannot reach the ompi-server. The ompi-server in debug
> mode does not show any output when a new event occurs (such as when the
> server is launched). Is there something wrong in the way I use it?
>
> Aurelien
>
> On Apr 3, 2008, at 17:21, Ralph Castain wrote:
>
>> Take a gander at ompi/tools/ompi-server - I believe I put a man page in
>> there. You might just try "man ompi-server" and see if it shows up.
>>
>> Holler if you have a question - not sure I documented it very thoroughly
>> at the time.
>>
>> On 4/3/08 3:10 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:
>>
>>> Ralph,
>>>
>>> I am using the trunk. Is there any documentation for ompi-server? It
>>> sounds exactly like what I need to fix point 1.
>>>
>>> Aurelien
>>>
>>> On Apr 3, 2008, at 17:06, Ralph Castain wrote:
>>>
>>>> I guess I'll have to ask the basic question: what version are you
>>>> using?
>>>>
>>>> If you are talking about the trunk, there no longer is a "universe"
>>>> concept anywhere in the code. Two mpiruns can connect/accept to each
>>>> other as long as they can make contact. To facilitate that, we created
>>>> an "ompi-server" tool that is supposed to be run by the sys-admin (or
>>>> a user, it doesn't matter which) on the head node - there are various
>>>> ways to tell mpirun how to contact the server, or it can self-discover
>>>> it.
>>>>
>>>> I have tested publish/lookup pretty thoroughly and it seems to work. I
>>>> haven't spent much time testing connect/accept except via comm_spawn,
>>>> which seems to be working. Since that uses the same mechanism, I would
>>>> have expected connect/accept to work as well.
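For reference, the publish/lookup flow being discussed boils down to something like the pair of sketches below. They are purely illustrative: the service name "ocean" is made up, error handling is omitted, and both jobs need to be pointed at the same ompi-server via -ompi-server for the lookup to succeed.

/* server.c: open a port, publish it under a name, accept a single client */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("ocean", MPI_INFO_NULL, port);
    printf("Published port: %s\n", port);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
    /* ... exchange data over the intercommunicator ... */
    MPI_Comm_disconnect(&client);
    MPI_Unpublish_name("ocean", MPI_INFO_NULL, port);
    MPI_Close_port(port);
    MPI_Finalize();
    return 0;
}

/* client.c: look up the published name through the same ompi-server and connect */
#include <mpi.h>

int main(int argc, char *argv[])
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm server;

    MPI_Init(&argc, &argv);
    MPI_Lookup_name("ocean", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
    /* ... exchange data ... */
    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return 0;
}
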
>>>>
>>>> If you are talking about 1.2.x, then the story is totally different.
>>>>
>>>> Ralph
>>>>
>>>> On 4/3/08 2:29 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I'm trying to figure out how complete the implementation of
>>>>> Comm_connect/Accept is. I have found two problematic cases.
>>>>>
>>>>> 1) Two different programs are started by two different mpiruns. One
>>>>> calls accept, the second one calls connect. I would not expect
>>>>> MPI_Publish_name/Lookup_name to work because they do not share the
>>>>> HNP. Still, I would expect to be able to connect by copying (with
>>>>> printf-scanf) the port_name string generated by Open_port, especially
>>>>> considering that in Open MPI the port_name is a string containing the
>>>>> tcp address and port of rank 0 in the server communicator. However,
>>>>> doing so results in "no route to host" and the connecting application
>>>>> aborts. Is the problem related to an explicit check of the universes
>>>>> on the accept HNP? Do I expect too much from the MPI standard? Is it
>>>>> because my two applications do not share the same universe? Should we
>>>>> (re)add the ability to use the same universe for several mpiruns?
>>>>>
>>>>> 2) The second issue is when the program sets up a port and then
>>>>> accepts multiple clients on this port. Everything works fine for the
>>>>> first client, and then accept stalls forever while waiting for the
>>>>> second one. My understanding of the standard is that it should work:
>>>>> 5.4.2 states "it must call MPI_Open_port to establish a port [...] it
>>>>> must call MPI_Comm_accept to accept connections from clients". I
>>>>> understand that for one MPI_Open_port I should be able to manage
>>>>> several MPI clients. Am I understanding the standard correctly here,
>>>>> and should we fix this?
>>>>>
>>>>> Here is a copy of the non-working code for reference.
>>>>>
>>>>> /*
>>>>>  * Copyright (c) 2004-2007 The Trustees of the University of Tennessee.
>>>>>  *                         All rights reserved.
>>>>>  * $COPYRIGHT$
>>>>>  *
>>>>>  * Additional copyrights may follow
>>>>>  *
>>>>>  * $HEADER$
>>>>>  */
>>>>> #include <stdlib.h>
>>>>> #include <stdio.h>
>>>>> #include <mpi.h>
>>>>>
>>>>> int main(int argc, char *argv[])
>>>>> {
>>>>>     char port[MPI_MAX_PORT_NAME];
>>>>>     int rank;
>>>>>     int np;
>>>>>
>>>>>     MPI_Init(&argc, &argv);
>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &np);
>>>>>
>>>>>     if(rank)
>>>>>     {
>>>>>         MPI_Comm comm;
>>>>>         /* client */
>>>>>         MPI_Recv(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0,
>>>>>                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>>>         printf("Read port: %s\n", port);
>>>>>         MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);
>>>>>
>>>>>         MPI_Send(&rank, 1, MPI_INT, 0, 1, comm);
>>>>>         MPI_Comm_disconnect(&comm);
>>>>>     }
>>>>>     else
>>>>>     {
>>>>>         int nc = np - 1;
>>>>>         MPI_Comm *comm_nodes = (MPI_Comm *) calloc(nc, sizeof(MPI_Comm));
>>>>>         MPI_Request *reqs = (MPI_Request *) calloc(nc, sizeof(MPI_Request));
>>>>>         int *event = (int *) calloc(nc, sizeof(int));
>>>>>         int i;
>>>>>
>>>>>         MPI_Open_port(MPI_INFO_NULL, port);
>>>>>         /* MPI_Publish_name("test_service_el", MPI_INFO_NULL, port); */
>>>>>         printf("Port name: %s\n", port);
>>>>>         for(i = 1; i < np; i++)
>>>>>             MPI_Send(port, MPI_MAX_PORT_NAME, MPI_CHAR, i, 0, MPI_COMM_WORLD);
>>>>>
>>>>>         for(i = 0; i < nc; i++)
>>>>>         {
>>>>>             MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm_nodes[i]);
>>>>>             printf("Accept %d\n", i);
>>>>>             MPI_Irecv(&event[i], 1, MPI_INT, 0, 1, comm_nodes[i], &reqs[i]);
>>>>>             printf("IRecv %d\n", i);
>>>>>         }
>>>>>         MPI_Close_port(port);
>>>>>         MPI_Waitall(nc, reqs, MPI_STATUSES_IGNORE);
>>>>>         for(i = 0; i < nc; i++)
>>>>>         {
>>>>>             printf("event[%d] = %d\n", i, event[i]);
>>>>>             MPI_Comm_disconnect(&comm_nodes[i]);
>>>>>             printf("Disconnect %d\n", i);
>>>>>         }
>>>>>     }
>>>>>
>>>>>     MPI_Finalize();
>>>>>     return EXIT_SUCCESS;
>>>>> }
>>>>>
>>>>> --
>>>>> * Dr. Aurélien Bouteiller
>>>>> * Sr. Research Associate at Innovative Computing Laboratory
>>>>> * University of Tennessee
>>>>> * 1122 Volunteer Boulevard, suite 350
>>>>> * Knoxville, TN 37996
>>>>> * 865 974 6321
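Regarding point 1 in the quoted message, the connecting side of the cross-mpirun experiment can be reduced to something like the sketch below. It is illustrative only: it assumes a separate server job that has already printed the string returned by MPI_Open_port and is blocked in MPI_Comm_accept, it takes that string as its first command-line argument rather than reading it with scanf, and it omits error handling.

/* connect_by_port.c: run under its own mpirun; pass the copied port string as argv[1] */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm server;

    MPI_Init(&argc, &argv);
    if (argc < 2) {
        fprintf(stderr, "usage: %s <port_name>\n", argv[0]);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    /* argv[1] is the port string produced by MPI_Open_port in the server job */
    MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
    printf("Connected to %s\n", argv[1]);
    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return EXIT_SUCCESS;
}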