Guess I was unclear, George - I don't know enough about Aurelien's app to know if it is capable of (or trying to) run as one job, or not.
What has been described on this thread to date is, in fact, a corner case. Hence the proposal of another way to possibly address a corner case without disrupting the normal code operation. It may not be possible, per the other, more general thread...

On 7/27/07 8:31 AM, "George Bosilca" <bosi...@cs.utk.edu> wrote:

> It's not about the app. It's about the MPI standard. With one mpirun
> you start one MPI application (SPMD or MPMD, but still only one). The
> first impact of this is that all processes started with one mpirun
> command will belong to the same MPI_COMM_WORLD.
>
> Our mpirun is in fact equivalent to the mpiexec as defined in the MPI
> standard. Therefore, we cannot change its behavior outside the MPI-2
> standard boundaries.
>
> Moreover, both of the approaches you described will only add corner
> cases, which I would rather limit in number.
>
>   george.
>
>
> On Jul 27, 2007, at 8:42 AM, Ralph Castain wrote:
>
>> On 7/26/07 4:22 PM, "Aurelien Bouteiller" <boute...@cs.utk.edu> wrote:
>>
>>>> mpirun -hostfile big_pool -n 10 -host 1,2,3,4 application : -n 2 -host
>>>> 99,100 ft_server
>>>
>>> This will not work: this is a way to launch MPMD jobs that share the
>>> same COMM_WORLD, not a way to launch two different applications that
>>> interact through Accept/Connect.
>>>
>>> The direct consequences on simple NAS benchmarks are:
>>> * if the second command does not call MPI_Init, then the first
>>>   application locks forever in MPI_Init;
>>> * if both call MPI_Init, the MPI_Comm_size of the jobs is incorrect.
>>>
>>> ****
>>> bouteill@dancer:~$ ompi-build/debug/bin/mpirun -prefix
>>> /home/bouteill/ompi-build/debug/ -np 4 -host
>>> node01,node02,node03,node04
>>> NPB3.2-MPI/bin/lu.A.4 : -np 1 -host node01 NPB3.2-MPI/bin/mg.A.1
>>>
>>>
>>> NAS Parallel Benchmarks 3.2 -- LU Benchmark
>>>
>>> Warning: program is running on 5 processors
>>> but was compiled for 4
>>> Size: 64x 64x 64
>>> Iterations: 250
>>> Number of processes: 5
>>
>> Okay - of course, I can't possibly have any idea how your application
>> works... ;-)
>>
>> However, it would be trivial to simply add two options to the app_context
>> command line:
>>
>> 1. one designating that this app_context is to be launched as a separate
>> job
>>
>> 2. one indicating that this app_context is to be "connected" a la
>> connect/accept to the other app_contexts (if you want, we could even take
>> an argument indicating which app_contexts it is to be connected to). Or we
>> could reverse this and indicate that we want it to be disconnected - it
>> all depends on what default people want to define.
>>
>> This would solve the problem you describe while still allowing us to avoid
>> allocation confusion. I'll send it out separately as an RFC.
>>
>> Thanks
>> Ralph
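
[Editor's note] For readers unfamiliar with the distinction being debated, below is a minimal, illustrative sketch of the MPI-2 connect/accept pattern Aurelien contrasts with the single-mpirun MPMD (colon) syntax: two independently launched jobs, each with its own MPI_COMM_WORLD, joined through an inter-communicator. The file "port.txt" used to pass the port string out of band is purely an assumption for the example (a name server or command-line argument would also work), error handling is omitted, and whether the two jobs can actually rendezvous depends on the runtime - which is exactly what the thread is about.

/* ft_server.c -- launched as its own job, e.g. "mpirun -np 2 ft_server" */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm client;
    char port_name[MPI_MAX_PORT_NAME];
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Open_port(MPI_INFO_NULL, port_name);   /* obtain a port string */
        FILE *f = fopen("port.txt", "w");          /* publish it out of band */
        fprintf(f, "%s\n", port_name);
        fclose(f);
    }
    /* collective over this job's MPI_COMM_WORLD; only root 0's port matters */
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);

    /* ... service requests over the "client" inter-communicator ... */

    MPI_Comm_disconnect(&client);
    if (rank == 0) MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}

/* application.c -- launched separately, e.g. "mpirun -np 4 application" */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm server;
    char port_name[MPI_MAX_PORT_NAME];
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* 4 here, not 5: the server is a separate job */

    if (rank == 0) {
        FILE *f = fopen("port.txt", "r");   /* read the published port string */
        fgets(port_name, MPI_MAX_PORT_NAME, f);
        fclose(f);
        port_name[strcspn(port_name, "\n")] = '\0';
    }
    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);

    /* ... talk to ft_server over the "server" inter-communicator ... */

    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return 0;
}

Because each binary is started by its own mpirun, each job's MPI_Comm_size matches what it was compiled for, avoiding the "running on 5 processors but was compiled for 4" warning shown in the NAS output above.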