Re: [OMPI devel] Hostfiles - yet again

George Bosilca Fri, 27 Jul 2007 11:31:11 -0400

You were limpid. What we're trying to say here is that the solution you described a few emails ago doesn't work. At least it doesn't work for what we want to do (i.e. what Aurelien described in his first email). We [really] need 2 separate MPI worlds, that we will connect at a later moment, and not one larger MPI world.

Allow me to reiterate what we are looking for. We want to save some information (related to fault tolerance, but this might be ignored here) on another MPI application. The user will start his/her MPI application in exactly the same way as before, plus 2 new MCA arguments: one for enabling the message logging approach and one for the connect/accept port info. Once our internal framework is initialized in the user application, it will connect to the spare MPI application (let's call it the storage application), launched by the user on some specific nodes that have better capabilities, as Aurelien described in his initial email. Now the user application and the storage one will be able to communicate via MPI, and therefore get the best performance out of the available networks. Once the user application successfully completes, the storage application can disappear (or not, we will take what's available in Open MPI at that time).

This approach is not a corner case. It's a completely valid approach as described in the MPI-2 standard. However, as usual the MPI standard is not very clear on how to manage the connection information, so this is the big unknown here.
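For readers unfamiliar with the MPI-2 connect/accept model referred to above, here is a minimal sketch of two separately launched MPI worlds joining through a port. It is an illustration only, not the framework discussed in this thread. It assumes mpi4py is installed on top of an MPI library with dynamic-process support. The function names (parse_role, run_storage, run_user, main), the message payload, and the idea of passing the port name as a command-line argument are all hypothetical; the thread itself leaves open how the port info should be conveyed.

```python
def parse_role(argv):
    """Hypothetical CLI: no argument means storage side; one argument is the port to connect to."""
    return ("user", argv[1]) if len(argv) > 1 else ("storage", None)


def run_storage():
    from mpi4py import MPI  # lazy import: assumes mpi4py is available
    world = MPI.COMM_WORLD
    rank = world.Get_rank()
    # Rank 0 opens a port and publishes it out of band (here: stdout).
    port = MPI.Open_port() if rank == 0 else None
    if rank == 0:
        print(port, flush=True)
    port = world.bcast(port, root=0)
    # Collective over the storage world; blocks until the user job connects.
    inter = world.Accept(port, root=0)
    if rank == 0:
        # On an intercommunicator, source is a rank in the remote (user) world.
        record = inter.recv(source=0, tag=0)
        print("stored:", record, flush=True)
    inter.Disconnect()
    if rank == 0:
        MPI.Close_port(port)


def run_user(port):
    from mpi4py import MPI  # lazy import: assumes mpi4py is available
    world = MPI.COMM_WORLD
    # Collective over the user world; yields an intercommunicator to the storage world.
    inter = world.Connect(port, root=0)
    if world.Get_rank() == 0:
        inter.send({"event": "log", "step": 1}, dest=0, tag=0)  # made-up payload
    inter.Disconnect()


def main(argv):
    role, port = parse_role(argv)
    if role == "storage":
        run_storage()
    else:
        run_user(port)
```

To try it, the script would end with a call such as main(sys.argv), the storage side would be started with one mpirun, and the user side with a second mpirun that receives the printed port name as a quoted argument. Each side keeps its own MPI_COMM_WORLD; only the intercommunicator spans both. Whether two independently launched jobs can actually find each other this way depends on the MPI implementation's runtime.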


  george.

On Jul 27, 2007, at 11:08 AM, Ralph Castain wrote:

Guess I was unclear, George - I don't know enough about Aurelien's app
to know if it is capable of (or trying to) run as one job, or not.

What has been described on this thread to-date is, in fact, a corner
case. Hence the proposal of another way to possibly address a corner
case without disrupting the normal code operation.

May not be possible, per the other more general thread....


On 7/27/07 8:31 AM, "George Bosilca" <[email protected]> wrote:

It's not about the app. It's about the MPI standard. With one mpirun
you start one MPI application (SPMD or MPMD but still only one). The
first impact of this is that all processes started with one mpirun
command will belong to the same MPI_COMM_WORLD.

Our mpirun is in fact equivalent to the mpiexec as defined in the MPI
standard. Therefore, we cannot change its behavior outside the MPI-2
standard boundaries.

Moreover, both of the approaches you described will only add corner
cases, which I would rather limit in number.

   george.


On Jul 27, 2007, at 8:42 AM, Ralph Castain wrote:

On 7/26/07 4:22 PM, "Aurelien Bouteiller" <[email protected]> wrote:

mpirun -hostfile big_pool -n 10 -host 1,2,3,4 application : -n 2 -host 99,100 ft_server

This will not work: this is a way to launch MPMD jobs that share the
same COMM_WORLD, not the way to launch two different applications that
interact through Accept/Connect.

Direct consequences on simple NAS benchmarks are:
* if the second command does not call MPI_Init, then the first
application locks forever in MPI_Init
* if both call MPI_Init, the MPI_Comm_size values seen by the jobs are
incorrect.



****
bouteill@dancer:~$ ompi-build/debug/bin/mpirun -prefix /home/bouteill/ompi-build/debug/ -np 4 -host node01,node02,node03,node04 NPB3.2-MPI/bin/lu.A.4 : -np 1 -host node01 NPB3.2-MPI/bin/mg.A.1


 NAS Parallel Benchmarks 3.2 -- LU Benchmark

     Warning: program is running on  5 processors
     but was compiled for   4
 Size:  64x 64x 64
 Iterations: 250
 Number of processes:     5
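
The count of 5 in the output above follows from the colon syntax: every app context on a single mpirun line contributes its processes to one shared MPI_COMM_WORLD. The toy model below makes that arithmetic explicit. It is not Open MPI's command-line parser; the function names context_sizes and world_size are made up for illustration, and only explicit -np / -n counts are handled.

```python
import shlex


def context_sizes(mpirun_args):
    """Return the explicit -np/-n count of each colon-separated app context."""
    contexts, current = [], []
    for token in shlex.split(mpirun_args):
        if token == ":":          # a lone colon separates app contexts
            contexts.append(current)
            current = []
        else:
            current.append(token)
    contexts.append(current)

    sizes = []
    for ctx in contexts:
        count = 0
        # Look at each (flag, value) pair of adjacent tokens.
        for flag, value in zip(ctx, ctx[1:]):
            if flag in ("-np", "-n"):
                count = int(value)
        sizes.append(count)
    return sizes


def world_size(mpirun_args):
    """All app contexts of one mpirun share a single MPI_COMM_WORLD."""
    return sum(context_sizes(mpirun_args))


args = ("-np 4 -host node01,node02,node03,node04 NPB3.2-MPI/bin/lu.A.4 "
        ": -np 1 -host node01 NPB3.2-MPI/bin/mg.A.1")
print(context_sizes(args))  # [4, 1]
print(world_size(args))     # 5
```

LU was compiled for 4 processes but sees a world of 4 + 1 = 5, which is exactly the warning printed by the benchmark.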

Okay - of course, I can't possibly have any idea how your application
works... ;-)

However, it would be trivial to simply add two options to the
app_context command line:

1. designates that this app_context is to be launched as a separate
job

2. indicates that this app_context is to be "connected" ala
connect/accept to the other app_contexts (if you want, we could even
take an argument indicating which app_contexts it is to be connected
to). Or we could reverse this and indicate we want it to be
disconnected - all depends upon what default people want to define.

This would solve the problem you describe while still allowing us to
avoid allocation confusion. I'll send it out separately as an RFC.

Thanks
Ralph




_______________________________________________
devel mailing list
[email protected]
http://www.open-mpi.org/mailman/listinfo.cgi/devel


