Guess I was unclear, George - I don't know enough about Aurelien's app to
know if it is capable of (or trying to) run as one job, or not.

What has been described on this thread to date is, in fact, a corner case.
Hence the proposal of another way to possibly address that corner case without
disrupting normal code operation.

May not be possible, per the other more general thread....


On 7/27/07 8:31 AM, "George Bosilca" <bosi...@cs.utk.edu> wrote:

> It's not about the app. It's about the MPI standard. With one mpirun
> you start one MPI application (SPMD or MPMD, but still only one). The
> first impact of this is that all processes started with one mpirun
> command will belong to the same MPI_COMM_WORLD.
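> 
> (A minimal sketch to illustrate this: a program like the one below,
> started once per app context with the colon syntax, reports the combined
> process count, because MPI_COMM_WORLD spans every app context launched
> by that single mpirun.)
> 
>   #include <stdio.h>
>   #include <mpi.h>
> 
>   int main(int argc, char **argv)
>   {
>       int rank, size;
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>       /* Every process started by the same mpirun, no matter which
>        * colon-separated app context it came from, sees the same size. */
>       printf("rank %d of %d\n", rank, size);
>       MPI_Finalize();
>       return 0;
>   }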
> 
> Our mpirun is in fact equivalent to the mpiexec defined in the MPI
> standard. Therefore, we cannot change its behavior outside the MPI-2
> standard boundaries.
> 
> Moreover, both of the approaches you described will only add corner
> cases, which I would rather keep to a minimum.
> 
>    george.
> 
> 
> On Jul 27, 2007, at 8:42 AM, Ralph Castain wrote:
> 
>> 
>> 
>> 
>> On 7/26/07 4:22 PM, "Aurelien Bouteiller" <boute...@cs.utk.edu> wrote:
>> 
>>>> mpirun -hostfile big_pool -n 10 -host 1,2,3,4 application : -n 2
>>>> -host 99,100 ft_server
>>> 
>>> This will not work: this is a way to launch MIMD jobs that share the
>>> same COMM_WORLD, not a way to launch two different applications that
>>> interact through Accept/Connect.
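>>> 
>>> (For illustration, a rough sketch of the Accept/Connect pattern the two
>>> separately launched jobs would use; the "server"/"client" argument
>>> convention and the manual port exchange here are simplified placeholders,
>>> not how a real ft_server would do it:)
>>> 
>>>   #include <stdio.h>
>>>   #include <string.h>
>>>   #include <mpi.h>
>>> 
>>>   /* Run one copy as "server" and one as "client <port>", each under its
>>>    * own mpirun, so the two jobs keep separate MPI_COMM_WORLDs. */
>>>   int main(int argc, char **argv)
>>>   {
>>>       MPI_Init(&argc, &argv);
>>>       if (argc > 1 && strcmp(argv[1], "server") == 0) {
>>>           char port[MPI_MAX_PORT_NAME];
>>>           MPI_Comm client;
>>>           MPI_Open_port(MPI_INFO_NULL, port);
>>>           printf("port: %s\n", port); /* hand this string to the client job */
>>>           MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
>>>           MPI_Comm_disconnect(&client);
>>>           MPI_Close_port(port);
>>>       } else if (argc > 2 && strcmp(argv[1], "client") == 0) {
>>>           MPI_Comm server;
>>>           MPI_Comm_connect(argv[2], MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
>>>           MPI_Comm_disconnect(&server);
>>>       }
>>>       MPI_Finalize();
>>>       return 0;
>>>   }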
>>> 
>>> The direct consequences on simple NAS benchmarks are:
>>> * if the second command does not call MPI_Init, then the first
>>> application locks forever in MPI_Init
>>> * if both call MPI_Init, the MPI_Comm_size of the jobs is incorrect.
>>> 
>>> 
>>> ****
>>> bouteill@dancer:~$ ompi-build/debug/bin/mpirun -prefix
>>> /home/bouteill/ompi-build/debug/ -np 4 -host
>>> node01,node02,node03,node04
>>> NPB3.2-MPI/bin/lu.A.4 : -np 1 -host node01 NPB3.2-MPI/bin/mg.A.1
>>> 
>>> 
>>>  NAS Parallel Benchmarks 3.2 -- LU Benchmark
>>> 
>>>      Warning: program is running on  5 processors
>>>      but was compiled for   4
>>>  Size:  64x 64x 64
>>>  Iterations: 250
>>>  Number of processes:     5
>> 
>> Okay - of course, I can't possibly have any idea how your application
>> works... ;-)
>> 
>> However, it would be trivial to simply add two options to the app_context
>> command line:
>> 
>> 1. designates that this app_context is to be launched as a separate job
>> 
>> 2. indicates that this app_context is to be "connected" a la connect/accept
>> to the other app_contexts (if you want, we could even take an argument
>> indicating which app_contexts it is to be connected to). Or we could
>> reverse this to indicate we want it to be disconnected - it all depends
>> upon what default people want to define.
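>> 
>> (Purely hypothetical syntax to illustrate the idea; neither option exists
>> today, and the names are placeholders:)
>> 
>>   mpirun -hostfile big_pool -n 10 -host 1,2,3,4 application : \
>>          --separate-job --connect-to 1 -n 2 -host 99,100 ft_server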
>> 
>> This would solve the problem you describe while still allowing us
>> to avoid
>> allocation confusion. I'll send it out separately as an RFC.
>> 
>> Thanks
>> Ralph
>> 