Sorry for the delay - got tied up. This looks sane to me and should work.

FWIW: I haven't seen any problem with comm_spawn_multiple. That code only 
executes if the specific limit is hit, so I suspect the issue is one of scale 
and race conditions.


On Jul 12, 2011, at 6:44 PM, Eugene Loh wrote:

> Thanks for the quick and helpful response.
> 
> I'm willing to make the changes, if only I were more confident I understood 
> the issues.
> 
> What you call the first usage of num_procs_alive seems to count "local procs 
> alive or about to be launched".  It tests on "child->alive || 
> opal_dss.compare(&job, &(child->name->jobid), ORTE_JOBID)".  This is only in 
> the !oversubscribed case.
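> 
> To make sure we're reading the same code, this is roughly what I take that 
> count to be (just a sketch from memory; the list and type names are my 
> paraphrase and may not match the file exactly):
> 
>     /* first usage: local procs that are alive, plus procs belonging
>        to the job we are about to launch */
>     num_procs_alive = 0;
>     for (item  = opal_list_get_first(&orte_odls_globals.children);
>          item != opal_list_get_end(&orte_odls_globals.children);
>          item  = opal_list_get_next(item)) {
>         child = (orte_odls_child_t*)item;
>         if (child->alive ||
>             OPAL_EQUAL == opal_dss.compare(&job, &(child->name->jobid), ORTE_JOBID)) {
>             num_procs_alive++;
>         }
>     }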
> 
> What you call "the later usage" ("0 < opal_sys_limits.num_procs" or "check 
> the system limits") seems to want only the number of alive procs (not 
> including any that are about to be launched).  So, as I guess you were 
> driving at, a fresh computation of num_procs_alive seems to be needed at this 
> point, both in v1.5 and in the trunk.
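> 
> So right before the opal_sys_limits.num_procs check, I'd expect something 
> along these lines (again just a sketch; the exact form of the limit test in 
> the actual code may differ):
> 
>     /* second usage: count only the procs that are actually alive right
>        now; app->num_procs for the job being launched is added afterwards */
>     num_procs_alive = 0;
>     for (item  = opal_list_get_first(&orte_odls_globals.children);
>          item != opal_list_get_end(&orte_odls_globals.children);
>          item  = opal_list_get_next(item)) {
>         child = (orte_odls_child_t*)item;
>         if (child->alive) {
>             num_procs_alive++;
>         }
>     }
>     if (0 < opal_sys_limits.num_procs &&
>         opal_sys_limits.num_procs < num_procs_alive + app->num_procs) {
>         /* over the limit: wait a little and recount, or give up */
>     }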
> 
> There is actually a third usage of num_procs_alive ("available file 
> descriptors" or "0 < opal_sys_limits.num_files").  I think there are two 
> problems here.  One is that the number of procs we're about to launch is not 
> accounted for.  The other, perhaps more important, is that if we "wait a 
> little time" and recompute num_procs_alive, we loop over item/child.  That is 
> bad because we are already inside an item/child loop!  (This is actually the 
> origin of my foray into this mess... MPI_Comm_spawn_multiple gets screwed up 
> by this reuse of variable names.)
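> 
> For that spot, all I have in mind is using separate loop variables for the 
> inner recount so the enclosing launch loop isn't clobbered, roughly like 
> this (sketch only; the "2" names are mine):
> 
>     opal_list_item_t *item2;
>     orte_odls_child_t *child2;
> 
>     /* recompute num_procs_alive after "waiting a little time", without
>        touching the item/child variables of the enclosing launch loop */
>     num_procs_alive = 0;
>     for (item2  = opal_list_get_first(&orte_odls_globals.children);
>          item2 != opal_list_get_end(&orte_odls_globals.children);
>          item2  = opal_list_get_next(item2)) {
>         child2 = (orte_odls_child_t*)item2;
>         if (child2->alive) {
>             num_procs_alive++;
>         }
>     }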
> 
> I'm attaching an original (trunk) file, a proposed version, and the diff 
> file.  Let me know if it looks sane.  If so, I'll try it out and put it back 
> and then file a CMR for 1.5.
> 
> Some more below...
> 
> On 07/12/11 18:19, Ralph Castain wrote:
> 
>> Well, yes and no.
>> 
>> Yes - the value definitely needs to be computed in all cases since it is 
>> used later on.
>> 
>> No - the way it is computed is only correct for that one usage. The later 
>> usage (in the block that starts with "0 < opal_sys_limits.num_procs") needs a 
>> completely different value.
>> 
>> The current computation only comes up with the number of procs for the 
>> specific job being launched. The latter code requires the total number of 
>> local procs that are alive - a rather different value.
> 
> The first usage, to compare with num_processors, includes not only procs for 
> the job being launched but also already-alive procs.
> 
> The second usage, to be compared to opal_sys_limits.num_procs, should only 
> have alive procs since app->num_procs will be added later.  (Plus, we try to 
> keep a running total of this number of alive procs.)
> 
> The third usage, to be compared to opal_sys_limits.num_files, should also 
> only have alive procs for the same reasons.
> 
>> So v1.5 is incorrect in either case. Would you like me to create a patch to 
>> correct it, or would you like to do it (covering both cases)?
> 
> Trunk also appears to be incorrect.
> 
>> On Jul 12, 2011, at 3:35 PM, Eugene Loh wrote:
>>> The function orte_odls_base_default_launch_local() has a variable 
>>> num_procs_alive that is basically initialized like this:
>>> 
>>>    if ( oversubscribed ) {
>>>        ...
>>>    } else {
>>>        num_procs_alive = ...;
>>>    }
>>> 
>>> Specifically, if the "oversubscribed" test passes, the variable is not 
>>> initialized.
>>> 
>>> (Strictly speaking, this is true only in v1.5.  In the trunk, the variable 
>>> is set to 0 when it is declared, but I'm not sure that's very helpful.)
>>> 
>>> I'm inclined to move the num_procs_alive computation ahead of the "if" 
>>> block so that this computation is always performed.
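>>> 
>>> That is, something like:
>>> 
>>>    num_procs_alive = ...;   /* always computed */
>>>    if ( oversubscribed ) {
>>>        ...
>>>    } else {
>>>        ...
>>>    }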
> <files.tar.gz>

