Sorry for the delay - got tied up. This looks sane to me and should work. FWIW: I haven't seen any problem with comm_spawn_multiple. That code only executes if the specific limit is hit, so I suspect this is an issue of scale and race conditions.
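To illustrate the variable-reuse hazard discussed below: recomputing num_procs_alive from inside the outer launch loop must not reuse that loop's iterator. A minimal sketch, using hypothetical simplified types (not the real orte_odls_child_t / opal_list_t structures), showing a recount helper with its own private loop variable:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for ORTE's per-child record --
 * not the real orte_odls_child_t structure. */
typedef struct child {
    int alive;              /* nonzero if the local proc is alive */
    struct child *next;     /* next entry in the local child list */
} child_t;

/* Recount live local children.  Crucially, this helper walks the list
 * with its own loop variable, so it can be called from inside the
 * outer item/child launch loop without clobbering that loop's
 * iterator -- the reuse that broke MPI_Comm_spawn_multiple. */
static int count_procs_alive(const child_t *head)
{
    int num_procs_alive = 0;
    for (const child_t *c = head; c != NULL; c = c->next) {
        if (c->alive) {
            num_procs_alive++;
        }
    }
    return num_procs_alive;
}
```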
On Jul 12, 2011, at 6:44 PM, Eugene Loh wrote:

> Thanks for the quick and helpful response.
>
> I'm willing to make the changes if only I were more confident I understood
> the issues.
>
> What you call the first usage of num_procs_alive seems to count "local procs
> alive or about to be launched". It tests on "child->alive ||
> opal_dss.compare(&job, &(child->name->jobid), ORTE_JOBID))". This is only in
> the !oversubscribed case.
>
> What you call "the later usage" ("0 < opal_sys_limits.num_procs" or "check
> the system limits") seems to want only the number of alive procs (not
> including any that are about to be launched). So, as I guess you were
> driving at, a fresh computation of num_procs_alive seems to be needed at this
> point, both in v1.5 and in the trunk.
>
> There is actually a third usage of num_procs_alive ("available file
> descriptors" or "0 < opal_sys_limits.num_files"). I think there are two
> problems here. One is that the number of procs we're about to launch is not
> accounted for. Secondly, and perhaps most importantly, if we "wait a little
> time" and recompute num_procs_alive, we loop over item/child. This is bad
> since we are already inside an item/child loop! (This is actually the origin
> of my foray into this mess... MPI_Comm_spawn_multiple gets screwed up by this
> reuse of variable names.)
>
> I'm attaching an original (trunk) file, a proposed version, and the diff
> file. Let me know if it looks sane. If so, I'll try it out and put it back
> and then file a CMR for 1.5.
>
> Some more below...
>
> On 07/12/11 18:19, Ralph Castain wrote:
>
>> Well, yes and no.
>>
>> Yes - the value definitely needs to be computed in all cases since it is
>> used later on.
>>
>> No - the way it is computed is only correct for that one usage. The later
>> usage (in the block that starts with "0 < opal_sys_limits.num_procs") needs a
>> completely different value.
>> The current computation only comes up with the number of procs for the
>> specific job being launched. The latter code requires the total number of
>> local procs that are alive - a rather different value.
>
> The first usage, to compare with num_processors, includes not only procs for
> the job being launched but also already-alive procs.
>
> The second usage, to be compared to opal_sys_limits.num_procs, should only
> have alive procs since app->num_procs will be added later. (Plus, we try to
> keep a running total of this number of alive procs.)
>
> The third usage, to be compared to opal_sys_limits.num_files, should also
> only have alive procs for the same reasons.
>
>> So v1.5 is incorrect in either case. Would you like me to create a patch to
>> correct it, or would you like to do it (covering both cases)?
>
> Trunk also appears to be incorrect.
>
>> On Jul 12, 2011, at 3:35 PM, Eugene Loh wrote:
>>
>>> The function orte_odls_base_default_launch_local() has a variable
>>> num_procs_alive that is basically initialized like this:
>>>
>>>     if (oversubscribed) {
>>>         ...
>>>     } else {
>>>         num_procs_alive = ...;
>>>     }
>>>
>>> Specifically, if the "oversubscribed" test passes, the variable is not
>>> initialized.
>>>
>>> (Strictly speaking, this is true only in v1.5. In the trunk, the variable
>>> is set to 0 when it is declared, but I'm not sure that's very helpful.)
>>>
>>> I'm inclined to move the num_procs_alive computation ahead of the "if"
>>> block so that this computation is always performed.

<files.tar.gz>

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
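The restructuring Eugene proposes - hoisting the num_procs_alive computation ahead of the oversubscribed branch so it is always performed - can be sketched as follows. This is a hypothetical simplification, not the real orte_odls_base_default_launch_local() code; compute_launch_state, alive_count, and the return values are invented for illustration:

```c
#include <assert.h>

/* Hypothetical sketch of the proposed fix: initialize num_procs_alive
 * unconditionally, BEFORE the oversubscribed test, so that both
 * branches (and the later sys-limit checks) see a valid count instead
 * of an uninitialized or stale zero value. */
static int compute_launch_state(int oversubscribed, int alive_count,
                                int *num_procs_alive)
{
    /* Always computed, regardless of which branch is taken. */
    *num_procs_alive = alive_count;

    if (oversubscribed) {
        /* ... oversubscribed-specific handling ... */
        return 1;
    }
    /* ... non-oversubscribed handling ... */
    return 0;
}
```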