Hi Ralph,

Thanks for the explanation. Does ORTE/OMPI always assume that, for multi-node jobs, there will be only one user's job per node? At my previous employer we had to make some changes to runtime components in order to support slurm, where the customers' default setting was to prefer filling nodes with jobs, even if that meant multi-node jobs of different users were intermingled within nodes. The customers did not want to have to use the exclusive option.

Just a heads up, in case folks who are working on Cray XE/XC systems are assuming that the way things work now with aprun will hold true going forward.

Howard

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, June 18, 2014 5:00 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] r31916 question

You know, looking at the code and the comments, the rationale for putting the nids in order was to prep the list for the regex generator. If you look in the plm_ras_module, you'll see that we pass the nodelist to orte_plm_base_orted_append_basic_args. ORNL used static ports for alps to get better scaling, and so that function creates a regular expression from the nodelist. We then pass that to each orted upon launch so it can compute the URI for all other orteds in the system, thus allowing it to connect back to mpirun thru the routing tree (instead of making a direct connection).

HTH
Ralph

On Jun 18, 2014, at 3:55 PM, Ralph Castain <r...@open-mpi.org> wrote:

Ah, I see - yes, you'd use get_attribute to retrieve it. Alternatively, you have it sitting right there in an array, so you could just use the array to order the list.

On Jun 18, 2014, at 3:47 PM, Pritchard, Howard P <howa...@lanl.gov> wrote:

Hi Ralph,

It is setting the attribute, but then for some reason there seems to be a need to have the node ids (nids) in ascending order, so there's some code looking at the old launch_id field, which no longer exists. I'm fixing it. I'd like to learn the cycle of getting fixes into trunk.

Thanks,
Howard

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, June 18, 2014 4:45 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] r31916 question

Huh - thought I got that. Sorry I missed it.
Let me take a look and ensure that the alps ras module is setting that attribute.

On Jun 18, 2014, at 2:40 PM, Pritchard, Howard P <howa...@lanl.gov> wrote:

Hello Folks,

I'm looking at commit 31916 and notice a lot of fields were removed from orte_node_t. This now prevents ras_alps_module.c from compiling, owing to its use of a "launch_id" field. In lieu of the direct use of launch_id, should I replace the code around line 587 of this file with a call to orte_get_attribute, with ORTE_NODE_LAUNCH_ID as the attribute to be retrieved?

Thanks,
Howard

-------------------------------------------------
Howard Pritchard
HPC-5
Los Alamos National Laboratory

_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2014/06/15008.php
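
As a rough illustration of the nid-ordering point in Ralph's reply (sort the node ids, then hand the ordered list to something that compresses it into a pattern), here is a minimal standalone C sketch. It is not the ORTE regex generator and it uses no ORTE APIs: the names cmp_nid and nids_to_ranges are hypothetical, and the "1-3,7,9-10" range format is an assumption chosen only to show why ascending order makes the list easy to compress.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: ascending order of integer node ids (nids). */
static int cmp_nid(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper (NOT an ORTE function): sorts nids in place, then
 * writes a compact range expression such as "1-3,7,9-10" into out.
 * Returns 0 on success, -1 if out is too small to hold the result.
 */
int nids_to_ranges(int *nids, size_t n, char *out, size_t outlen)
{
    size_t used = 0;
    size_t i = 0;

    assert(out != NULL && (nids != NULL || n == 0));
    if (outlen == 0) {
        return -1;
    }
    out[0] = '\0';

    /* Ascending order is what lets consecutive nids collapse into runs. */
    if (n > 1) {
        qsort(nids, n, sizeof(int), cmp_nid);
    }

    while (i < n) {
        size_t j = i;
        int w;

        /* Extend the run while the next nid is consecutive or a duplicate. */
        while (j + 1 < n &&
               (nids[j + 1] == nids[j] + 1 || nids[j + 1] == nids[j])) {
            j++;
        }

        if (nids[j] != nids[i]) {
            w = snprintf(out + used, outlen - used, "%s%d-%d",
                         used ? "," : "", nids[i], nids[j]);
        } else {
            w = snprintf(out + used, outlen - used, "%s%d",
                         used ? "," : "", nids[i]);
        }
        /* snprintf reports the length it wanted; treat truncation as failure. */
        if (w < 0 || (size_t)w >= outlen - used) {
            return -1;
        }
        used += (size_t)w;
        i = j + 1;
    }
    return 0;
}
```

Called with the nids 9, 2, 7, 1, 3, 10 and a 64-byte buffer, the helper leaves the array sorted and writes "1-3,7,9-10" into the buffer.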