Hi Ralph,

Thanks for the explanation.  Does ORTE/OMPI always assume that, for multi-node
jobs, there will be only one user's job per node?  At my previous employer we
had to make some changes to runtime components in order to support slurm,
where the customers' default setting was to prefer filling nodes with jobs,
even if that meant multi-node jobs of different users were intermingled within
nodes.  The customers did not want to have to use the exclusive option.

Just a heads up, in case folks working on Cray XE/XC systems are assuming
that the way things work now with aprun will hold true going forward.

Howard


From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, June 18, 2014 5:00 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] r31916 question

You know, looking at the code and the comments, the rationale for putting the 
nids in order was to prep the list for the regex generator. If you look in the 
plm_ras_module, you'll see that we pass the nodelist to 
orte_plm_base_orted_append_basic_args. ORNL used static ports for alps to get 
better scaling, and so that function creates a regular expression from the 
nodelist. We then pass that to each orted upon launch so it can compute the URI 
for all other orteds in the system, thus allowing it to connect back to mpirun 
thru the routing tree (instead of making a direct connection).
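
To illustrate why an ordered nodelist suits a compact expression: in an ascending list, consecutive nids can be collapsed into ranges. The following is only a sketch under stated assumptions; `compress_sorted_nids` and the "3-5,9" output format are hypothetical and are not what the ORTE regex generator actually produces.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not an ORTE function): collapse an ascending list
 * of nids into comma-separated ranges, e.g. {3,4,5,9} becomes "3-5,9".
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int compress_sorted_nids(const int *nids, size_t count,
                                char *out, size_t out_len)
{
    size_t used = 0;
    size_t i = 0;

    if (out_len == 0) {
        return -1;
    }
    out[0] = '\0';

    while (i < count) {
        size_t j = i;
        int written;

        /* Extend the run while the next nid is exactly one higher. */
        while (j + 1 < count && nids[j + 1] == nids[j] + 1) {
            j++;
        }

        if (j == i) {
            /* Single nid, no range needed. */
            written = snprintf(out + used, out_len - used, "%s%d",
                               used ? "," : "", nids[i]);
        } else {
            /* Run of consecutive nids, emitted as first-last. */
            written = snprintf(out + used, out_len - used, "%s%d-%d",
                               used ? "," : "", nids[i], nids[j]);
        }
        if (written < 0 || (size_t)written >= out_len - used) {
            return -1;  /* output truncated */
        }
        used += (size_t)written;
        i = j + 1;
    }
    return 0;
}
```

If the input were not sorted, the same nids would break into many short runs, so the sort step keeps the generated string small.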

HTH
Ralph

On Jun 18, 2014, at 3:55 PM, Ralph Castain <r...@open-mpi.org> wrote:


Ah, I see - yes, you'd use orte_get_attribute to retrieve it. Alternatively,
you have it sitting right there in an array, so you could just use the array
to order the list.
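
A minimal sketch of the array-based ordering, assuming the launch ids are plain 32-bit integers. `sketch_node_t` and `sort_nodes_by_launch_id` are hypothetical stand-ins for illustration, not the real orte_node_t or any ORTE function.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a node entry; the real orte_node_t differs. */
typedef struct {
    int32_t launch_id;   /* nid for this node (assumed to be an int32) */
    const char *name;    /* illustrative node name */
} sketch_node_t;

/* qsort comparator: ascending launch_id, written to avoid overflow. */
static int compare_launch_id(const void *a, const void *b)
{
    const sketch_node_t *na = (const sketch_node_t *)a;
    const sketch_node_t *nb = (const sketch_node_t *)b;

    return (na->launch_id > nb->launch_id) - (na->launch_id < nb->launch_id);
}

/* Sort the array in place so that nids end up in ascending order. */
static void sort_nodes_by_launch_id(sketch_node_t *nodes, size_t count)
{
    qsort(nodes, count, sizeof(nodes[0]), compare_launch_id);
}
```

The comparator uses two comparisons rather than a subtraction so that widely separated ids cannot overflow the int result.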


On Jun 18, 2014, at 3:47 PM, Pritchard, Howard P <howa...@lanl.gov> wrote:


Hi Ralph,

It is setting the attribute, but then for some reason there seems to be a need
to have the node ids (nids) in ascending order, so there's some code looking
at the old launch_id field, which no longer exists.

I'm fixing it.  I'd like to learn the cycle of getting fixes into trunk.

Thanks,

Howard


From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, June 18, 2014 4:45 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] r31916 question

Huh - thought I got that. Sorry I missed it. Let me take a look and ensure
that the alps ras module is setting that attribute.

On Jun 18, 2014, at 2:40 PM, Pritchard, Howard P <howa...@lanl.gov> wrote:



Hello Folks,

I'm looking at commit 31916 and notice a lot of fields were removed from
orte_node_t.  This is now preventing ras_alps_module.c from compiling, owing
to its use of a "launch_id" field.

In lieu of the direct use of launch_id, should I replace the code around line
587 of this file with a call to orte_get_attribute, with ORTE_NODE_LAUNCH_ID
as the attribute to be retrieved?

Thanks,

Howard


-------------------------------------------------
Howard Pritchard
HPC-5
Los Alamos National Laboratory


_______________________________________________
devel mailing list
de...@open-mpi.org<mailto:de...@open-mpi.org>
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/06/15008.php

