The 1.6 branch is a stable series - no new features will be added to it, so 
your patch won't be going there. I'd focus solely on the trunk.

What you're doing with the RAS is fine for now. In the next few days, I'll be 
changing the API to the RAS components, but it isn't a big change and we can 
adjust as you get closer. The orte_job_t object does contain the number of 
procs to be launched prior to the RAS being invoked, but you have to compute 
it yourself: each app_context carries that number, so to get the total for 
the job you cycle across all the app_contexts and add them up.
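
Something like this minimal sketch - the field names (jdata->apps as an 
opal_pointer_array_t of orte_app_context_t entries, each carrying a 
num_procs member) are from memory, so verify them against your tree:

    /* Sketch: sum the requested proc counts across all app_contexts.
     * Field names assume the current trunk layout - check locally. */
    static orte_std_cntr_t count_requested_procs(orte_job_t *jdata)
    {
        orte_std_cntr_t i, total = 0;
        orte_app_context_t *app;

        for (i = 0; i < jdata->apps->size; i++) {
            app = (orte_app_context_t *)
                  opal_pointer_array_get_item(jdata->apps, i);
            if (NULL == app) {
                continue;  /* pointer arrays can be sparse */
            }
            total += app->num_procs;
        }
        return total;
    }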

The mapper assigns the final num_procs value in the orte_job_t object. We do 
it there because the user can also run the job without specifying the number 
of procs, in which case we simply run one proc for every allocated slot. 
That's a popular option, but it obviously wouldn't work in your case, since 
your RAS needs the count before the allocation exists.
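
Your allocate routine would then compute its target before filling in the 
node list - roughly like the fragment below. Note the signature is only 
illustrative: it assumes the upcoming API where the orte_job_t is handed to 
the module, so adjust to whatever your tree actually defines.

    /* Illustrative only: assumes a RAS allocate entry point that
     * receives the orte_job_t for the job being launched. */
    static int my_ras_allocate(orte_job_t *jdata)
    {
        orte_std_cntr_t target = count_requested_procs(jdata);

        /* ... wait for 'target' slots to free up, then add the
         * corresponding orte_node_t entries to the allocation ... */
        return ORTE_SUCCESS;
    }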


On Apr 14, 2012, at 2:55 PM, Alex Margolin wrote:

> As to the old version: I'm working in parallel on patches for the 1.6 
> branch and the trunk, which (the patches, not the versions) are almost 
> identical. There is a minor difference in my patch for the RAS: in the 
> trunk I used the preexisting total_slots_alloc, while in 1.6 I added it to 
> orte_ras_base (exactly where it is located in the trunk). I admit that's 
> not the original intent of the author of the orte_ras_base data struct, or 
> maybe even of the RAS component in general, but I see no other way to 
> implement it right now...
> 
> What I've written for the RAS (attached is my current patch for the 1.6 
> branch, incl. the BTL and ODLS modules previously sent here) is a module 
> which does 2 things (for mpirun -n X foo):
> 1. Waits for X slots to become available somewhere in the cluster (optional)
> 2. Creates the allocation composed of the X best machines to use
> - This requires the RAS module to know the number of slots to allocate in 
> advance... is there a better way to do it? (in 1.6/trunk?)
> I tried to access the orte_job_t struct using my jobid from inside the RAS 
> module, but that struct isn't initialized with content at that time.
> 
> Thanks,
> Alex
> 
> P.S. I'm preparing a patch for both the 1.6 branch and the trunk because I 
> want to do some benchmarking (not saying the trunk is bad for this purpose) 
> and I want it to be available in the long run. Am I missing something here? 
> I hope I'll get the contributor paper signed so I can commit rather than 
> working on my laptop...
> 
> 
> On 04/13/2012 07:43 PM, Ralph Castain wrote:
>> Looks like you are using an old version - the trunk RAS has changed a bit. 
>> I'll shortly be implementing further changes to support dynamic allocation 
>> requests that might be relevant here as well.
>> 
>> Adding job data to the RAS base isn't a good idea - remember, multiple jobs 
>> can be launching at the same time!
>> 
>> On Apr 13, 2012, at 10:07 AM, Alex Margolin wrote:
>> 
>>> Hi,
>>> 
>>> The next component I'm writing allocates nodes to run the processes of
>>> an MPI job. Suppose I have a "getbestnode" executable which not only
>>> tells me the best location for spawning a new process, but also reserves
>>> the space (for some time), so that every time I run it I get different
>>> results (as the best cores are already reserved).
>>> 
>>> I thought I should write a component under orte/mca/ras, similar to
>>> loadleveler, but the problem is that I can't determine, inside the
>>> module, the number of slots to allocate. It gets a list to fill in as a
>>> parameter, and I guess it assumes I somehow know how many processes will
>>> run, because the allocation was done externally and now I'm just asking
>>> the allocator for the list.
>>> 
>>> A related location, the rmaps, has this information (and much more), but
>>> it doesn't look like a good home for such a module, since it maps
>>> already-allocated resources and contains a lot of code that is
>>> irrelevant here.
>>> 
>>> Maybe the answer is to change the base module a bit, to contain this
>>> information? It could serve as a decent sanity check for other modules
>>> - making sure the external allocation fits the number of processes we
>>> intend to run. Maybe orte_ras_base_allocate(orte_job_t *jdata) in
>>> ras_base_allocate.c could store the relevant information from jdata in
>>> orte_ras_base? In the long run it could become a parameter passed to the
>>> RAS components, but for backwards compatibility the global will do for now.
>>> 
>>> Thanks,
>>> Alex
>>> 
>>> P.S. An RDS component is mentioned at length in ras.h, yet it is no
>>> longer available, right?
>> 
> 
> <patch-openmpi-1.6>

