WHAT:   Revise the global ORTE data structures:
                  * orte_app_context_t
                  * orte_node_t
                  * orte_job_t
                  * orte_proc_t

WHY:     The current definitions are rigid and hard to extend. In the past, we
         have extended them by hard-coding new fields into the structures. This
         has led to issues for off-trunk researchers and developers, and caused
         the structures to balloon in size.


WHEN:    This is pretty disruptive and touches a lot of ORTE files, so let's
         give it a few weeks and set the timeout for June 3rd, after the telecon.


BRANCH:  https://bitbucket.org/rhc/ompi-rtc


PLEASE test your favorite mpirun options to ensure everything is working 
correctly. There are quite a few combinations, and I can't possibly guarantee I 
have hit them all.


****************************
More detail:

As noted in the summary, when we want to add another capability to the system,
we frequently wind up adding another dedicated field to the ORTE data
structures. For example, we have a number of booleans in the structures, each
of which may only be used in a single, uncommon use-case. Those wanting to
investigate new capabilities, or developers wishing to add something to the
system, not only need to add more fields to the structures, but must also
(a) ensure that the datatype support routines know about them, (b) ensure that
the odls packing/unpacking functions know how to handle them, if the capability
involves launching processes, and (c) ensure that the nidmap code knows about
any new data fields.

Altogether, the process is pretty intimidating and fragile, and every feature
adds to the memory footprint.

As many of you know, we are about to add a number of new features to the system 
(e.g., power/freq control, direct cgroup support). After starting to work on 
these, it became apparent that we would be adding yet another set of rarely 
used fields to the various structures, further increasing the memory footprint 
for no good reason. Hence, I undertook a revision of not only the objects, but 
also how we handle their transmission during launch.

The resulting code can be broken down into two key concepts:

* combining frequently used booleans into a single "flag" field in each 
structure - the size of the flag varies between the structures according to the 
number of required booleans. Macros are provided to set/unset/test flags so we
can easily revise the system as required (e.g., if we someday need to move to
opal_bitmap_t's instead of simple int-like fields).

* adding a list of "attributes" to each structure where infrequently used 
and/or non-boolean options can be stored. A new "orte_attribute_t" structure is 
defined that provides a key/value storage mechanism for these lists. In order 
to conserve memory, the key is an integer instead of a string. Functions for 
setting and getting attributes are provided. When an attribute is "set", you 
also specify whether it is to be shared globally (i.e., to be included when 
packing the associated structure's attribute list), or to be kept local.

Definitions of the new flags and attributes are provided in two new files:

* orte/util/attr.h - contains key and structure definitions for attributes,
and flag names plus macros

* orte/util/attr.c - contains the attribute support functions
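
As a rough illustration of the attribute mechanism described above, the sketch
below stores integer-keyed values in a list and records, at set time, whether
each one is local or global. It does not reproduce the contents of attr.h or
attr.c: the names (demo_attr_t, demo_set_attr, demo_get_attr,
demo_count_global, the DEMO_KEY_* keys) are hypothetical, and the value is
assumed to be a plain 64-bit integer to keep the example short. Counting the
global entries stands in for "what would be included when packing".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical integer keys (integers rather than strings to save memory). */
enum { DEMO_KEY_CPU_FREQ = 1, DEMO_KEY_DEBUG_TAG = 2 };

/* Whether an attribute travels with the structure or stays on this node. */
typedef enum { DEMO_ATTR_LOCAL = 0, DEMO_ATTR_GLOBAL = 1 } demo_attr_scope_t;

/* One key/value entry in a singly linked attribute list. */
typedef struct demo_attr {
    struct demo_attr *next;
    int key;
    demo_attr_scope_t scope;
    int64_t value;   /* simplification: a real value would need a type tag */
} demo_attr_t;

/* Set (or overwrite) an attribute. Returns 0 on success, -1 if out of memory. */
static int demo_set_attr(demo_attr_t **list, int key,
                         demo_attr_scope_t scope, int64_t value)
{
    assert(list != NULL);
    for (demo_attr_t *a = *list; a != NULL; a = a->next) {
        if (a->key == key) {
            a->scope = scope;
            a->value = value;
            return 0;
        }
    }
    demo_attr_t *a = malloc(sizeof(*a));
    if (a == NULL) {
        return -1;
    }
    a->key = key;
    a->scope = scope;
    a->value = value;
    a->next = *list;   /* prepend; order does not matter for lookup */
    *list = a;
    return 0;
}

/* Look up an attribute. Returns true and fills *value if the key is present. */
static bool demo_get_attr(const demo_attr_t *list, int key, int64_t *value)
{
    for (; list != NULL; list = list->next) {
        if (list->key == key) {
            if (value != NULL) {
                *value = list->value;
            }
            return true;
        }
    }
    return false;
}

/* Number of entries a pack routine would send: global ones only. */
static size_t demo_count_global(const demo_attr_t *list)
{
    size_t n = 0;
    for (; list != NULL; list = list->next) {
        if (list->scope == DEMO_ATTR_GLOBAL) {
            n++;
        }
    }
    return n;
}

/* Release every entry and reset the list head. */
static void demo_free_attrs(demo_attr_t **list)
{
    assert(list != NULL);
    while (*list != NULL) {
        demo_attr_t *next = (*list)->next;
        free(*list);
        *list = next;
    }
}
```

In this sketch a structure that never uses a rare option pays only for an
empty list head, and a locally scoped attribute adds nothing to the count of
entries that would be packed.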

These revisions have allowed me to not only reduce our memory footprint, but 
also reduce the size of the launch message by removing a lot of duplicated and 
unnecessary info. The nidmap and odls codes have been revamped accordingly.

Comments and/or suggestions are welcomed.
Ralph
