The ultimate goal is to not add an additional dependency for serialization of
the hwloc topology information. One way or another, we'll get there.
On Sep 6, 2011, at 11:46 AM, George Bosilca wrote:
> I guess that as long as there is an option to compile out XML support
> entirely, there is no reason to complain.
>
> george.
>
> On Sep 6, 2011, at 17:36 , Jeff Squyres wrote:
>
>> Don't forget that this RFC has a timeout of today. I didn't think it would
>> be controversial, which is why it had a short timeout.
>>
>> -----
>>
>> Josh brought up a good point on the teleconf today that he'd like to be able
>> to have hwloc without the additional libxml dependency (i.e., the way it
>> is on the trunk today).
>>
>> Remember that making hwloc a 1st class citizen is the first step of a
>> multi-step plan (i.e., part of revamping paffinity in general). As part of
>> the larger plan, we had planned to -- at least for a short while -- enable
>> XML support in hwloc. Ralph and I will discuss this; I *think* we should be
>> able to bring in the overall hwloc support without XML.
>>
>> For the future, hwloc is exploring either supporting some other text format
>> that won't have an additional dependency (e.g., JSON), or re-writing its XML
>> support to drop the libxml dependency.
>>
>>
>> On Aug 31, 2011, at 3:05 PM, Jeff Squyres wrote:
>>
>>> WHAT: Move hwloc up to be a first-class citizen in OPAL (while still making
>>> it possible to compile it out for platforms that don't need it)
>>>
>>> WHY: I previously sent a similar RFC to this one, but it got shot down in
>>> favor of hiding hwloc's functionality under abstraction. After playing
>>> with this for some time, we now firmly believe that the additional
>>> abstraction doesn't buy OMPI anything.
>>>
>>> WHERE: A new compile-time-one-of-many framework like libevent:
>>> opal/mca/hwloc.
>>>
>>> WHEN: as part of the paffinity changes being worked on by Jeff, Josh,
>>> Terry, and Ralph.
>>>
>>> TIMEOUT: Teleconf, Tuesday, Sep 6.
>>>
>>> --> Short timeout because I *think* the only person that objected to the
>>> prior RFC (Ralph) has now been converted. Hence, I think this will be
>>> non-controversial. See below.
>>>
>>> --------------------------------------
>>>
>>> MORE DETAIL:
>>>
>>> There are many people who want to use hwloc within the OMPI code base for
>>> many different reasons. We've struggled with how to do so while meeting
>>> two goals:
>>>
>>> 1. avoid a complete dependence on hwloc
>>> 2. be able to compile it out for platforms that don't want/need it (e.g.,
>>> Cray)
>>>
>>> The initial objection to my long-ago RFC was that you could hide hwloc
>>> under some abstraction and therefore easily be able to handle compiling
>>> hwloc out, supporting platforms that hwloc doesn't support, and potentially
>>> be able to replace hwloc with something else someday, if desired.
>>>
>>> After wrestling with this for a good long while, none of those goals seem
>>> workable via a thin layer of abstraction.
>>>
>>> Instead, let's just call a spade a spade: we'll be dependent upon hwloc.
>>> We'll provide a mechanism to compile it out for Cray and other embedded
>>> platforms.
>>>
>>> Here's the plan:
>>>
>>> 1. Make a new framework opal/mca/hwloc. We'll initially have 3 components:
>>> - hwloc121: hwloc distribution v1.2.1
>>> - system: the system-installed hwloc
>>> - none: for platforms that don't want hwloc support
>>>
>>> Just like the libevent framework, we can introduce new versions of hwloc
>>> (e.g., 1.3) as new components. Old versions/components can be deleted as
>>> new versions/components are stabilized.
>>>
>>> 2. The hwloc framework will be like the libevent framework; only one of
>>> these components will be compiled. The component's hwloc API will be
>>> directly available (via name-shifting) to the rest of OPAL/ORTE/OMPI. No
>>> need for the usual structs of function pointers and whatnot.
>>>
>>> 3. The rest of the OPAL / ORTE / OMPI code base can use the hwloc API in
>>> the following way:
>>>
>>> 3a. opal_init() will initialize hwloc and load a central copy of the local
>>> machine's topology in a global variable. Anyone in the code base can use
>>> this global variable; its use does not need to be protected by #if
>>> _whatever_. However, its value may be NULL for platforms that hwloc doesn't
>>> support or installations that used the "none" hwloc component.
>>>
>>> 3b. opal_config.h will contain the macro OPAL_HAVE_HWLOC, which will be
>>> either 0 or 1. Any code that uses the hwloc API must protect itself with
>>> #if OPAL_HAVE_HWLOC, because installations that use the "none" hwloc
>>> component won't be able to resolve any of the hwloc symbols at link time.
>>>
>>> Meaning that you could do something like:
>>>
>>> if (NULL != opal_hwloc_topology) {
>>> #if OPAL_HAVE_HWLOC
>>> // ...use hwloc API, etc.
>>> #endif
>>> }
>>>
>>> 4. After steps 1-3 are all done, the paffinity and maffinity frameworks can
>>> be deleted and replaced with the corresponding hwloc calls.
>>>
>>> Meaning: if we've got hwloc, the paffinity and maffinity frameworks now
>>> become redundant. So let's whack them. This can happen after 1-3 are done
>>> and stable in the trunk, however.
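The guard pattern from steps 3a and 3b can be sketched as a small, self-contained C file. This is only an illustration under stated assumptions, not the actual OPAL code: the `opal_topology_t` typedef, the `void *` fallback type, the direct `#include <hwloc.h>`, and the `example_*` helpers are all hypothetical, and `OPAL_HAVE_HWLOC` defaults to 0 here (mimicking the "none" component) so the file compiles without hwloc installed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: in the real tree opal_config.h would define this as 0 or 1.
 * Default to 0 (the "none" component) so this sketch builds anywhere. */
#ifndef OPAL_HAVE_HWLOC
#define OPAL_HAVE_HWLOC 0
#endif

#if OPAL_HAVE_HWLOC
#include <hwloc.h>                 /* assumption: header reachable directly */
typedef hwloc_topology_t opal_topology_t;   /* hypothetical typedef */
#else
typedef void *opal_topology_t;     /* hypothetical fallback type */
#endif

/* Stand-in for the global that opal_init() would fill in. It may stay NULL
 * on platforms hwloc doesn't support or when hwloc is compiled out. */
opal_topology_t opal_hwloc_topology = NULL;

/* Hypothetical helper: number of cores in the topology, or -1 when no
 * topology is available or hwloc support was compiled out. */
int example_count_cores(void)
{
    /* The global itself may be tested without any #if protection. */
    if (NULL == opal_hwloc_topology) {
        return -1;
    }
#if OPAL_HAVE_HWLOC
    /* hwloc symbols are only referenced inside the guard. */
    return hwloc_get_nbobjs_by_type(opal_hwloc_topology, HWLOC_OBJ_CORE);
#else
    return -1;
#endif
}

/* Hypothetical helper: print a one-line summary to the given stream. */
void example_print_cores(FILE *out)
{
    assert(NULL != out);
    int n = example_count_cores();
    if (n < 0) {
        fprintf(out, "no hwloc topology available\n");
    } else {
        fprintf(out, "%d cores\n", n);
    }
}
```

Callers only ever see `example_count_cores()` and `example_print_cores()`, so code outside this file needs no `#if` of its own. Building with `-DOPAL_HAVE_HWLOC=1` and linking against hwloc would exercise the real API path once something has loaded a topology into the global.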
>>>
>>> --
>>> Jeff Squyres
>>> [email protected]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> [email protected]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>
>
--
Jeff Squyres
[email protected]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/