Re: [OMPI devel] RFC: make hwloc be a 1st-class citizen
This RFC was committed to the trunk yesterday. Please let us know if you
run into any problems with it.

*** Remember that although the opal_hwloc_topology global variable will
always be available, ##IT MAY BE NULL## on platforms where hwloc was
compiled out / not supported. Therefore, you MUST protect access to hwloc
API calls with #if OPAL_HAVE_HWLOC! See the original RFC text below.

Also note that there is at least one deviation from OMPI norms: if you
don't want hwloc support at all, you need to explicitly disable it via
--without-hwloc. Otherwise, if no hwloc component configures successfully,
configure will fail. (This is due to some fairly tricky configury issues;
I'll try to fix this soon so that it's safe to have no hwloc components
without needing --without-hwloc.)

Finally, note that hwloc's XML capabilities have been disabled by default.
You can enable them via --enable-hwloc-xml, which will also turn on
additional machinery in ORTE to send node topology info back to the HNP for
accurate process mapping (more on this coming soon).

On Aug 31, 2011, at 3:05 PM, Jeff Squyres wrote:

> WHAT: Move hwloc up to be a first-class citizen in OPAL (while still
> making it possible to compile it out for platforms that don't need it)
>
> WHY: I previously sent a similar RFC to this one, but it got shot down in
> favor of hiding hwloc's functionality under abstraction. After playing
> with this for some time, we're now firmly of the belief that the
> additional abstraction doesn't buy OMPI anything.
>
> WHERE: A new compile-time-one-of-many framework like libevent:
> opal/mca/hwloc.
>
> WHEN: as part of the paffinity changes being worked on by Jeff, Josh,
> Terry, and Ralph.
>
> TIMEOUT: Teleconf, Tuesday, Sep 6.
>
> --> Short timeout because I *think* the only person that objected to the
> prior RFC (Ralph) has now been converted. Hence, I think this will be
> non-controversial. See below.
>
> --
>
> MORE DETAIL:
>
> There are many people who want to use hwloc within the OMPI code base for
> many different reasons. We've struggled with how to do so because of two
> goals:
>
> 1. avoid a complete dependence on hwloc
> 2. be able to compile it out for platforms that don't want/need it (e.g.,
>    Cray)
>
> The initial objection to my long-ago RFC was that you could hide hwloc
> under some abstraction and therefore easily be able to handle compiling
> hwloc out, supporting platforms that hwloc doesn't support, and
> potentially be able to replace hwloc with something else someday, if
> desired.
>
> After wrestling with this for a good long while, none of those goals seem
> workable via a thin layer of abstraction.
>
> Instead, let's just call a spade a spade: we'll be dependent upon hwloc.
> We'll provide a mechanism to compile it out for Cray and other embedded
> platforms.
>
> Here's the plan:
>
> 1. Make a new framework opal/mca/hwloc. We'll initially have 3 components:
>    - hwloc121: hwloc distribution v1.2.1
>    - system: the system-installed hwloc
>    - none: for platforms that don't want hwloc support
>
> Just like the libevent framework, we can introduce new versions of hwloc
> (e.g., 1.3) as new components. Old versions/components can be deleted as
> new versions/components are stabilized.
>
> 2. The hwloc framework will be like the libevent framework; only one of
> these components will be compiled. The component's hwloc API will be
> directly available (via name-shifting) to the rest of OPAL/ORTE/OMPI. No
> need for the usual structs of function pointers and whatnot.
>
> 3. The rest of the OPAL / ORTE / OMPI code base can use the hwloc API in
> the following way:
>
> 3a. opal_init() will initialize hwloc and load a central copy of the
> local machine's topology in a global variable. Anyone in the code base can
> use this global variable; its use does not need to be protected by #if
> _whatever_. However, its value may be NULL for platforms that hwloc
> doesn't support or installations that used the "none" hwloc component.
>
> 3b. opal_config.h will contain the macro OPAL_HAVE_HWLOC, which will be
> either 0 or 1. Any code that uses the hwloc API must protect itself with
> #if OPAL_HAVE_HWLOC, because installations that use the "none" hwloc
> component won't be able to resolve any of the hwloc symbols at link time.
>
> Meaning that you could do something like:
>
>     if (NULL != opal_hwloc_topology) {
>     #if OPAL_HAVE_HWLOC
>         // ...use hwloc API, etc.
>     #endif
>     }
>
> 4. After steps 1-3 are all done, the paffinity and maffinity frameworks
> can be deleted and replaced with the corresponding hwloc calls.
>
> Meaning: if we've got hwloc, the paffinity and maffinity frameworks now
> become redundant. So let's whack them. This can happen after 1-3 are done
> and stable in the trunk, however.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doi
Re: [OMPI devel] RFC: make hwloc be a 1st-class citizen
On Sep 12, 2011, at 8:51 AM, Jeff Squyres wrote:

> *** Remember that although the opal_hwloc_topology global variable will
> always be available, ##IT MAY BE NULL## on platforms where hwloc was
> compiled out / not supported. Therefore, you MUST protect access to hwloc
> API calls with #if OPAL_HAVE_HWLOC! See the original RFC text below.

Oops! Ralph just reminded me that this was slightly inaccurate.

If hwloc is not present, then the global variable opal_hwloc_topology won't
be present at all (because its corresponding hwloc type won't be available).
Hence, the example in the original RFC isn't quite right:

>>     if (NULL != opal_hwloc_topology) {
>>     #if OPAL_HAVE_HWLOC
>>         // ...use hwloc API, etc.
>>     #endif
>>     }

This really should be:

    #if OPAL_HAVE_HWLOC
    if (NULL != opal_hwloc_topology) {
        // ...use hwloc API, etc.
    }
    #endif

Sorry for the confusion!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] RFC: make hwloc be a 1st-class citizen
According to http://gcc.gnu.org/onlinedocs/cpp/If.html

"The `#if' directive allows you to test the value of an arithmetic
expression, rather than the mere existence of one macro."

Is the objective to test for the existence of the macro, its value, or its
value IFF it exists?

Ken Lloyd
Re: [OMPI devel] RFC: make hwloc be a 1st-class citizen
OPAL_HAVE_HWLOC will always be defined to 0 or 1.

On Sep 12, 2011, at 9:46 AM, Kenneth Lloyd wrote:

> According to http://gcc.gnu.org/onlinedocs/cpp/If.html
>
> "The `#if' directive allows you to test the value of an arithmetic
> expression, rather than the mere existence of one macro."
>
> Is the objective to test for the existence of the macro, its value, or its
> value IFF it exists?
>
> Ken Lloyd

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/