Kenneth Lloyd wrote:
My 2 cents: Carto is a weighted graph structure that describes the topology
of the compute cluster, not just the locations of nodes. Many view topologies
(trees, meshes, tori) as static - but I've found this an unnecessary and
undesirable constraint.

The compute fabric may be better left open to dynamic configuration,
dependent upon the partitioning of the jobs, tasks, and data to be run.

How do others see this?

At the network level - and actually even at a node's resource level - I think a case can be made for a dynamically changing topology, as you mention above. However, is MPI the right level at which to compensate for interfaces coming and going? It would be nice/cool if there were an APM-like feature, available at the network API level, that spanned HCAs and not just ports on the same HCA. I know why this is currently done the way it is for IB, but it always struck me that you'd want to handle interface/path changes below MPI. That way more than just MPI codes could reap the benefits.

At the node level, the whole issue of a process's locality in relation to its memory or to other processes seems to cry out to be more of an OS job than an MPI one. The reasons are, first, that you could end up with quite a complex layout for a job, and second, that things become really complicated if you want to take other MPI jobs into account.

That being said, I don't hold out much hope that the layers below MPI will actually take on these tasks, even though that seems like the logical level for them to occur, IMO.

Anyway, I think keeping dynamic changes in mind is well worth it, but it makes a lot of sense to start from a static position and move toward them.

--td

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: Monday, December 14, 2009 6:47 PM
To: Open MPI Developers List
Subject: Re: [OMPI devel] carto vs. hwloc

I had a nice chat with Ralph this afternoon about this topic.

He pointed out a few things to me:

- I had forgotten (ahem) that carto has weights associated with each of its edges (and that's kind of a defining feature). hwloc, at present, does not. So perhaps hwloc would not initially replace carto -- maybe in some future hwloc version.

- He also pointed out that not only paffinity, but also sysinfo, could be replaced if hwloc comes in.

He also made a good point that hwloc is only "sorta" extensible right now -- meaning that, sure, you can add support for new OSes and platforms, but not in as easy/clean a way as we have in Open MPI. Specifically, adding new support right now means editing much of the existing hwloc code: configure, adding #if's to the top-level tools and library core, etc. It's not nearly as clean as adding a new plugin that is totally independent of the rest of the code base. He thought it would be greatly beneficial if hwloc used the same plugin system as Open MPI before being brought in. Indeed, Open MPI may wish to extend hwloc in ways that the main hwloc project is not interested in (e.g., supporting some of Cisco's custom hardware). Fair point.

Additionally, the topic of plugins came up within the context of heterogeneity: have one piece of code to get the topology of the machine (RAM + processors), but separate code to mix in accelerators/co-processors and other entities in the box. One could easily imagine a plugin for each different type of entity that you would want to detect within a server.

To some extent, the hwloc crew has already been discussing these issues -- we can probably work elements of much of it into what we're doing. For example, Brice and Samuel are working on adding PCI device support to hwloc (although I haven't been following the details of what they're doing). We've also talked about adding hwloc functions for editing the map that comes back. For example, hwloc could be used as the cornerstone for a new OPAL framework base, and new plugins in this base can use functions to add more information to the initial map that is reported back by the hwloc core. [shrug] Need to think about that more.

This is all excellent feedback (I need to take it back to the hwloc crew); please let me know what else you think about these ideas tomorrow on the call.



On Dec 14, 2009, at 4:13 PM, Jeff Squyres wrote:

Question for everyone (possibly a topic for tomorrow's call...):

hwloc is evolving into a fairly nice package. It's not ready for inclusion in Open MPI yet, but it's getting there. I predict it will come in somewhere early in the 1.5 series (though potentially not 1.5.0). hwloc will provide two things:
1. A listing of all processors and memory, including caches (and cache sizes!), laid out in a map so you can see which processors share which memory (e.g., caches). Open MPI currently does not have this capability. Additionally, hwloc is growing support for including PCI devices in the map; that may or may not make it into hwloc v1.0.
2. Cross-platform / OS support. hwloc currently supports a
nice variety of OSes and hardware platforms.
Given that hwloc is already cross-platform, do we really
need the carto framework? I.e., do we really need multiple carto plugins? More specifically: should we just use hwloc directly -- with no framework?
Random points:

- I'm about halfway finished with the "embedding" code for
hwloc, like PLPA has, so that, for example, all of hwloc's symbols can be prepended with opal_ or orte_ or whatever. Hence, embedding hwloc in OMPI would be "safe".
- If we keep the carto framework, then we'll have to
translate from hwloc's map to carto's map; there may be subtleties involved in the translation.
- I guarantee that much more thought has been put into
the hwloc map data structure design than into carto's. :-) Indeed, to make all of hwloc's data available to OMPI, carto's map data structures may end up evolving to look pretty much exactly like hwloc's. In which case -- what's the point of carto?
Thoughts?

hwloc also provides processor binding functions, so it
might also make the paffinity framework moot...
--
Jeff Squyres
jsquy...@cisco.com



_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

