Kenneth Lloyd wrote:
My 2 cents: Carto is a weighted graph structure that describes the topology
of the compute cluster, not just the locations of nodes. Many view
topologies (trees, meshes, tori) as static, but I've found this an
unnecessary and undesirable constraint.
The compute fabric may be better left open to dynamic configuration,
depending on the partitioning of the jobs, tasks, and data to be run.
How do others see this?
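For concreteness, a weighted topology map of this kind might be
represented roughly like the sketch below -- the type and field names
are hypothetical, not carto's actual structures:

    /* Hypothetical sketch of a weighted topology graph of the kind
     * described above -- not carto's actual data structures. Edge
     * weights could model link bandwidth, latency, or hop cost. */
    struct topo_edge {
        int src, dst;        /* indices into the node array */
        double weight;       /* e.g., relative link cost */
    };

    struct topo_graph {
        int num_nodes;
        const char **node_names;   /* "node0", "hca0", "mem0", ... */
        int num_edges;
        struct topo_edge *edges;   /* weighted; could be rebuilt as the
                                      fabric is reconfigured at run time */
    };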
At the network level, and really even at the level of a node's
resources, I think a case can be made for a dynamically changing
topology, as you mention above. However, is MPI the right level at
which to compensate for interfaces coming and going?
It would be nice if there were an APM-like (automatic path migration)
feature available at the network API level that spanned HCAs, not just
ports on the same HCA. I know why this is currently done the way it is
for IB, but it has always struck me that you'd want to handle
interface/path changes below MPI. That way, more than just MPI codes
could reap the benefits.
At the node level, the whole issue of a process's locality in relation
to its memory or to other processes seems to cry out to be more of an
OS-level job than an MPI one. The reasons are, first, that you can end
up with quite a complex layout for a job, and second, that things
become really complicated if you want to take other MPI jobs into
account.
That being said, I don't hold out much hope that the layers below MPI
will actually take on these tasks, even though that seems like the
logical level for them to occur, IMO.
Anyway, I think keeping dynamic changes in mind is well worth it, but
it makes a lot of sense to start from a static position and move in
that direction.
--td
Ken Lloyd
-----Original Message-----
From: devel-boun...@open-mpi.org
[mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: Monday, December 14, 2009 6:47 PM
To: Open MPI Developers List
Subject: Re: [OMPI devel] carto vs. hwloc
I had a nice chat with Ralph this afternoon about this topic.
He pointed out a few things to me:
- I had forgotten (ahem) that carto has weights associated
with each of its edges (and that's kind of a defining
feature). hwloc, at present, does not. So perhaps hwloc
would not initially replace carto -- maybe in some future
hwloc version.
- He also pointed out that not only paffinity, but also
sysinfo, could be replaced if hwloc comes in.
He also made a good point that hwloc is only "sorta"
extensible right now -- meaning that, sure, you can add
support for new OSes and platforms, but not in as easy/clean
a way as we have in Open MPI. Specifically, adding new
support right now means editing much of the current hwloc
code: configure, adding #if's to the top-level tools and the
library core, etc. It's not nearly as clean as just adding a
new plugin that is totally independent of the rest of the
code base. He thought it would be [greatly] beneficial if
hwloc used the same plugin system as Open MPI before we bring
it in. Indeed, Open MPI may wish to extend hwloc in ways
that the main hwloc project is not interested in extending
(e.g., supporting some of Cisco's custom hardware). Fair point.
Additionally, the topic of plugins came up within the context
of heterogeneity: have one piece of code get the topology of
the machine (RAM + processors), but have separate code mix in
accelerators/co-processors and other entities in the box.
One could easily imagine a plugin for each different type of
entity that you would want to detect within a server.
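For instance, each detection plugin might export a small interface
along these lines (a sketch with invented names -- not the actual MCA
component structs):

    struct topo_map;   /* opaque: whatever map structure the base keeps */

    /* Invented plugin interface, for illustration only. */
    struct topo_detect_plugin {
        const char *name;                 /* "cpu", "gpu", "pci", ... */
        int (*probe)(void);               /* is this entity type present? */
        int (*add_to_map)(struct topo_map *map);  /* merge what was found */
    };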
To some extent, the hwloc crew has already been discussing
these issues -- we can probably work elements of much of it
into what we're doing. For example, Brice and Samuel are
working on adding PCI device support to hwloc (although I
haven't been following the details of what they're doing).
We've also talked about adding hwloc functions for editing
the map that comes back. For example, hwloc could be used as
the cornerstone of a new OPAL framework base, and new plugins
in that base could use such functions to add more information
to the initial map reported back by the hwloc core.
[shrug] Need to think about that more.
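As a sketch of the map-editing idea, here is roughly what it looks
like with the editing call that eventually appeared in the hwloc 2.x
API (so the call itself postdates this discussion, and the device name
is made up):

    #include <hwloc.h>

    /* Sketch: annotate a loaded topology with an extra leaf object,
     * using hwloc 2.x's editing call; "custom-device0" is invented. */
    static void tag_custom_device(hwloc_topology_t topo)
    {
        hwloc_obj_t root = hwloc_get_root_obj(topo);
        hwloc_topology_insert_misc_object(topo, root, "custom-device0");
    }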
This is all excellent feedback (I need to take it back to the
hwloc crew); please let me know what else you think about
these ideas tomorrow on the call.
On Dec 14, 2009, at 4:13 PM, Jeff Squyres wrote:
Question for everyone (possibly a topic for tomorrow's call...):
hwloc is evolving into a fairly nice package. It's not
ready for inclusion in Open MPI yet, but it's getting
there. I predict it will come in somewhere early in the 1.5
series (though potentially not in 1.5.0). hwloc will provide
two things:
1. A listing of all processors and memory, including
caches (and cache sizes!), laid out in a map so you can see
which processors share which memory (e.g., caches); a short
sketch of reading such a map follows after this list. Open
MPI currently does not have this capability. Additionally,
hwloc is currently growing support to include PCI devices in
the map; that may or may not make it into hwloc v1.0.
2. Cross-platform / OS support. hwloc currently supports a
nice variety of OSes and hardware platforms.
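Here is a minimal sketch of reading that map with hwloc's C API (error
checking omitted for brevity):

    #include <stdio.h>
    #include <hwloc.h>

    /* Walk the whole topology, printing each object indented by its
     * depth in the map (machine, sockets, caches, cores, PUs, ...). */
    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        int topodepth = hwloc_topology_get_depth(topo);
        for (int depth = 0; depth < topodepth; depth++) {
            unsigned n = hwloc_get_nbobjs_by_depth(topo, depth);
            for (unsigned i = 0; i < n; i++) {
                hwloc_obj_t obj = hwloc_get_obj_by_depth(topo, depth, i);
                printf("%*s%s\n", 2 * depth, "",
                       hwloc_obj_type_string(obj->type));
            }
        }
        hwloc_topology_destroy(topo);
        return 0;
    }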
Given that hwloc is already cross-platform, do we really
need the carto framework? I.e., do we really need multiple
carto plugins? More specifically: should we just use hwloc
directly -- with no framework?
Random points:
- I'm about halfway finished with "embedding" code for
hwloc, like PLPA has, so that, for example, all of hwloc's
symbols can be prepended with opal_ or orte_ or whatever (a
sketch of the prefixing approach is below). Hence, embedding
hwloc in OMPI would be "safe".
- If we keep the carto framework, then we'll have to
translate from hwloc's map to carto's map; there may be
subtleties involved in the translation.
- I guarantee that [much] more thought has been put into
the hwloc map data structure design than into carto's. :-)
Indeed, to make all of hwloc's data available to OMPI,
carto's map data structures may end up evolving to look
pretty much exactly like hwloc's. In which case -- what's
the point of carto?
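As for the symbol prefixing mentioned above, it boils down to a
generated header full of #define's, roughly like this sketch
(illustrative only -- the real generated header covers the entire
public API):

    /* PLPA-style symbol prefixing, sketched: every public hwloc symbol
     * is #define'd to a prefixed name so the embedded copy cannot
     * collide with a system-installed libhwloc. */
    #define hwloc_topology_init      opal_hwloc_topology_init
    #define hwloc_topology_load      opal_hwloc_topology_load
    #define hwloc_topology_destroy   opal_hwloc_topology_destroy
    /* ...and so on for the rest of the public API... */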
Thoughts?
hwloc also provides processor binding functions, so it
might also make the paffinity framework moot...
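For example, binding the calling process to the first core in the map
is only a few lines (a sketch against hwloc's C API; error checking
omitted):

    #include <hwloc.h>

    /* Sketch: bind the calling process to the first core hwloc reports. */
    int bind_to_first_core(void)
    {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, 0);
        int rc = core != NULL
            ? hwloc_set_cpubind(topo, core->cpuset, HWLOC_CPUBIND_PROCESS)
            : -1;

        hwloc_topology_destroy(topo);
        return rc;
    }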
--
Jeff Squyres
jsquy...@cisco.com
--
Jeff Squyres
jsquy...@cisco.com
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel