Hello, Happy new year btw :D

Considering future network topology support, I believe we probably need to fix a couple of things before releasing 1.0. Just to sum up a bunch of points that have been raised in the past months:

- there should be a way to have the complete topology in just one tree, so you can browse it and assign tasks/processes/whatever in it, according to architectural details provided by hwloc, but also network details like bandwidth etc.
- the core of hwloc mustn't force any kind of tools; it must be easy to either build something around hwloc detection and binding functions, or load detection & binding plugins.

The way I see it is to provide a hwloc_topology_combine() function that takes a series of hwloc_topology_t trees and an object type, and builds a tree that contains a new object of that type, under which the trees appear. That combination can actually already be done by hand by catenating XML files. For instance, on a simple cluster you'd run lstopo on each machine and save XML files, load them together, combine them under a "network" object (being able to register dynamic object types should be easy), and save the result as an XML file, which thus contains the complete topology of the cluster. A task dispatcher can then browse it at will etc.

When it comes to binding, it'd be the task dispatcher's role to first launch the application on the target machine, and there run hwloc to perform the actual binding, according to the cpuset in the tree.

Now, coming to semantic changes:

- The top node of the tree wouldn't necessarily be a system object. Actually, always having a top object of the system type does not provide any useful information :), and it makes us duplicate fields between system and machine. On usual (non-Kerrighed) machines, the top node would just be machine. On Kerrighed systems, the top node would be system.
  On networked systems, the top node would be a switch or the Internet :) As a consequence, hwloc_get_system_obj would have to be renamed to hwloc_get_root_obj.
- Objects of network trees may not have cpusets defined (trees obtained directly from hwloc with default parameters would still have cpusets on every node, however). It does not make sense to merge cpusets of different machines (they will conflict), and things like shifting cpusets to be able to merge them would probably only bring issues. That being said, nothing prevents anyone from writing a transparency plugin that automatically discovers the network topology, shifts cpusets, and when requested for binding, automatically migrates to the machine according to the shift, and uses the underlying OS hooks to perform the binding. My point is that the hwloc combining operation wouldn't fix cpusets itself and would leave them NULL. The caller of the combining operation will be responsible for that.
- This also means there can't be "global" cpusets like the recently added hwloc_topology_get_{topology,complete,online,allowed}_cpuset functions (not released yet). These can just be moved to the hwloc_obj structure, thus being available for each object, which could actually be helpful btw.
- Helpers that take cpuset parameters of course don't make sense any more when applied to networked topologies. But it probably doesn't make sense for the caller to call them in the first place, and the caller knows it, since it's the caller that first called the combining operation or loaded an XML file resulting from it. If, however, at some point (after having distributed tasks between machines for instance), operations with cpusets are desired, we could provide a duplication function that takes a topology object parameter A and builds a new topology tree containing all the objects under A, A thus being its root, and then (if A indeed has a cpuset, but the caller should know that) helpers taking cpuset parameters can be called.
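
To make the combining and duplication semantics from the list above concrete, here is a minimal C sketch. It deliberately does not use hwloc itself: toy_obj, toy_combine() and toy_dup_subtree() are hypothetical names made up for illustration, the cpuset is reduced to a 64-bit mask, and has_cpuset == 0 stands in for a NULL cpuset. It only shows the intended behaviour, assuming the combining operation adopts the input trees as children of a new root without a cpuset, and the duplication function deep-copies everything under A.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-ins for illustration only; these are NOT hwloc types. */
typedef enum { TOY_NETWORK, TOY_SYSTEM, TOY_MACHINE, TOY_CORE } toy_type_t;

typedef struct toy_obj {
    toy_type_t type;
    int has_cpuset;            /* 0 plays the role of a NULL cpuset */
    uint64_t cpuset;           /* bitmask, meaningful only if has_cpuset */
    struct toy_obj *parent;
    struct toy_obj **children;
    size_t nchildren;
} toy_obj;

toy_obj *toy_new(toy_type_t type, int has_cpuset, uint64_t cpuset)
{
    toy_obj *o = calloc(1, sizeof(*o));
    if (!o)
        return NULL;
    o->type = type;
    o->has_cpuset = has_cpuset;
    o->cpuset = has_cpuset ? cpuset : 0;
    return o;
}

/* Append child under parent; returns 0 on success, -1 on allocation failure. */
int toy_add_child(toy_obj *parent, toy_obj *child)
{
    toy_obj **grown = realloc(parent->children,
                              (parent->nchildren + 1) * sizeof(toy_obj *));
    if (!grown)
        return -1;
    grown[parent->nchildren++] = child;
    parent->children = grown;
    child->parent = parent;
    return 0;
}

/* Free an object and everything below it. */
void toy_free(toy_obj *o)
{
    if (!o)
        return;
    for (size_t i = 0; i < o->nchildren; i++)
        toy_free(o->children[i]);
    free(o->children);
    free(o);
}

/* Put n existing trees under a new root of the given type.
 * The new root gets no cpuset: machine-local cpusets would conflict,
 * so fixing them up is left to the caller. */
toy_obj *toy_combine(toy_type_t type, toy_obj **roots, size_t n)
{
    toy_obj *top = toy_new(type, 0, 0);
    if (!top)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        if (toy_add_child(top, roots[i]) != 0) {
            /* On failure, hand the input trees back to the caller. */
            for (size_t j = 0; j < top->nchildren; j++)
                top->children[j]->parent = NULL;
            free(top->children);
            free(top);
            return NULL;
        }
    }
    return top;
}

/* Deep-copy the subtree under a; the copy of a becomes a standalone root. */
toy_obj *toy_dup_subtree(const toy_obj *a)
{
    assert(a != NULL);
    toy_obj *copy = toy_new(a->type, a->has_cpuset, a->cpuset);
    if (!copy)
        return NULL;
    for (size_t i = 0; i < a->nchildren; i++) {
        toy_obj *c = toy_dup_subtree(a->children[i]);
        if (!c || toy_add_child(copy, c) != 0) {
            toy_free(c);
            toy_free(copy);
            return NULL;
        }
    }
    return copy;
}
```

With two machines that both use cpuset 0x3, toy_combine(TOY_NETWORK, roots, 2) gives a root without a cpuset, while toy_dup_subtree() on either machine gives a standalone tree whose root has cpuset 0x3 again, which is the situation where helpers taking cpuset parameters would become usable.
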

So, to sum it up:

- hwloc_get_obj_by_depth(topo, 0, 0) may not be a system object any more (actually it'd only be one on Kerrighed systems).
- no global cpuset field, only in objects.

The second point shouldn't do any harm, it's just a matter of fixing the (not yet released) API. The first point clearly contradicts the current documentation (“HWLOC_OBJ_SYSTEM will always be the highest”), but I believe not breaking it now will hold us back from further extensions later, and I don't think much code relies on it anyway.

The plan I see is that for 1.0 we only check that catenating XML files by hand to build misc levels representing network layers does indeed work, which should mean that actual combining functions etc. can be implemented later.

Please comment/disagree/agree :)

Samuel