On Sep 22, 2011, at 2:25 PM, Brice Goglin wrote:

> On 22/09/2011 21:36, Jeff Squyres wrote:
>> 1. The depth-specific accessors are Bad. Given the warning language in
>> the docs paired with the practical realities that some people actually
>> do mix and match CPUs in a single server (especially when testing new
>> chips), the depth-based accessors *can/will* fail. Meaning: you have
>> to write application code that can handle the non-uniform depth cases,
>> making the depth-based accessors essentially useless.
>
> I don't see any real problem with having depth accessors and mixed
> types of CPUs in a server. You can have different levels of caches in
> different CPUs, but you still have a uniform depth/level for important
> things like PUs, Cores, and Sockets.

I guess I didn't get that from your documentation. Since caches sit
between socket and core, they appear to affect the depth of the core in
a given socket. Thus, if there are different numbers of caches in the
different sockets on a node, then the core/pu level would change across
the sockets. Is that not true?

> The only problem so far is caches. But do you actually walk the list
> of caches?

Yes, we do.

> People would walk the list of PUs, Cores, Sockets, NUMA nodes. But
> when talking about Caches, I would rather see them ask "which cache do
> I have above these cores?"

But that isn't exactly how people use that info. Instead, they ask us to
"map N processes on each L2 cache across the node", or to "bind all
procs to their local L3 cache", which runs into the same depth problem
(see the traversal sketch below).

> And I don't see how DFS would help. Any concrete example?

As above: if I'm trying to map a process to (say) a core, I have to
search for all the cores. If the system has different numbers of caches
on each socket, the current search for a core object seems to have a
problem: it looks at a specific depth, yet the cores sit at different
depths on each socket. So I have to manually traverse the tree looking
for core objects at any depth. Perhaps my understanding of your tree
topology is wrong, though...
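To make that concrete, here is roughly the walk I think applications are
forced to write for themselves. This is just a sketch against the hwloc
1.x C API as I understand it; the visit_matching and cb names are mine,
not anything that exists in hwloc:

#include <hwloc.h>

/* Depth-agnostic traversal: visit every object in the subtree under
 * "obj" and hand each object of the requested type to the callback.
 * Because it never consults depths, it keeps working even when
 * per-socket cache differences shift the Core level around. */
static void visit_matching(hwloc_obj_t obj, hwloc_obj_type_t type,
                           void (*cb)(hwloc_obj_t, void *), void *data)
{
    unsigned i;
    if (obj->type == type)
        cb(obj, data);
    for (i = 0; i < obj->arity; i++)
        visit_matching(obj->children[i], type, cb, data);
}

/* Every core, wherever it lives:
 *   visit_matching(hwloc_get_root_obj(topo), HWLOC_OBJ_CORE, cb, NULL);
 * Every L2 cache: pass HWLOC_OBJ_CACHE and have the callback keep only
 * objects with obj->attr->cache.depth == 2. */

That is all I mean by DFS: nothing clever, but it is code every
application has to reinvent once the fixed-depth accessors cannot be
trusted on a heterogeneous machine.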
>> But we're using the XML export in OMPI to send the topology of
>> compute nodes up to the scheduler, where decisions are made about how
>> to lay out processes on the back-end compute nodes, what the binding
>> width will be, etc. This front-end scheduler needs to know whether
>> the back-end node is capable of supporting binding, for example.
>>
>> We manually added this information into the message that we send up
>> to the scheduler, but it would be much nicer if the XML export/import
>> just handled that automatically.
>
> I guess we could add some "support" attributes to the XML.
>
> Does your scheduler actually need to know if binding is supported?
> What does it do if not supported? Can't it just try to bind and get an
> error if not supported?

When dealing with large-scale systems, it is much faster and easier to
check these things *before* launching the job. Remember, on these
systems it can take minutes to launch a full-scale job! Nobody wants to
sit there for that much time, only to find that the system doesn't
support the requested operation.
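For illustration, this is the kind of check we want to run on the front
end. A sketch only: it uses the real hwloc_topology_get_support() call,
but it assumes the returned flags describe the remote compute node,
which is exactly what an XML import does not give us today:

#include <hwloc.h>

/* On the scheduler: "topology" was rebuilt from the XML that a compute
 * node sent up (e.g., hwloc_topology_set_xmlbuffer() + load).  Before
 * launching, ask whether that node can bind processes at all. */
static int node_supports_binding(hwloc_topology_t topology)
{
    const struct hwloc_topology_support *support =
        hwloc_topology_get_support(topology);
    /* Caveat: on an XML-imported topology these flags reflect the
     * importing machine, not the remote node; that is why we currently
     * ship this information out-of-band ourselves. */
    return support->cpubind->set_proc_cpubind;
}

If the XML export/import carried those support bits itself, we could
reject a bad request at submission time instead of minutes into a
launch.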
> Brice
>