On Sep 22, 2011, at 2:25 PM, Brice Goglin wrote:

> Le 22/09/2011 21:36, Jeff Squyres a écrit :
>> 1. The depth-specific accessors are Bad.  Given the warning language in the 
>> docs paired with the practical realities that some people actually do mix 
>> and match CPUs in a single server (especially when testing new chips), the 
>> depth-based accessors *can/will* fail.  Meaning: you have to write 
>> application code that can handle the non-uniform depth cases, making the 
>> depth-based accessors essentially useless.
> 
> I don't see any real problem with having depth accessors and mixed types
> of CPUs in a server. You can have different levels of caches in
> different CPUs, but you still have a uniform depth/level for important
> things like PUs, Cores, and Sockets.

I guess I didn't get that from your documentation. Since caches sit between 
socket and core, they appear to affect the depth of the cores within a given 
socket. Thus, if the sockets on a node contain different numbers of caches, 
the core/PU depth would differ from socket to socket.

Is that not true?
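
Concretely, the check I'd want to rely on is something like this (a minimal 
sketch against the hwloc 1.x C API; error handling omitted):

#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topology;
    int depth;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    /* If cores really sat at different depths under different sockets,
     * this would return HWLOC_TYPE_DEPTH_MULTIPLE rather than one depth. */
    depth = hwloc_get_type_depth(topology, HWLOC_OBJ_CORE);
    if (depth == HWLOC_TYPE_DEPTH_MULTIPLE)
        printf("cores exist at multiple depths\n");
    else if (depth == HWLOC_TYPE_DEPTH_UNKNOWN)
        printf("no cores found\n");
    else
        printf("all cores at depth %d\n", depth);

    hwloc_topology_destroy(topology);
    return 0;
}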

> 
> The only problem so far is caches. But do you actually walk the list of
> caches?

Yes, we do.

> People would walk the list of PUs, Cores, Sockets, NUMA nodes.
> But when talking about Caches, I would rather see them ask "which cache
> do I have above these cores?".

But that isn't exactly how people use that info. Instead, they ask us to "map 
N processes to each L2 cache across the node", or to "bind all procs to their 
local L3 cache".
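
For the "local L3" case, for example, I'd expect something like this to work 
(a rough sketch assuming the hwloc 1.x API, where caches are a single 
HWLOC_OBJ_CACHE type and attr->cache.depth gives the level; error handling 
omitted):

#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topology;
    hwloc_obj_t obj;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    /* Take the first core as an example, then walk up its parents
     * until we reach an L3 cache. */
    obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_CORE, 0);
    while (obj && !(obj->type == HWLOC_OBJ_CACHE && obj->attr->cache.depth == 3))
        obj = obj->parent;

    /* Bind the current process to the cpuset of that L3. */
    if (obj)
        hwloc_set_cpubind(topology, obj->cpuset, HWLOC_CPUBIND_PROCESS);

    hwloc_topology_destroy(topology);
    return 0;
}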

> 
> And I don't see how DFS would help. Any concrete example?

As above. If I'm trying to map a process to (say) a core, then I have to 
search for all the cores. If the system has different numbers of caches on 
each socket, then the current search for a core object seems to break: it 
looks at one specific depth, yet the cores would sit at different depths under 
each socket. So I have to manually traverse the tree, looking for core objects 
at any depth.
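
i.e., something like this manual walk (a rough sketch against the hwloc 1.x 
API):

#include <stdio.h>
#include <hwloc.h>

/* Recursively visit every object in the tree and report the cores,
 * whatever depth they happen to sit at. */
static void find_cores(hwloc_obj_t obj)
{
    unsigned i;
    if (obj->type == HWLOC_OBJ_CORE)
        printf("core (os_index %u) at depth %u\n", obj->os_index, obj->depth);
    for (i = 0; i < obj->arity; i++)
        find_cores(obj->children[i]);
}

int main(void)
{
    hwloc_topology_t topology;
    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);
    find_cores(hwloc_get_root_obj(topology));
    hwloc_topology_destroy(topology);
    return 0;
}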

Perhaps my understanding of your tree topology is wrong, though...

> 
>> 
>> But we're using the XML export in OMPI to send the topology of compute nodes 
>> up to the scheduler, where decisions are made about how to lay out processes 
>> on the back-end compute nodes, what the binding width will be, etc.  This 
>> front-end scheduler needs to know whether the back-end node is capable of 
>> supporting binding, for example.
>> 
>> We manually added this information into the message that we send up to the 
>> scheduler, but it would be much nicer if the XML export/import just handled 
>> that automatically.
> 
> I guess we could add some "support" attributes to the XML.
> 
> Does your scheduler actually need to know if binding is supported? What
> does it do if it's not? Can't it just try to bind and get an error if
> binding isn't supported?

When dealing with large-scale systems, it is much faster and easier to check 
these things -before- launching the job. Remember, on these systems it can 
take minutes to launch a full-scale job! Nobody wants to wait that long, only 
to find that the system doesn't support the requested operation.
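
To illustrate the kind of pre-flight check I mean (a minimal sketch against 
the hwloc 1.x API; "node.xml" is just an illustrative filename, and error 
handling is omitted):

#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topology;
    const struct hwloc_topology_support *support;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    /* Ask hwloc what this node can actually do before we commit to a
     * launch plan that assumes binding works. */
    support = hwloc_topology_get_support(topology);
    if (!support->cpubind->set_thisproc_cpubind)
        fprintf(stderr, "node cannot bind; fall back to an unbound launch\n");

    /* This is the XML we ship up to the scheduler; the support flags
     * above are not part of it, hence the request. */
    hwloc_topology_export_xml(topology, "node.xml");

    hwloc_topology_destroy(topology);
    return 0;
}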


> 
> Brice
> 

