Re: [OMPI devel] RTE Issue IV: RTE/MPI relative modex responsibilities

Jeff Squyres Thu, 6 Dec 2007 10:00:21 -0500

On Dec 5, 2007, at 11:23 AM, Ralph H Castain wrote:

Well, I think it is pretty obvious that I am a fan of an attribute system :)
For completeness, I will point out that we also exchange architecture
and hostname info in the modex.
True - except we should note that hostname info is only exchanged if someone
specifically requests it.


Note that I am a fan of *always* exchanging the hostname information.

I say this because multiple Cisco customers have told us that this is invaluable debugging information: when a BTL fails to send a message, for example, we specifically put in the error message "hostA tried to send to hostB and failed" (vs. "communicator X rank Y tried to send to rank Z"). System administrators want/need the actual hostnames in order to [greatly] simplify the process of troubleshooting if there is a problem in the fabric, and if so, where it is.


This is especially important for very large fabrics.

Do we really need a complete node map? As far as I can tell, it looks
like the MPI layer only needs a list of local processes. So maybe it
would be better to forget about the node ids at the mpi layer and just
return the local procs.
I agree, though I don't think we want a parallel list of procs. We just need
to set the "local" flag in the existing ompi_proc_t structures.

I agree that the desired end result is that we need that "local" flag set in the relevant ompi_proc_t's.

As previously implied: strcmp'ing hostnames is not always sufficient (e.g., on the Cray). Hence, sending hostnames around is useful for the reasons I cited above, but it may not be sufficient for what is needed.
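
For illustration only, here is a minimal sketch of setting a "local" flag without strcmp'ing hostnames. It assumes the RTE can hand each process some opaque node identifier. The `example_proc_t` struct, its fields, and `example_mark_local()` are hypothetical stand-ins, not the real ompi_proc_t or any existing Open MPI function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ompi_proc_t; field names are illustrative only. */
typedef struct {
    uint32_t    rank;
    uint32_t    node_id;   /* assumed: opaque node identifier supplied by the RTE */
    const char *hostname;  /* kept for error messages, not used for locality */
    bool        is_local;  /* the "local" flag discussed above */
} example_proc_t;

/* Set is_local on every entry whose node_id matches ours.
 * Returns the number of local procs found. */
size_t example_mark_local(example_proc_t *procs, size_t nprocs,
                          uint32_t my_node_id)
{
    size_t nlocal = 0;

    assert(procs != NULL || nprocs == 0);
    for (size_t i = 0; i < nprocs; ++i) {
        procs[i].is_local = (procs[i].node_id == my_node_id);
        if (procs[i].is_local) {
            ++nlocal;
        }
    }
    return nlocal;
}
```

The hostname stays in the struct purely so that error messages can name hostA and hostB; locality is decided by whatever identifier the environment considers authoritative.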

So my vote would be to leave the modex alone, but remove the node id,
and add a function to get the list of local procs. It doesn't matter to
me how the RTE implements that.
I think we would need to be careful here that we don't create a need for
more communication. We have two functions currently in the modex:
1. how to exchange the info required to populate the ompi_proc_t structures;
and

2. how to identify which of those procs are "local"

The problem with leaving the modex as it currently sits is that some
environments require a different mechanism for exchanging the ompi_proc_t
info. While most can use the RML, some can't. The same division of
capabilities applies to getting the "local" info, so it makes sense to me to
put the modex in a framework.

Otherwise, we wind up with a bunch of #if's in the code to support
environments like the Cray. I believe the mca system was put in place
precisely to avoid those kinds of practices, so it makes sense to me to take
advantage of it.

FWIW, I'm very against putting #if's in the code for specific architectures / RTE's. Such differences are what the MCA is for.
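
As a rough sketch of the general idea (run-time selection through a table of function pointers instead of compile-time #if's), something like the following could work. Every name here (`example_modex_component_t`, the "rml" and "cray" entries, `example_modex_select()`) is hypothetical and is not the actual MCA API. The availability checks are hard-coded stubs standing in for real environment probes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component interface; NOT the actual MCA API. */
typedef struct {
    const char *name;
    int (*is_available)(void);                    /* nonzero if usable here */
    int (*exchange)(const void *buf, size_t len); /* returns 0 on success */
} example_modex_component_t;

/* Stub components: a real one would probe the environment and move data. */
static int rml_available(void)  { return 1; }
static int rml_exchange(const void *buf, size_t len)
{
    (void)buf; (void)len;
    return 0;
}

static int cray_available(void) { return 0; } /* pretend we are not on a Cray */
static int cray_exchange(const void *buf, size_t len)
{
    (void)buf; (void)len;
    return 0;
}

/* Listed in priority order: the first available component wins. */
static const example_modex_component_t example_components[] = {
    { "cray", cray_available, cray_exchange },
    { "rml",  rml_available,  rml_exchange  },
};

/* Pick a component at run time; returns NULL if none is available. */
const example_modex_component_t *example_modex_select(void)
{
    size_t n = sizeof(example_components) / sizeof(example_components[0]);

    for (size_t i = 0; i < n; ++i) {
        assert(example_components[i].is_available != NULL);
        if (example_components[i].is_available()) {
            return &example_components[i];
        }
    }
    return NULL;
}
```

The caller only ever sees the selected component's function pointers, so supporting a new environment means adding a table entry rather than another #if.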

Alternatively, if we did a process attribute system we could just use
predefined attributes, and the runtime can get each process's node id
however it wants.
Same problem as above, isn't it? Probably ignorance on my part, but it seems to me that we simply exchange a modex framework for an attribute framework
(since each environment would have to get the attribute values in a
different manner) - don't we?
I have no problem with using attributes instead of the modex, but the issue appears to be the same either way - you still need a framework to handle the
different methods.

I agree -- I don't see the difference. Tim -- can you explain? (I also didn't quite understand your statement about being a fan of attribute systems; other than it being an ASCII system with a flat namespace [why is a flat namespace good, btw?], I don't really see how it's significantly different than the modex principle...?)


--
Jeff Squyres
Cisco Systems
