On Jan 3, 2008, at 9:03 AM, Gleb Natapov wrote:

In Paris we talked about moving the HCA discovery and initialization code out of the openib BTL so that other components that want to use IB can share common code, data, and the registration cache. The other components I am thinking about are ofud and the multicast collectives. I started to look at this and I have a couple of problems with the approach. Currently the openib BTL has if_include/if_exclude parameters to control which HCAs should be used. Should we make those parameters global and initialize only the HCAs that are not excluded by those filters, or should we initialize all HCAs and let each component have its own include/exclude filters?

Good question. I think the optimal solution would be to have one set of globals (common_of_if_include or somesuch?) with optional per-component overrides. E.g., tell all of OMPI to if_include mthca0, but then tell just the multicast collectives to if_include ipath1 (for whatever reason). This would allow fine-grained selection of which communication types use which devices.

To minimize the repetition of code, this could be effected by having a function in the common/of area that does all the work for the include/exclude behavior. You can simply call it with any of the MCA param values, such as: common_of_if_in/exclude, btl_openib_if_in/exclude, coll_of_if_in/exclude, ... and it can return a list of ports to use.
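
As a rough sketch of what such a shared helper might look like (nothing here exists in OMPI; the function names, the argument order, and the rule that a per-component value fully replaces the global one are all assumptions for illustration), the filter could take the global and per-component parameter strings and answer whether a given device or port name is allowed. Entries are treated as opaque comma-separated strings with no whitespace handling, and include is assumed to take precedence over exclude when both are given:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` is an exact entry in the
 * comma-separated `list`, 0 otherwise. */
static int name_in_list(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;

    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ',');
        size_t tok = end ? (size_t)(end - p) : strlen(p);
        if (tok == len && strncmp(p, name, len) == 0) {
            return 1;
        }
        p = end ? end + 1 : NULL;
    }
    return 0;
}

/* Hypothetical filter: NULL means "parameter not set".
 * Assumption: if the component sets either of its own parameters,
 * they replace the global pair entirely. */
static int common_of_device_allowed(const char *name,
                                    const char *global_include,
                                    const char *global_exclude,
                                    const char *comp_include,
                                    const char *comp_exclude)
{
    const char *inc = global_include;
    const char *exc = global_exclude;

    assert(name != NULL);

    if (comp_include != NULL || comp_exclude != NULL) {
        inc = comp_include;
        exc = comp_exclude;
    }
    if (inc != NULL) {
        return name_in_list(inc, name);   /* only listed names pass */
    }
    if (exc != NULL) {
        return !name_in_list(exc, name);  /* everything but listed names */
    }
    return 1;                             /* no filter set: allow all */
}
```

With the example above, a global include of "mthca0" plus a multicast-collective include of "ipath1" would let the collective use ipath1 only, while everything else still sees mthca0 only. A real version would loop over the discovered ports and build the list to return.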

Another problem is how the multicast collective knows that all processes in a communicator are reachable via the same network. Do we have a mechanism in ompi to check this?


Good question.

Perhaps the common_of stuff could hang some data off the ompi_proc_t that can be read by any of-like component (btl openib, coll of multicast, etc.)...? This could contain a subnet ID, or perhaps a reachable flag, or somesuch.
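
To make that idea a bit more concrete, here is a minimal sketch, assuming the per-process data is just a reachable flag plus a 64-bit subnet ID. The struct and function below are hypothetical and stand in for whatever would actually hang off ompi_proc_t; they are not existing OMPI types:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process data that the common code might publish. */
typedef struct {
    int      reachable;  /* nonzero if the peer can be reached over IB */
    uint64_t subnet_id;  /* subnet the peer's port belongs to */
} common_of_proc_info_t;

/* Hypothetical check: returns 1 if every process is reachable and on
 * the same subnet as the first one, 0 otherwise (or if n == 0). */
static int common_of_all_same_subnet(const common_of_proc_info_t *procs,
                                     size_t n)
{
    size_t i;

    assert(procs != NULL || n == 0);

    if (n == 0) {
        return 0;
    }
    for (i = 0; i < n; ++i) {
        if (!procs[i].reachable ||
            procs[i].subnet_id != procs[0].subnet_id) {
            return 0;
        }
    }
    return 1;
}
```

A multicast collective could run a check like this over the processes of a communicator when deciding whether it can be used for that communicator.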

--
Jeff Squyres
Cisco Systems
