On Jan 3, 2008, at 9:03 AM, Gleb Natapov wrote:
In Paris we talked about putting the HCA discovery and initialization code
outside of the openib BTL so that other components that want to use IB will
be able to share common code, data and the registration cache. The other
components I am thinking about are ofud and multicast collectives. I started
to look at this and I have a couple of problems with this approach. Currently
the openib BTL has if_include/if_exclude parameters to control which HCAs
should be used. Should we make those parameters global and initialize only
HCAs that are not excluded by those filters, or should we initialize all
HCAs and let each component have its own include/exclude filters?
Good question. I think the optimal solution would be to have one set
of globals (common_of_if_include or somesuch?) with optional per-
component overrides. E.g., tell all of OMPI to if_include mthca0, but
then tell just the multicast collectives to if_include ipath1 (for
whatever reason). This would allow fine-grained selection of which
communication types use which devices.
To minimize the repetition of code, this could be effected by having a
function in the common/of area that does all the work for the include/
exclude behavior. You can simply call it with any of the MCA param
values, such as: common_of_if_in/exclude, btl_openib_if_in/exclude,
coll_of_if_in/exclude, ... and it can return a list of ports to use.
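To make the idea concrete, here is a rough sketch of what such a shared helper might look like. None of this exists in OMPI today: if_filter_t, select_devices() and name_in_list() are made-up names, the filter strings are assumed to be plain comma-separated lists matched exactly, and I am assuming that a component's own include/exclude pair replaces the global pair whenever either half is set (with include winning if both are given).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical filter pair: the values of some *_if_include and
 * *_if_exclude MCA params. NULL or "" means "not set". */
typedef struct {
    const char *include;   /* comma-separated names to use */
    const char *exclude;   /* comma-separated names to skip */
} if_filter_t;

static bool is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/* True if `name` is exactly one of the comma-separated entries in `list`. */
static bool name_in_list(const char *name, const char *list)
{
    size_t len = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t n = (end != NULL) ? (size_t)(end - p) : strlen(p);

        if (n == len && strncmp(p, name, len) == 0) {
            return true;
        }
        if (end == NULL) {
            break;
        }
        p = end + 1;
    }
    return false;
}

/* Hypothetical shared helper: apply the component's filter if it has one,
 * otherwise the global filter. Writes the surviving names into `selected`
 * (which must have room for ndev entries) and returns how many there are. */
static size_t select_devices(const char *const *devices, size_t ndev,
                             const if_filter_t *component,
                             const if_filter_t *global,
                             const char **selected)
{
    assert(devices != NULL && component != NULL);
    assert(global != NULL && selected != NULL);

    /* Per-component override wins over the OMPI-wide setting. */
    const if_filter_t *f =
        (is_set(component->include) || is_set(component->exclude))
            ? component : global;
    size_t count = 0;

    for (size_t i = 0; i < ndev; i++) {
        bool keep = true;   /* no filter at all: keep everything */

        if (is_set(f->include)) {
            keep = name_in_list(devices[i], f->include);
        } else if (is_set(f->exclude)) {
            keep = !name_in_list(devices[i], f->exclude);
        }
        if (keep) {
            selected[count++] = devices[i];
        }
    }
    return count;
}
```

With a global include of "mthca0" and a multicast-collective include of "ipath1", the BTL (no override) would get only mthca0 back, while the collective would get only ipath1.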
Another problem is how the multicast collective knows that all processes
in a communicator are reachable via the same network. Do we have a
mechanism in ompi to check this?
Good question.
Perhaps the common_of stuff could hang some data off the ompi_proc_t
that can be read by any of-like component (btl openib, coll of
multicast, etc.)...? This could contain a subnet ID, or perhaps a
reachable flag, or somesuch.
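
As a very rough illustration of the kind of check a multicast collective could then do, here is a sketch. The common_of_proc_info_t struct and all_on_same_subnet() are invented for this example and merely stand in for whatever would actually hang off ompi_proc_t; the sketch also assumes one subnet ID per process, which ignores multi-port and multi-HCA setups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process record published by the common code.
 * Stand-in for data attached to ompi_proc_t; not a real OMPI type. */
typedef struct {
    bool     reachable;   /* peer has at least one usable port */
    uint64_t subnet_id;   /* subnet ID of that port */
} common_of_proc_info_t;

/* True if every process in the array is reachable and reports the same
 * subnet ID as the first one. An empty array trivially passes. */
static bool all_on_same_subnet(const common_of_proc_info_t *procs,
                               size_t nprocs)
{
    assert(procs != NULL || nprocs == 0);

    for (size_t i = 0; i < nprocs; i++) {
        if (!procs[i].reachable ||
            procs[i].subnet_id != procs[0].subnet_id) {
            return false;
        }
    }
    return true;
}
```

A collective component could run something like this over the processes of a communicator at selection time and simply decline to be used if it returns false.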
--
Jeff Squyres
Cisco Systems