On Jan 26, 2009, at 4:46 PM, Jeff Squyres wrote:

Note that I did not say that. I specifically stated that OMPI failed, and that it did so because we customize for the individual hardware devices. To be clear: this is an OMPI issue. I'm asking (at the request of the IWG) if anyone cares about fixing it.


I should clarify something in this discussion: Open MPI is *capable* of running across heterogeneous OpenFabrics hardware (assuming IB <--> IB and iWARP <--> iWARP, of course -- not IB <--> iWARP), as long as it is configured to use the same verbs/hardware configuration on all of the hardware. Depending on the hardware, Open MPI may not be configured this way by default, because it may choose to customize differently for different HCAs/RNICs.

However, if you manually configure Open MPI to use the same verbs/hardware configuration values across all of the HCAs/RNICs in your cluster, Open MPI will probably work fine. If Open MPI doesn't work in this kind of configuration, it may indicate some kind of vendor HCA/RNIC incompatibility.

Case in point: I regression test "limited heterogeneous" scenarios on my MPI testing cluster at Cisco every night. Specifically, I have several different models of Mellanox HCAs, and they all interoperate just fine across 2 air-gapped IB subnets. I don't know if anyone has tested wildly different HCAs/RNICs using lowest-common-denominator verbs/hardware configuration values (i.e., some set of values that is supported by all of the HCAs/RNICs) to see if OMPI works. I don't immediately see why that wouldn't work, but I haven't tested it myself.
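To make "lowest common denominator" a bit more concrete, here is a minimal sketch (not part of OMPI; just an illustration against the standard libibverbs API) that queries each local device and reports the smallest active MTU, max_sge, and max_qp_wr it finds. The attribute choices are only examples of values one might level across devices; in a real heterogeneous run you would also need to gather these values from every node and then hand the common values to the openib BTL yourself -- this only covers the local-node half of that.

#include <limits.h>
#include <stdio.h>
#include <stdint.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (devs == NULL || num_devices == 0) {
        fprintf(stderr, "No OpenFabrics devices found\n");
        return 1;
    }

    /* Start from the largest possible values and shrink toward the
       smallest capabilities seen across all local devices/ports. */
    enum ibv_mtu min_mtu = IBV_MTU_4096;
    int min_sge = INT_MAX;
    int min_qp_wr = INT_MAX;

    for (int i = 0; i < num_devices; ++i) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (ctx == NULL)
            continue;

        struct ibv_device_attr dev_attr;
        if (ibv_query_device(ctx, &dev_attr) == 0) {
            if (dev_attr.max_sge < min_sge)
                min_sge = dev_attr.max_sge;
            if (dev_attr.max_qp_wr < min_qp_wr)
                min_qp_wr = dev_attr.max_qp_wr;

            /* Each port can report a different active MTU. */
            for (uint8_t p = 1; p <= dev_attr.phys_port_cnt; ++p) {
                struct ibv_port_attr port_attr;
                if (ibv_query_port(ctx, p, &port_attr) == 0 &&
                    port_attr.active_mtu < min_mtu)
                    min_mtu = port_attr.active_mtu;
            }
        }
        ibv_close_device(ctx);
    }
    ibv_free_device_list(devs);

    printf("Lowest common denominator (this node only):\n");
    printf("  active MTU (enum ibv_mtu value): %d\n", (int)min_mtu);
    printf("  max_sge:                         %d\n", min_sge);
    printf("  max_qp_wr:                       %d\n", min_qp_wr);
    return 0;
}

(Compile with something like "gcc lcd.c -libverbs"; the file name is arbitrary.)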

Out of the box, however, Open MPI is not necessarily configured to use the same verbs/hardware configuration for each device, which is why it may fail by default.

--
Jeff Squyres
Cisco Systems
