There are definitely some things that won't work by default in Open MPI
with mixed-vendor HCAs.
As you mentioned, the Open MPI IB vendors committed to making this
work, but the effort kinda died. You might want to ping them again to
remind them...?
On Jun 28, 2009, at 2:03 PM, Scott A. Friedman wrote:
We have had several tickets submitted by users since we started adding
Qlogic 7240 cards to our cluster, which is mostly Mellanox (we have a
couple of different cards). We have looked at the codes (MPI based),
and they run fine when the Qlogic cards are excluded. Qlogic suggests
using PSM or IPoIB on our cluster, both of which seem like a punt to
us: PSM doesn't make sense with Mellanox, and IPoIB is not a solution.
Right now, we are trying to figure out where the problem is. It is not
at the application level, as we have distilled it down to a specific
case that will cause a problem (MPI all-to-all, for example). However,
some things seem clearer to us:
1. test case works when using verbs on Mellanox only
2. test case works ok when we use PSM on Qlogic only
3. test case fails when using verbs between Mellanox and Qlogic
4. test case fails when using verbs on Qlogic
Is this a verbs-level issue with the ipath stuff or an MPI problem? Or
is the issue someplace else? There had been some discussion of a mixed
environment early this year on the OMPI list, but the thread petered
out.
We would be happy to share our failing test case with whoever does the
interop testing, if it could shed some light on the problem we see.
The point is that we would like to know that different IB cards work
together (like Ethernet) so we can have a choice.
Sean Hefty wrote:
>> Is a mixed HCA environment cluster not ready for prime time - yet?
>
> Are the crashes in the kernel or userspace? Is there a specific HCA
> on the nodes that crash?
>
> Interop testing is done, but I do not know the details of the
> configurations and tests that are run.
>
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
--
Jeff Squyres
Cisco Systems