On Jun 2, 2010, at 5:08 AM, Sylvain Jeaugey wrote:
> It must be because create_cq actually creates cqs. Try to apply this
> patch which makes create_cq_compat() *not* create the cqs and return an
> error instead:
> ========================================================================
> diff -r 13df81d1d862 ompi/mca/btl/openib/btl_openib.c
> --- a/ompi/mca/btl/openib/btl_openib.c Fri May 28 14:50:25 2010 +0200
> +++ b/ompi/mca/btl/openib/btl_openib.c Wed Jun 02 10:56:57 2010 +0200
> @@ -146,6 +146,7 @@
> int cqe, void *cq_context, struct ibv_comp_channel *channel,
> int comp_vector)
> {
> + return OMPI_ERROR;
> #if OMPI_IBV_CREATE_CQ_ARGS == 3
> return ibv_create_cq(context, cqe, channel);
> #else
> ========================================================================
Don't you mean return NULL? This function is supposed to return a (struct
ibv_cq *).
> You should see MPI_Init complete nicely and your application segfault on
> the next MPI operation.
That wouldn't surprise me if you return OMPI_ERROR here: the function is
expected to return a pointer, and OMPI_ERROR != NULL, so the caller's NULL
check on the result of ibv_create_cq_compat() won't detect the failure.
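To make that concrete, here is a minimal standalone sketch of why a nonzero
error code slips past a NULL check. Everything in it is a hypothetical
stand-in: struct fake_cq, FAKE_ERROR, and the function names are not the real
libibverbs or Open MPI definitions, and FAKE_ERROR is only assumed to be
nonzero the way OMPI_ERROR is. The explicit cast is needed here because a
bare "return FAKE_ERROR;" from a pointer-returning function is an
int-to-pointer conversion that compilers warn about or reject.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ibv_cq; not the real definition. */
struct fake_cq { int cqe; };

/* Assumption: a nonzero error code, playing the role of OMPI_ERROR. */
#define FAKE_ERROR (-1)

/* Mimics the patched function: an integer error code forced into a
 * pointer return value. */
struct fake_cq *create_cq_returning_code(int cqe)
{
    (void)cqe;
    return (struct fake_cq *)(intptr_t)FAKE_ERROR;
}

/* The failure convention a pointer-returning function should use. */
struct fake_cq *create_cq_returning_null(int cqe)
{
    (void)cqe;
    return NULL;
}

/* The caller's error check: returns 1 if creation is treated as failed. */
int caller_detects_failure(struct fake_cq *cq)
{
    return cq == NULL;
}
```

With these definitions, caller_detects_failure() reports a failure for
create_cq_returning_null() but not for create_cq_returning_code(), so in the
second case the caller would go on to use a bogus pointer.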
Sidenote: why did we call it ibv_create_cq_compat()? That seems like a
namespace violation, and is quite confusing. :-\
--
Jeff Squyres