I think the critical point is this one:

> To be clear: whether AF_IB works or not is a determination to make on the 
> machines on which you *run* -- NOT on the machine on which you *build*.

Many of our users compile on the frontend node of their cluster, which doesn't 
even have an IB NIC installed (it only has the libraries present so that the 
code can compile). You need to test this at run time to ensure you are on a 
machine that can actually run rdmacm.
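
To make the build-time vs. run-time split concrete, here is a minimal sketch. 
It is not the openib BTL code: cm_init(), the probe functions, and the CM_* 
return codes are hypothetical names invented for illustration, and 
HAVE_DECL_AF_IB is assumed to come from configure (e.g. via AC_CHECK_DECLS). 
The probes are stand-ins; a real one would have to actually try to bring up 
rdmacm on the node where the process runs.

```c
#include <stdio.h>

/* Normally defined by configure (e.g. via AC_CHECK_DECLS). Defaulted here
 * only so that this sketch compiles on its own. */
#ifndef HAVE_DECL_AF_IB
#define HAVE_DECL_AF_IB 0
#endif

/* Hypothetical return codes, standing in for OMPI_SUCCESS and friends. */
#define CM_SUCCESS            0
#define CM_ERR_NOT_AVAILABLE (-1)

/* A run-time probe: returns CM_SUCCESS if rdmacm can be used on this node. */
typedef int (*cm_probe_fn)(void);

/* Build-time decision: which code path exists in this binary. */
static const char *cm_compiled_path(void)
{
#if HAVE_DECL_AF_IB
    return "AF_IB";
#else
    return "non-AF_IB";
#endif
}

/* Run-time decision: does the compiled path work on the node we run on?
 * If not, report it and disqualify the component instead of aborting. */
int cm_init(cm_probe_fn probe)
{
    if (probe() != CM_SUCCESS) {
        fprintf(stderr, "rdmacm (%s path) is not usable on this node\n",
                cm_compiled_path());
        return CM_ERR_NOT_AVAILABLE;
    }
    return CM_SUCCESS;
}

/* Hypothetical stand-in probes: a frontend node with libraries but no
 * IB NIC, and a compute node where rdmacm works. */
int probe_frontend_node(void) { return CM_ERR_NOT_AVAILABLE; }
int probe_compute_node(void)  { return CM_SUCCESS; }
```

Calling cm_init(probe_frontend_node) returns CM_ERR_NOT_AVAILABLE and 
cm_init(probe_compute_node) returns CM_SUCCESS from the same binary, which is 
the point: the build only decides which path is compiled in, and each node 
decides at run time whether that path is usable.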


On Mar 13, 2014, at 5:53 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> On Mar 13, 2014, at 4:59 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
> 
>>>>> Right?  If so, I don't see why you need the AC_TRY_RUN -- if RDMACM is 
>>>>> easily detectable as to which way it is compiled (because it has, for 
>>>>> example, different fields), then AC_CHECK_DECLS should be enough, right?
>> 
>> RDMACM API has different implementation requirements for its providers: tcp, 
>> af_ib (different structs/fields should be used/passed. different APIs/hooks 
>> should be called for bring-up).
> 
> Yes, this was said before.  Which is why I don't understand why 
> AC_CHECK_DECLS isn't enough -- it's a compile-time check, right?
> 
> Let me get this straight:
> 
> 1. AF_IB may or may not be present.
> 2. If AF_IB is present, it may or may not work (i.e., support for AF_IB may 
> or may not work in the kernel).
> 3. If AF_IB is present, you can only compile with the AF_IB fields and 
> methods.
> 4. If AF_IB is not present, you can only compile with the non-AF_IB fields 
> and methods.
> 
> I think #2 is not relevant for configure -- only #1, #3, and #4 are relevant. 
>  So you should have code something like this:
> 
> #if HAVE_DECL_AF_IB
>    ret = do_the_stuff_with_af_ib(...);
>    if (OMPI_SUCCESS != ret) {
>        opal_show_help(...AF_IB doesn't work...);
>        return ret;
>    }
> #else
>    ret = do_the_stuff_without_af_ib(...);
>    if (OMPI_SUCCESS != ret) {
>        opal_show_help(...non-AF_IB doesn't work...);
>        return ret;
>    }
> #endif
> 
> To be clear: whether AF_IB works or not is a determination to make on the 
> machines on which you *run* -- NOT on the machine on which you *build*.
> 
> This is one of the key reasons that OMPI prefers run-time detection for 
> run-time characteristics over configure-time detection for run-time 
> characteristics (because you may run OMPI on different machines than where 
> you built OMPI).
> 
>> Currently, the RDMACM provider can be selected at compile time only and 
>> mpirun becomes incompatible to other RDMACM providers.
> 
> What does mpirun have to do with this?  We're talking about the openib BTL, 
> right?
> 
>> AC_TRY_RUN is a protection that selected provider will be able to 
>> run, otherwise no fallback to other provider will be available for user at 
>> runtime.
> 
> I can't parse this statement...?
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/03/14342.php
