Vasily,

The problem you've identified, differing kernel versions, is exacerbated when you are also computing across hybrid, heterogeneous hardware architectures (i.e. SMP & NUMA, different streaming processor architectures, or different shared memory architectures).
==========================
Kenneth A. Lloyd, Jr.
CEO - Director, Systems Science
Watt Systems Technologies Inc.
Albuquerque, NM USA
www.wattsys.com
kenneth.ll...@wattsys.com

This e-mail is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521, and is intended only for the addressee named above. It may contain privileged or confidential information. If you are not the addressee you must not copy, distribute, disclose or use any of the information in this transmission. If you received it in error, please delete it and immediately notify the sender.

-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Vasily Filipov
Sent: Monday, March 24, 2014 7:44 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] autoconf warnings: openib BTL

Actually, I think that if you build your job with one kernel version and run it on nodes that have another version, rdmacm will be the least of your problems.

Anyway, here is the revision that fixes the issue.

------------------------------------------------------------------------
r31194 | vasily | 2014-03-24 15:36:04 +0200 (Mon, 24 Mar 2014) | 3 lines

BTL/OPENIB: remove AC_RUN_IFELSE from configure and check AF_IB support by lib rdmacm during component_init.
------------------------------------------------------------------------

Thank you,
Vasily.

On 13-Mar-14 15:44, Ralph Castain wrote:
> I think the critical point is this one:
>
>> To be clear: whether AF_IB works or not is a determination to make on the machines on which you *run* -- NOT on the machine on which you *build*.
>
> Many of our users compile on the frontend node of their cluster, which doesn't even have an IB NIC installed (they only have the libraries present so it can compile). You need to test this at run time to ensure you are on a machine where someone actually is able to run rdmacm.
>
>
> On Mar 13, 2014, at 5:53 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>
>> On Mar 13, 2014, at 4:59 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>
>>>>>> Right? If so, I don't see why you need the AC_TRY_RUN -- if RDMACM is easily detectable as to which way it is compiled (because it has, for example, different fields), then AC_CHECK_DECLS should be enough, right?
>>> RDMACM API has different implementation requirements for its providers: tcp, af_ib (different structs/fields should be used/passed. different APIs/hooks should be called for bring-up).
>> Yes, this was said before. Which is why I don't understand why AC_CHECK_DECLS isn't enough -- it's a compile-time check, right?
>>
>> Let me get this straight:
>>
>> 1. AF_IB may or may not be present.
>> 2. If AF_IB is present, it may or may not work (i.e., support for AF_IB may or may not work in the kernel).
>> 3. If AF_IB is present, you can only compile with the AF_IB fields and methods.
>> 4. If AF_IB is not present, you can only compile with the non-AF_IB fields and methods.
>>
>> I think #2 is not relevant for configure -- only #1, #3, and #4 are relevant. So you should have code something like this:
>>
>> #if HAVE_DECL_AF_IB
>>     ret = do_the_stuff_with_af_ib(...);
>>     if (OMPI_SUCCESS != ret) {
>>         opal_show_help(...AF_IB doesn't work...);
>>         return ret;
>>     }
>> #else
>>     ret = do_the_stuff_without_af_ib(...);
>>     if (OMPI_SUCCESS != ret) {
>>         opal_show_help(...non-AF_IB doesn't work...);
>>         return ret;
>>     }
>> #endif
>>
>> To be clear: whether AF_IB works or not is a determination to make on the machines on which you *run* -- NOT on the machine on which you *build*.
>>
>> This is one of the key reasons that OMPI prefers run-time detection for run-time characteristics over configure-time detection for run-time characteristics (because you may run OMPI on different machines than where you built OMPI).
>>
>>> Currently, the RDMACM provider can be selected at compile time only and mpirun becomes incompatible to other RDMACM providers.
>> What does mpirun have to do with this? We're talking about the openib BTL, right?
>>
>>> AC_TRY_RUN is a protection that selected provider will be able to run,otherwise no fallback to other provider will be available for user at runtime.
>> I can't parse this statement...?
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/03/14342.php
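
The split discussed in this thread (configure decides which code path is compiled, component initialization decides whether that path works on the node where the job runs) might be sketched roughly as follows. This is only an illustration under stated assumptions: `cpc_component_init`, `cpc_probe_fn`, `cpc_compiled_path`, `CPC_OK`, `CPC_NOT_SUPPORTED` and the two stand-in probes are hypothetical names, not Open MPI or librdmacm APIs. `HAVE_DECL_AF_IB` is the macro from the quoted snippet and is defaulted to 0 here only so the sketch builds on its own. A real probe would exercise librdmacm on the running node; that part is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* In a real build, configure would define this (e.g. via AC_CHECK_DECLS).
 * It is defaulted here only so the sketch compiles standalone. */
#ifndef HAVE_DECL_AF_IB
#define HAVE_DECL_AF_IB 0
#endif

/* Hypothetical result codes (not Open MPI's). */
enum { CPC_OK = 0, CPC_NOT_SUPPORTED = -1 };

/* Hypothetical probe type: returns 0 if the compiled-in address family
 * actually works on the node where the process is running. */
typedef int (*cpc_probe_fn)(void);

/* Compile-time decision: reports which code path was built in. */
const char *cpc_compiled_path(void)
{
#if HAVE_DECL_AF_IB
    return "af_ib";
#else
    return "non-af_ib";
#endif
}

/* Run-time decision, made once at component initialization.
 * If the probe fails, the component reports "not supported" so the
 * caller can skip it instead of failing the whole job. */
int cpc_component_init(cpc_probe_fn probe)
{
    assert(probe != NULL);
    return (0 == probe()) ? CPC_OK : CPC_NOT_SUPPORTED;
}

/* Stand-in probes used only to demonstrate both outcomes. */
int probe_ok(void)   { return 0; }
int probe_fail(void) { return -1; }
```

Calling `cpc_component_init(probe_ok)` yields `CPC_OK`, while `cpc_component_init(probe_fail)` yields `CPC_NOT_SUPPORTED`. Neither result depends on the machine where the code was built, which is the property a configure-time AC_RUN_IFELSE test cannot provide.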