Fixed this issue in HCOLL by renaming conflicting symbols. Repro case is working fine after this.
Also explored the -Bsymbolic linker option, but it does not seem safe to use.

-Devendar

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, June 25, 2015 9:31 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] [OMPI users] simple mpi hello world segfaults when coll ml not disabled

Crud - thanks Paul!

Mellanox is working on a fix (renaming the symbols in their proprietary library so they don't conflict). If they can release that soon, I'm hoping to avoid having to release a quick 1.8.7 to fix the problem from inside OMPI (i.e., removing one of the conflicting plugins).

On Thu, Jun 25, 2015 at 8:31 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

On Thu, Jun 25, 2015 at 5:05 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

On Thu, Jun 25, 2015 at 4:59 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

In this case, the mca_coll_hcoll module is linked with the proprietary libhcoll.so. The ml symbols are defined in both mca_coll_ml.so and libhcoll.so. I am not sure (I blame my poor understanding of linkers) this is an error if Open MPI is configure'd with --disable-dlopen.

I will run the test now on a system w/ Mellanox's libhcoll and report what I find.

Gilles,

I had originally missed the fact that the conflicts were between Open MPI code and "vendor code". Otherwise I don't think I'd have put forward the --disable-dlopen suggestion.

However, as promised I tried the experiment. I find that having both coll:ml and coll:hcoll in a --disable-dlopen build does NOT result in failures linking libmpi or linking an MPI application. So, having Jenkins (for instance) testing in this way would not have exposed this problem.

To be sure I was testing what I thought I was, I did confirm that I get a SEGV running hello_c (from the examples subdir) unless I use "-mca coll ^hcoll".
I tried using "-mca coll ^ml" but still get a SEGV that appears to show coll:hcoll invoking functions in coll_ml_module.c, just as I do with no mca options at all.

Note I did this all with the released 1.8.6 tarball.

-Paul

--
Paul H. Hargrove phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Link to this post: http://www.open-mpi.org/community/lists/devel/2015/06/17542.php
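
The kind of conflict discussed in this thread (the same global symbols defined in both mca_coll_ml.so and libhcoll.so) links cleanly, so it may help to look for it directly. Below is a minimal sketch, not anything used by Open MPI or Mellanox, that intersects the defined global symbols of two shared objects. It assumes GNU nm is available and that its output has the usual "address type name" form. The helper names, the sample symbol names, and the short list of linker-defined symbols to ignore are all hypothetical and chosen for illustration only.

```python
import subprocess

# Symbols the linker places in almost every shared object. They are not
# real conflicts, so they are ignored (illustrative list, not exhaustive).
LINKER_DEFINED = {"_init", "_fini", "_edata", "_end", "__bss_start"}


def defined_symbols(nm_output):
    """Return global defined symbol names from `nm -D --defined-only` text."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Expected form: "<address> <type> <name>"; skip anything else.
        if len(parts) != 3:
            continue
        _address, sym_type, name = parts
        # Uppercase T/D/B/R: global text, data, BSS, and read-only data.
        # Weak symbols (W, V) are skipped in this sketch.
        if sym_type in {"T", "D", "B", "R"} and name not in LINKER_DEFINED:
            symbols.add(name)
    return symbols


def conflicting_symbols(nm_a, nm_b):
    """Return the sorted names defined in both nm listings."""
    return sorted(defined_symbols(nm_a) & defined_symbols(nm_b))


def nm_dynamic(path):
    """Run GNU nm on a shared object and return its text output."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Hypothetical nm listings; the symbol names are made up for illustration.
SAMPLE_A = """\
0000000000001000 T _init
0000000000001130 T hypothetical_ml_init
0000000000001190 T hypothetical_ml_barrier
0000000000004010 B __bss_start
"""

SAMPLE_B = """\
0000000000002000 T _init
0000000000002200 T hypothetical_ml_init
0000000000002300 T hypothetical_hcoll_only
0000000000005010 B __bss_start
"""

if __name__ == "__main__":
    print(conflicting_symbols(SAMPLE_A, SAMPLE_B))
```

On a real installation one would pass the output of nm_dynamic() for each library instead of the sample strings; the paths depend on the install prefix and are not given in this thread. A non-empty result is only a hint that the two objects may interfere when loaded into the same process, not proof of a failure.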