Fixed this issue in HCOLL by renaming the conflicting symbols.  The repro case 
works fine after this change.

I also explored the -Bsymbolic linker option, but it does not seem safe to use.
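
For illustration, the difference between the two approaches can be sketched with a toy model of dynamic symbol lookup. This is a simplification, not HCOLL or Open MPI code: the `Library` class, the `resolve` helper, and the symbol names `ml_init` and `hcoll_ml_init` are all hypothetical, and the model assumes the Open MPI definition comes first in the lookup order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Library:
    name: str
    defines: frozenset
    symbolic: bool = False  # True models a library linked with -Bsymbolic


def resolve(symbol, caller, load_order):
    """Return the name of the library whose definition `caller` ends up using."""
    # -Bsymbolic: the library binds to its own definition of a global symbol.
    if caller.symbolic and symbol in caller.defines:
        return caller.name
    # Default behavior: the first definition in the lookup order wins.
    for lib in load_order:
        if symbol in lib.defines:
            return lib.name
    raise LookupError(symbol)


# Hypothetical symbol names, for illustration only.
libmpi = Library("libmpi", frozenset({"ml_init"}))
hcoll_old = Library("libhcoll", frozenset({"ml_init"}))
hcoll_renamed = Library("libhcoll", frozenset({"hcoll_ml_init"}))
hcoll_symbolic = Library("libhcoll", frozenset({"ml_init"}), symbolic=True)

# Conflict: libhcoll's call lands in the other library's definition.
print(resolve("ml_init", hcoll_old, [libmpi, hcoll_old]))
# Renamed symbols: no clash, libhcoll uses its own code.
print(resolve("hcoll_ml_init", hcoll_renamed, [libmpi, hcoll_renamed]))
# -Bsymbolic: libhcoll binds locally even though the name still clashes.
print(resolve("ml_init", hcoll_symbolic, [libmpi, hcoll_symbolic]))
```

In this model, renaming removes the clash for every caller, whereas -Bsymbolic only changes how the library binds its own references.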

-Devendar

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, June 25, 2015 9:31 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] [OMPI users] simple mpi hello world segfaults when 
coll ml not disabled

Crud - thanks Paul! Mellanox is working on a fix (renaming the symbols in their 
proprietary library so they don't conflict). If they can release that soon, I'm 
hoping to avoid having to release a quick 1.8.7 to fix the problem from inside 
OMPI (i.e., removing one of the conflicting plugins).



On Thu, Jun 25, 2015 at 8:31 PM, Paul Hargrove 
<phhargr...@lbl.gov> wrote:


On Thu, Jun 25, 2015 at 5:05 PM, Paul Hargrove 
<phhargr...@lbl.gov> wrote:

On Thu, Jun 25, 2015 at 4:59 PM, Gilles Gouaillardet 
<gil...@rist.or.jp> wrote:
In this case, the mca_coll_hcoll module is linked with the proprietary libhcoll.so.
The ml symbols are defined in both mca_coll_ml.so and libhcoll.so.
I am not sure (I blame my poor understanding of linkers) whether this is an error if
Open MPI is configure'd with --disable-dlopen.


I will run the test now on a system w/ Mellanox's libhcoll and report what I 
find.


Gilles,

I had originally missed the fact that the conflicts were between Open MPI code 
and "vendor code".
Otherwise I don't think I'd have put forward the --disable-dlopen suggestion.
However, as promised I tried the experiment.

I find that having both coll:ml and coll:hcoll in a --disable-dlopen build 
does NOT result in failures linking libmpi or linking an MPI application.  
So, having Jenkins (for instance) testing in this way would not have exposed 
this problem.
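
Since the link step does not flag the clash, one way an automated check could look for it is to compare the defined dynamic symbols of the two libraries. The sketch below is only an assumption about how such a check might be written: it parses text in the shape printed by `nm -D --defined-only`, and the helper names, sample addresses, and sample symbol names are hypothetical.

```python
def defined_symbols(nm_output):
    """Collect global defined symbol names from `nm -D --defined-only` text."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Expected shape: <address> <type> <name>.
        # T/D/B/R are global text, data, BSS, and read-only data symbols.
        if len(parts) == 3 and parts[1] in {"T", "D", "B", "R"}:
            symbols.add(parts[2])
    return symbols


def conflicts(nm_a, nm_b):
    """Return the sorted names defined in both symbol listings."""
    return sorted(defined_symbols(nm_a) & defined_symbols(nm_b))


# Hypothetical sample listings, for illustration only.
OMPI_SAMPLE = """\
0000000000001100 T ml_init
0000000000001200 T ompi_only
"""
HCOLL_SAMPLE = """\
0000000000002100 T ml_init
0000000000002200 T hcoll_only
"""

print(conflicts(OMPI_SAMPLE, HCOLL_SAMPLE))
```

In practice the two inputs would be the captured `nm` output for whichever Open MPI library or plugin contains the ml code and for libhcoll.so; a non-empty result marks names worth inspecting.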

To be sure I was testing what I thought I was:

I did confirm that I get a SEGV running hello_c (from the examples subdir) 
unless I use "-mca coll ^hcoll".

I tried using "-mca coll ^ml" but still get a SEGV that appears to show 
coll:hcoll invoking functions in coll_ml_module.c, just as I do with no mca 
options at all.

Note I did this all with the released 1.8.6 tarball.

-Paul


--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2015/06/17542.php
