On Thu, Jun 25, 2015 at 5:05 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> On Thu, Jun 25, 2015 at 4:59 PM, Gilles Gouaillardet <gil...@rist.or.jp>
> wrote:
>
>> In this case, mca_coll_hcoll module is linked with the proprietary
>> libhcoll.so.
>> the ml symbols are defined in both mca_coll_ml.so and libhcoll.so
>> i am not sure (i blame my poor understanding of linkers) this is an error if
>> Open MPI is configure'd with --disable-dlopen
>>
>
> I will run the test now on a system w/ Mellanox's libhcoll and report what
> I find.
>

Gilles,

I had originally missed the fact that the conflicts were between Open MPI
code and "vendor code". Otherwise I don't think I'd have put forward the
--disable-dlopen suggestion.

However, as promised, I tried the experiment. I find that having both
coll:ml and coll:hcoll in a --disable-dlopen build does NOT result in
failures, either when linking libmpi or when linking an MPI application.
So, having Jenkins (for instance) testing in this way would not have
exposed this problem.

To be sure I was testing what I thought I was: I did confirm that I get a
SEGV running hello_c (from the examples subdir) unless I use
"-mca coll ^hcoll". I tried using "-mca coll ^ml" but still get a SEGV
that appears to show coll:hcoll invoking functions in coll_ml_module.c,
just as I do with no mca options at all.

Note I did this all with the released 1.8.6 tarball.

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
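
Since neither link step flags the overlap, one way to look for it directly is to compare the symbols each shared object defines. The following is only a minimal sketch, not something taken from the thread: it assumes GNU nm is installed and prints its usual three-column listing (address, type, name), and the helper names (defined_symbols, run_nm, common_symbols) are hypothetical.

```python
import subprocess

# Symbol type letters that GNU nm prints for globally visible definitions
# (text, data, read-only data, BSS, weak). Assumption: GNU nm output format.
DEFINED_TYPES = set("TDRBWV")


def defined_symbols(nm_output):
    """Return the set of globally defined symbol names in an nm listing."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols have three fields: address, type, name.
        # Undefined ones ("U name") have only two and are skipped.
        if len(fields) == 3 and fields[1] in DEFINED_TYPES:
            # Drop a symbol-version suffix such as "@@VERS_1" if present.
            names.add(fields[2].split("@")[0])
    return names


def run_nm(path):
    """Run `nm -D --defined-only` on a shared object and return its stdout."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def common_symbols(nm_output_a, nm_output_b):
    """Return the sorted names that both nm listings define."""
    return sorted(defined_symbols(nm_output_a) & defined_symbols(nm_output_b))
```

Calling common_symbols(run_nm(path_a), run_nm(path_b)) with the installed locations of mca_coll_ml.so and libhcoll.so (the paths depend on the installation) would list the names both objects export. A non-empty result only shows that the names collide; it does not say which definition the dynamic loader binds at run time.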