Paul,

I assume you ran the test with Open MPI configured with --disable-dlopen, right?

Configuring with --disable-dlopen is like forcing coll_ml to be loaded first, hence the crash, even with --mca coll ^ml.

Without --disable-dlopen, and with the default coll_ml_priority=0, the crash only occurs if coll_ml is loaded before coll_hcoll.


Folks,

As far as I understand, the behavior depends on the order in which plugins are enumerated, and that order is system dependent
(by default, Daniel got a crash, but I got none...).
Should we sort the plugins by name/library name so we do not run into this kind of system-dependent issue?
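For illustration, here is a minimal sketch of what sorting by library name could look like. It assumes the components are discovered by scanning a directory (e.g. with readdir(), whose entry order depends on the filesystem). The helpers compare_names() and sort_component_names() are hypothetical and are not part of Open MPI's MCA code. The only point is that sorting the list before loading anything makes the load order identical on every system.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical comparator (not Open MPI code): orders two component
 * file names, e.g. "mca_coll_hcoll.so" and "mca_coll_ml.so",
 * lexicographically with strcmp(). */
static int compare_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sorts the discovered component names in place so
 * the load order no longer depends on the order in which the directory
 * scan happened to return them. */
void sort_component_names(const char **names, size_t count)
{
    qsort(names, count, sizeof(names[0]), compare_names);
}
```

With the component names from this thread, lexicographic order always places mca_coll_hcoll.so before mca_coll_ml.so, regardless of how the filesystem lists them.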

Cheers,

Gilles

On 6/26/2015 12:31 PM, Paul Hargrove wrote:


On Thu, Jun 25, 2015 at 5:05 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:


    On Thu, Jun 25, 2015 at 4:59 PM, Gilles Gouaillardet
    <gil...@rist.or.jp> wrote:

        In this case, the mca_coll_hcoll module is linked with the
        proprietary libhcoll.so.
        The ml symbols are defined in both mca_coll_ml.so and libhcoll.so.
        I am not sure (I blame my poor understanding of linkers) whether
        this is an error if
        Open MPI is configured with --disable-dlopen.



    I will run the test now on a system w/ Mellanox's libhcoll and
    report what I find.



Gilles,

I had originally missed the fact that the conflicts were between Open MPI code and "vendor code". Otherwise I don't think I'd have put forward the --disable-dlopen suggestion.
However, as promised I tried the experiment.

I find that having both coll:ml and coll:hcoll in a --disable-dlopen build does NOT cause failures when linking libmpi or when linking an MPI application. So having Jenkins (for instance) test in this way would not have exposed this problem.

To be sure I was testing what I thought I was:

I did confirm that I get a SEGV running hello_c (from the examples subdir) unless I use "-mca coll ^hcoll".

I tried using "-mca coll ^ml" but still got a SEGV that appears to show coll:hcoll invoking functions in coll_ml_module.c, just as I get with no MCA options at all.

Note I did this all with the released 1.8.6 tarball.

-Paul


--
Paul H. Hargrove phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900


_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2015/06/17542.php
