Paul,
I assume you ran the test with Open MPI configured with
--disable-dlopen, right?
--disable-dlopen is like forcing coll_ml to be loaded first, hence the
crash, even with --mca coll ^ml.
Without --disable-dlopen, and with the default coll_ml_priority=0, the crash
only occurs if coll_ml is loaded before coll_hcoll.
Folks,
As far as I understand, the behavior depends on how plugins are
enumerated, and this is system-dependent
(by default, Daniel got a crash, but I got none ...).
Should we sort the plugins by name/library name so we do not fall into
this kind of system-dependent issue?
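
A minimal sketch of the sorting idea, assuming the component file names have already been collected into an array in whatever order the system returned them. The helper name sort_plugin_names and the file names in the usage note are illustrative only; this is not existing Open MPI code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: the array elements are `const char *`, so each
 * argument is a pointer to such a pointer. */
static int compare_names(const void *a, const void *b)
{
    const char *const *lhs = a;
    const char *const *rhs = b;
    return strcmp(*lhs, *rhs);
}

/* Hypothetical helper (not an Open MPI function): sort component file
 * names in place, so the load order no longer depends on the order in
 * which the directory enumeration happened to return them. */
static void sort_plugin_names(const char **names, size_t count)
{
    assert(names != NULL || count == 0);
    if (count > 1) {
        qsort(names, count, sizeof(names[0]), compare_names);
    }
}
```

With plain strcmp ordering, an input of mca_coll_ml.so, mca_coll_hcoll.so, mca_coll_basic.so comes back as basic, hcoll, ml on every system. Whether that particular order is the one we want is a separate question; the sketch only makes the order reproducible.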
Cheers,
Gilles
On 6/26/2015 12:31 PM, Paul Hargrove wrote:
On Thu, Jun 25, 2015 at 5:05 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
On Thu, Jun 25, 2015 at 4:59 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
In this case, mca_coll_hcoll module is linked with the
proprietary libhcoll.so.
The ml symbols are defined in both mca_coll_ml.so and libhcoll.so.
I am not sure (I blame my poor understanding of linkers) whether this
is an error if
Open MPI is configure'd with --disable-dlopen.
I will run the test now on a system w/ Mellanox's libhcoll and
report what I find.
Gilles,
I had originally missed the fact that the conflicts were between Open
MPI code and "vendor code".
Otherwise I don't think I'd have put forward the --disable-dlopen
suggestion.
However, as promised I tried the experiment.
I find that having both coll:ml and coll:hcoll in a --disable-dlopen
build does NOT result in failures linking libmpi nor in linking
an MPI application. So, having Jenkins (for instance) test in this
way would not have exposed this problem.
To be sure I was testing what I thought I was:
I did confirm that I get a SEGV running hello_c (from the examples
subdir) unless I use "-mca coll ^hcoll".
I tried using "-mca coll ^ml" but still get a SEGV that appears to
show coll:hcoll invoking functions in coll_ml_module.c, just as I do
with no mca options at all.
Note I did this all with the released 1.8.6 tarball.
-Paul
--
Paul H. Hargrove phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Link to this post:
http://www.open-mpi.org/community/lists/devel/2015/06/17542.php