Ralph,

mx = Myricom (not Mellanox, which is mxm). So there is probably nobody to fix anything specific to the MX support. Thus, if this newly reported problem is (as I am going to guess) in config/ompi_check_mx.m4, then it may go unfixed. You say you and I are the only ones to care, and I think we both care for reasons related to software quality rather than any desire to use MX.
However, the LDFLAGS issues with the tests don't seem to be related to a specific network. Unfortunately, I am right now composing an email reporting that you and I arrived at the WRONG fix for that yesterday.

-Paul

On Mon, Aug 24, 2015 at 10:26 AM, Ralph Castain <r...@open-mpi.org> wrote:

> You know, if it weren’t for the impact it would have on our users, I’d
> almost say that if Mellanox doesn’t care enough to ensure this works, then
> maybe we should just release and see if someone actually does care?
>
> I’ll try again later today if/when I have time. Otherwise, I’ll raise it
> at tomorrow’s telecon and see if anyone cares enough to fix it. At the
> moment, it appears only you and I do - and I’m not sure I care enough to
> keep poking it :-)
>
> Thanks Paul!
> Ralph
>
> On Aug 24, 2015, at 10:19 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> Sorry to yet again be the bearer of bad news.
>
> I am now configuring with
>
> --prefix=[...] --enable-debug --with-libfabric=/opt/libfabric-1.0.0
> --with-mx=/opt/mx2g --disable-dlopen
>
> This is like the previous configuration that caused problems, but with
> "--disable-dlopen" instead of "--enable-static --disable-shared".
> It seems that each time I try something new, something else breaks.
>
> The build finishes fine.
> I can compile the examples fine.
> But I once again see a failure to run an example:
>
> $ mpirun -mca btl sm,self -np 2 examples/ring_c
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> examples/ring_c: error while loading shared libraries: libmyriexpress.so:
> cannot open shared object file: No such file or directory
>
> ldd agrees:
>
> $ ldd examples/ring_c
> linux-vdso.so.1 => (0x00007fff332f0000)
> libmpi.so.12 =>
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libmpi.so.12
> (0x00007f1879305000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f18790d2000)
> libc.so.6 => /lib64/libc.so.6 (0x00007f1878d3e000)
> libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007f1878b2c000)
> libmyriexpress.so => not found
> libfabric.so.1 => /opt/libfabric-1.0.0/lib/libfabric.so.1
> (0x00007f18788fe000)
> libopen-rte.so.12 =>
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libopen-rte.so.12
> (0x00007f1878565000)
> libopen-pal.so.13 =>
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libopen-pal.so.13
> (0x00007f1878241000)
> libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f1878036000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00007f1877e32000)
> librt.so.1 => /lib64/librt.so.1 (0x00007f1877c29000)
> libm.so.6 => /lib64/libm.so.6 (0x00007f18779a5000)
> libutil.so.1 => /lib64/libutil.so.1 (0x00007f18777a2000)
> /lib64/ld-linux-x86-64.so.2 (0x00007f1879a2a000)
> libnl.so.1 => /lib64/libnl.so.1 (0x00007f187754f000)
>
> However, this time it looks like everything is linked correctly:
>
> $ mpicc --show examples/ring_c.c
> gcc examples/ring_c.c
> -I/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/include
> -pthread -L/opt/mx2g/lib -L/opt/libfabric-1.0.0/lib -Wl,-rpath
> -Wl,/opt/mx2g/lib -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath
> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
> -Wl,--enable-new-dtags
> -L/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
> -lmpi
>
> $ chrpath --list examples/ring_c
> examples/ring_c: RPATH=/opt/mx2g/lib
> :/opt/libfabric-1.0.0/lib:/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
>
> Looking a bit further I find that none of the MPI, OPAL or ORTE libs was
> built with the MX libdir in its rpath, though MPI and OPAL have libfabric:
>
> $ chrpath --list INST/lib/libmpi.so
> INST/lib/libmpi.so:
> RPATH=/opt/libfabric-1.0.0/lib:/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
> $ chrpath --list INST/lib/libopen-pal.so
> INST/lib/libopen-pal.so: RPATH=::/opt/libfabric-1.1.0/lib
> $ chrpath --list INST/lib/libopen-rte.so
> INST/lib/libopen-rte.so:
> RPATH=/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
>
> Extracted from the "make V=1" output, here are the (shortened) link
> commands for libmpi.so:
>
> /bin/sh ../libtool --tag=CC --mode=link gcc -std=gnu99 -g
> -finline-functions -fno-strict-aliasing -pthread -version-info 12:0:0 -o
> libmpi.la -rpath
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
> [.lo and .la files] -lrt -lm -lutil
>
> libtool: link: gcc -std=gnu99 -shared -fPIC -DPIC [.o and .a files]
> -Wl,--no-whole-archive -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath
> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/orte/.libs
> -Wl,-rpath
> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs
> -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath
> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
> -L/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs
> -L/opt/mx2g/lib -libverbs -lmyriexpress -L/opt/libfabric-1.0.0/lib
> /opt/libfabric-1.0.0/lib/libfabric.so -lpthread
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/orte/.libs/libopen-rte.so
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs/libopen-pal.so
> -lnuma -ldl -lrt -lm -lutil -pthread -pthread -Wl,-soname
> -Wl,libmpi.so.12 -o
> .libs/libmpi.so.12.0.0
>
> The appropriate "-L" and "-l" options are present for libmyriexpress, but
> there is no corresponding "-Wl,-rpath, -Wl,...".
>
> In contrast, libfabric gets "-L" and "-Wl,-rpath, -Wl,...".
> Curiously, libfabric.so gets linked by full path, instead of "-lfabric".
> I am not sure if that difference is meaningful or not, but thought I would
> mention it just in case it is.
>
> -Paul
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/08/17821.php
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/08/17822.php
>

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
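P.S. For anyone wanting to repeat the rpath check quoted above across several libraries at once, here is a minimal shell sketch. It parses the "file: RPATH=..." lines that "chrpath --list" prints and reports which files lack a given directory. The helper names rpath_contains and check_rpaths are hypothetical (they are not part of Open MPI or chrpath), and the sketch assumes one "file: RPATH=..." or "file: RUNPATH=..." entry per line and no ':' or '=' in the file names.

```shell
# Hypothetical helper: succeed if the colon-separated rpath string $1
# contains the directory $2 as an exact entry.
rpath_contains() {
    rpath=$1
    dir=$2
    case ":$rpath:" in
        *":$dir:"*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Hypothetical helper: read "file: RPATH=..." (or RUNPATH=...) lines on
# stdin and print "ok FILE" or "missing FILE" depending on whether the
# directory $1 appears in that file's rpath. Other lines are ignored.
check_rpaths() {
    dir=$1
    while IFS= read -r line; do
        case "$line" in
            *RPATH=*|*RUNPATH=*)
                file=${line%%:*}    # text before the first ':'
                rpath=${line#*=}    # text after the first '='
                if rpath_contains "$rpath" "$dir"; then
                    printf '%s %s\n' ok "$file"
                else
                    printf '%s %s\n' missing "$file"
                fi
                ;;
        esac
    done
}

# Assumed usage, with the paths from the report:
#   chrpath --list INST/lib/libmpi.so INST/lib/libopen-pal.so \
#       INST/lib/libopen-rte.so | check_rpaths /opt/mx2g/lib
```

Given the RPATH values quoted above, this would flag all three libraries as missing /opt/mx2g/lib. Also, if these objects carry DT_RUNPATH rather than DT_RPATH (which -Wl,--enable-new-dtags normally produces), glibc consults it only for that object's direct dependencies, so the /opt/mx2g/lib entry in ring_c would not help libmpi.so find libmyriexpress.so.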