Ralph

mx = Myricom (not Mellanox, which is mxm).
So, there is probably nobody to fix anything specific to the MX support.
Thus if this newly reported problem is (as I am going to guess)
in config/ompi_check_mx.m4 then it may go unfixed.
You say you and I are the only ones to care, and I think we both care for
reasons related to software quality rather than any desire to use MX.
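
For anyone who wants to check a "make V=1" link line for the symptom described in the quoted report below (a "-L" directory with no matching rpath), here is a minimal sketch in Python. It is not part of Open MPI: the function name missing_rpaths is hypothetical, and the sample command is only a handful of tokens trimmed from the libtool link line quoted below. It assumes GNU-style "-Wl,-rpath -Wl,DIR" or "-Wl,-rpath,DIR" options and does not look at libraries linked by full path.

```python
import shlex


def missing_rpaths(link_cmd):
    """Return the -L directories that have no matching -Wl,-rpath entry.

    Hypothetical helper for illustration only; it understands the two
    GNU-style spellings "-Wl,-rpath -Wl,DIR" and "-Wl,-rpath,DIR".
    """
    lib_dirs = []          # -L directories, in order of first appearance
    rpaths = set()         # directories passed to the linker as rpath
    expect_rpath = False   # True right after a bare "-Wl,-rpath" token

    for tok in shlex.split(link_cmd):
        if expect_rpath:
            expect_rpath = False
            if tok.startswith("-Wl,"):
                rpaths.add(tok[len("-Wl,"):])
                continue
        if tok == "-Wl,-rpath":
            expect_rpath = True
        elif tok.startswith("-Wl,-rpath,"):
            rpaths.add(tok[len("-Wl,-rpath,"):])
        elif tok.startswith("-L") and len(tok) > 2:
            directory = tok[2:]
            if directory not in lib_dirs:
                lib_dirs.append(directory)

    return [d for d in lib_dirs if d not in rpaths]


# A few tokens trimmed from the libmpi.so link line in the quoted message.
sample = (
    "gcc -shared -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib "
    "-L/opt/mx2g/lib -libverbs -lmyriexpress "
    "-L/opt/libfabric-1.0.0/lib -lpthread"
)
print(missing_rpaths(sample))  # prints ['/opt/mx2g/lib']
```

On that trimmed line the only directory flagged is the MX libdir, which matches what chrpath shows for libmpi.so.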

However, the LDFLAGS issues with the tests don't seem to be related to a
specific network.
Unfortunately, I am right now composing an email reporting that you and I
arrived at the WRONG fix for that yesterday.

-Paul

On Mon, Aug 24, 2015 at 10:26 AM, Ralph Castain <r...@open-mpi.org> wrote:

> You know, if it weren’t for the impact it would have on our users, I’d
> almost say that if Mellanox doesn’t care enough to ensure this works, then
> maybe we should just release and see if someone actually does care?
>
> I’ll try again later today if/when I have time. Otherwise, I’ll raise it
> at tomorrow’s telecon and see if anyone cares enough to fix it. At the
> moment, it appears only you and I do - and I’m not sure I care enough to
> keep poking it :-)
>
> Thanks Paul!
> Ralph
>
> On Aug 24, 2015, at 10:19 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> Sorry to yet again be the bearer of bad news.
>
> I am now configuring with
>
> --prefix=[...] --enable-debug --with-libfabric=/opt/libfabric-1.0.0
> --with-mx=/opt/mx2g --disable-dlopen
>
> This is like the previous configuration that caused problems, but with
> "--disable-dlopen" instead of "--enable-static --disable-shared".
> It seems that each time I try something new, something else breaks.
>
> The build finishes fine.
> I can compile the examples fine.
> But I once again see a failure to run an example:
>
> $ mpirun -mca btl sm,self -np 2 examples/ring_c
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> examples/ring_c: error while loading shared libraries: libmyriexpress.so:
> cannot open shared object file: No such file or directory
>
> ldd agrees:
>
> $ ldd examples/ring_c
>         linux-vdso.so.1 =>  (0x00007fff332f0000)
>         libmpi.so.12 =>
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libmpi.so.12
> (0x00007f1879305000)
>         libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f18790d2000)
>         libc.so.6 => /lib64/libc.so.6 (0x00007f1878d3e000)
>         libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007f1878b2c000)
>         libmyriexpress.so => not found
>         libfabric.so.1 => /opt/libfabric-1.0.0/lib/libfabric.so.1
> (0x00007f18788fe000)
>         libopen-rte.so.12 =>
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libopen-rte.so.12
> (0x00007f1878565000)
>         libopen-pal.so.13 =>
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libopen-pal.so.13
> (0x00007f1878241000)
>         libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f1878036000)
>         libdl.so.2 => /lib64/libdl.so.2 (0x00007f1877e32000)
>         librt.so.1 => /lib64/librt.so.1 (0x00007f1877c29000)
>         libm.so.6 => /lib64/libm.so.6 (0x00007f18779a5000)
>         libutil.so.1 => /lib64/libutil.so.1 (0x00007f18777a2000)
>         /lib64/ld-linux-x86-64.so.2 (0x00007f1879a2a000)
>         libnl.so.1 => /lib64/libnl.so.1 (0x00007f187754f000)
>
> However, this time it looks like everything is linked correctly:
>
> $ mpicc --show examples/ring_c.c
> gcc examples/ring_c.c
> -I/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/include
> -pthread -L/opt/mx2g/lib -L/opt/libfabric-1.0.0/lib -Wl,-rpath
> -Wl,/opt/mx2g/lib -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath
> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
> -Wl,--enable-new-dtags
> -L/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
> -lmpi
>
> $ chrpath --list examples/ring_c
> examples/ring_c: RPATH=/opt/mx2g/lib
> :/opt/libfabric-1.0.0/lib:/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
>
>
> Looking a bit further I find that none of the MPI, OPAL or ORTE libs was
> built with the MX libdir in its rpath, though MPI and OPAL have libfabric:
>
> $ chrpath --list INST/lib/libmpi.so
> INST/lib/libmpi.so:
> RPATH=/opt/libfabric-1.0.0/lib:/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
> $ chrpath --list INST/lib/libopen-pal.so
> INST/lib/libopen-pal.so: RPATH=::/opt/libfabric-1.1.0/lib
> $ chrpath --list INST/lib/libopen-rte.so
> INST/lib/libopen-rte.so:
> RPATH=/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
>
>
> Extracted from the "make V=1" output, here are the (shortened) link
> commands for libmpi.so:
>
> /bin/sh ../libtool  --tag=CC   --mode=link gcc -std=gnu99  -g
> -finline-functions -fno-strict-aliasing -pthread -version-info 12:0:0   -o
> libmpi.la -rpath
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
>  [.lo and .la files] -lrt -lm -lutil
>
> libtool: link: gcc -std=gnu99 -shared  -fPIC -DPIC  [.o and .a files]
> -Wl,--no-whole-archive  -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath
> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/orte/.libs
> -Wl,-rpath
> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs
> -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath
> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
> -L/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs
> -L/opt/mx2g/lib -libverbs -lmyriexpress -L/opt/libfabric-1.0.0/lib
> /opt/libfabric-1.0.0/lib/libfabric.so -lpthread
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/orte/.libs/libopen-rte.so
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs/libopen-pal.so
> -lnuma -ldl -lrt -lm -lutil  -pthread   -pthread -Wl,-soname
> -Wl,libmpi.so.12 -o .libs/libmpi.so.12.0.0
>
> The appropriate "-L" and "-l" options are present for libmyriexpress, but
> there is no corresponding "-Wl,-rpath -Wl,...".
>
> In contrast, libfabric gets "-L" and "-Wl,-rpath -Wl,...".
> Curiously, libfabric.so gets linked by full path, instead of "-lfabric".
> I am not sure if that difference is meaningful or not, but thought I would
> mention it just in case it is.
>
> -Paul
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/08/17821.php
>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/08/17822.php
>



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
