Sorry to yet again be the bearer of bad news.

I am now configuring with

--prefix=[...] --enable-debug --with-libfabric=/opt/libfabric-1.0.0
--with-mx=/opt/mx2g --disable-dlopen

This is like the previous configuration that caused problems, but with
"--disable-dlopen" instead of "--enable-static --disable-shared".
It seems that each time I try something new, something else breaks.

The build finishes fine.
I can compile the examples fine.
But I once again see a failure to run an example:

$ mpirun -mca btl sm,self -np 2 examples/ring_c
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
examples/ring_c: error while loading shared libraries: libmyriexpress.so: cannot open shared object file: No such file or directory

ldd agrees:

$ ldd examples/ring_c
        linux-vdso.so.1 =>  (0x00007fff332f0000)
        libmpi.so.12 => /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libmpi.so.12 (0x00007f1879305000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f18790d2000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1878d3e000)
        libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007f1878b2c000)
        libmyriexpress.so => not found
        libfabric.so.1 => /opt/libfabric-1.0.0/lib/libfabric.so.1 (0x00007f18788fe000)
        libopen-rte.so.12 => /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libopen-rte.so.12 (0x00007f1878565000)
        libopen-pal.so.13 => /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libopen-pal.so.13 (0x00007f1878241000)
        libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f1878036000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f1877e32000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f1877c29000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f18779a5000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007f18777a2000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1879a2a000)
        libnl.so.1 => /lib64/libnl.so.1 (0x00007f187754f000)

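To pick the unresolved entries out of a long `ldd` listing like the one above, a small filter helps. This is a minimal sketch in POSIX sh; `unresolved_libs` is a hypothetical helper name, and it assumes `ldd` marks missing libraries with `=> not found`, as it does above.

```shell
# unresolved_libs (hypothetical helper): read ldd output on stdin and print
# the name of every library the dynamic loader could not locate.
# Assumes ldd reports such entries as "libname => not found".
unresolved_libs() {
  awk '/=> not found/ { print $1 }'
}
```

For the binary above, `ldd examples/ring_c | unresolved_libs` would be expected to print just `libmyriexpress.so`.
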
However, this time it looks like everything is linked correctly:

$ mpicc --show examples/ring_c.c
gcc examples/ring_c.c
-I/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/include
-pthread -L/opt/mx2g/lib -L/opt/libfabric-1.0.0/lib -Wl,-rpath
-Wl,/opt/mx2g/lib -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath
-Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
-Wl,--enable-new-dtags
-L/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
-lmpi

$ chrpath --list examples/ring_c
examples/ring_c: RPATH=/opt/mx2g/lib:/opt/libfabric-1.0.0/lib:/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib


Looking a bit further, I find that none of the MPI, OPAL, or ORTE libraries was
built with the MX libdir in its rpath, though the MPI and OPAL libraries do have the libfabric libdir:

$ chrpath --list INST/lib/libmpi.so
INST/lib/libmpi.so: RPATH=/opt/libfabric-1.0.0/lib:/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
$ chrpath --list INST/lib/libopen-pal.so
INST/lib/libopen-pal.so: RPATH=::/opt/libfabric-1.1.0/lib
$ chrpath --list INST/lib/libopen-rte.so
INST/lib/libopen-rte.so: RPATH=/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib

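Checking each library's rpath for one particular directory by eye is error-prone with paths this long. Below is a minimal sketch of a membership test in POSIX sh; `has_dir` is a hypothetical helper, and it treats the RPATH value as a plain colon-separated list (empty entries, like the leading `::` in the libopen-pal.so output, simply never match a real directory).

```shell
# has_dir RPATH DIR (hypothetical helper): succeed if DIR is exactly one of
# the colon-separated entries of RPATH, fail otherwise.
has_dir() {
  # Wrap both sides in colons so that "/opt/mx2g/lib" cannot match a
  # longer entry such as "/opt/mx2g/lib64".
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Assuming `chrpath --list` prints a single `RPATH=` line as above, something like `has_dir "$(chrpath --list INST/lib/libmpi.so | sed 's/.*RPATH=//')" /opt/mx2g/lib || echo 'MX libdir missing'` should flag libmpi.so.
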

Extracted from the "make V=1" output, here are the (shortened) link
commands for libmpi.so:

/bin/sh ../libtool  --tag=CC   --mode=link gcc -std=gnu99  -g
-finline-functions -fno-strict-aliasing -pthread -version-info 12:0:0   -o
libmpi.la -rpath
/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
 [.lo and .la files] -lrt -lm -lutil

libtool: link: gcc -std=gnu99 -shared  -fPIC -DPIC  [.o and .a files]
-Wl,--no-whole-archive  -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath
-Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/orte/.libs
-Wl,-rpath
-Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs
-Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath
-Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
-L/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs
-L/opt/mx2g/lib -libverbs -lmyriexpress -L/opt/libfabric-1.0.0/lib
/opt/libfabric-1.0.0/lib/libfabric.so -lpthread
/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/orte/.libs/libopen-rte.so
/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs/libopen-pal.so
-lnuma -ldl -lrt -lm -lutil  -pthread   -pthread -Wl,-soname
-Wl,libmpi.so.12 -o .libs/libmpi.so.12.0.0

The appropriate "-L" and "-l" options are present for libmyriexpress, but
there is no corresponding "-Wl,-rpath -Wl,..." pair.

In contrast, libfabric gets both "-L" and "-Wl,-rpath -Wl,...".
Curiously, libfabric.so gets linked by full path, instead of "-lfabric".
I am not sure if that difference is meaningful or not, but thought I would
mention it just in case it is.

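The comparison described above (every "-L" directory versus the directories passed via "-Wl,-rpath") can be automated. This is a minimal sketch in POSIX sh; `missing_rpaths` is a hypothetical helper, and it only understands the argument forms that appear in the libmpi.so link line: `-LDIR` as a single word, and `-Wl,-rpath` followed by a separate `-Wl,DIR` word.

```shell
# missing_rpaths ARG... (hypothetical helper): print each -L directory that
# has no matching "-Wl,-rpath -Wl,DIR" pair in the same argument list.
# Limitations: "-L DIR" (two words), "-Wl,-rpath,DIR" (one word) and
# directories containing whitespace or glob characters are not handled.
missing_rpaths() {
  # POSIX sh has no "local", so these are reset on every call.
  ldirs="" rdirs="" prev=""
  for arg in "$@"; do
    case "$arg" in
      -L*) ldirs="$ldirs ${arg#-L}" ;;
      -Wl,*)
        # A "-Wl,DIR" word right after "-Wl,-rpath" names an rpath entry.
        if [ "$prev" = "-Wl,-rpath" ]; then
          rdirs="$rdirs:${arg#-Wl,}:"
        fi ;;
    esac
    prev="$arg"
  done
  for d in $ldirs; do
    case "$rdirs" in
      *":$d:"*) ;;                  # has a matching rpath entry
      *) printf '%s\n' "$d" ;;      # -L directory with no rpath entry
    esac
  done
}
```

Fed the arguments of the libmpi.so link line above, it would be expected to print only `/opt/mx2g/lib`.
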
-Paul

-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
