You know, if it weren’t for the impact it would have on our users, I’d almost 
say that if Mellanox doesn’t care enough to ensure this works, then maybe we 
should just release and see if someone actually does care?

I’ll try again later today if/when I have time. Otherwise, I’ll raise it at 
tomorrow’s telecon and see if anyone cares enough to fix it. At the moment, it 
appears only you and I do - and I’m not sure I care enough to keep poking it :-)

Thanks Paul!
Ralph

> On Aug 24, 2015, at 10:19 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
> Sorry to yet again be the bearer of bad news.
> 
> I am now configuring with
> --prefix=[...] --enable-debug --with-libfabric=/opt/libfabric-1.0.0 
> --with-mx=/opt/mx2g --disable-dlopen
> This is like the previous configuration that caused problems, but with 
> "--disable-dlopen" instead of "--enable-static --disable-shared".
> It seems that each time I try something new, something else breaks.
> 
> The build finishes fine.
> I can compile the examples fine.
> But I once again see a failure to run an example:
> 
> $ mpirun -mca btl sm,self -np 2 examples/ring_c
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> examples/ring_c: error while loading shared libraries: libmyriexpress.so: 
> cannot open shared object file: No such file or directory
> 
> ldd agrees:
> 
> $ ldd examples/ring_c
>         linux-vdso.so.1 =>  (0x00007fff332f0000)
>         libmpi.so.12 => /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libmpi.so.12 (0x00007f1879305000)
>         libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f18790d2000)
>         libc.so.6 => /lib64/libc.so.6 (0x00007f1878d3e000)
>         libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007f1878b2c000)
>         libmyriexpress.so => not found
>         libfabric.so.1 => /opt/libfabric-1.0.0/lib/libfabric.so.1 (0x00007f18788fe000)
>         libopen-rte.so.12 => /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libopen-rte.so.12 (0x00007f1878565000)
>         libopen-pal.so.13 => /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib/libopen-pal.so.13 (0x00007f1878241000)
>         libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f1878036000)
>         libdl.so.2 => /lib64/libdl.so.2 (0x00007f1877e32000)
>         librt.so.1 => /lib64/librt.so.1 (0x00007f1877c29000)
>         libm.so.6 => /lib64/libm.so.6 (0x00007f18779a5000)
>         libutil.so.1 => /lib64/libutil.so.1 (0x00007f18777a2000)
>         /lib64/ld-linux-x86-64.so.2 (0x00007f1879a2a000)
>         libnl.so.1 => /lib64/libnl.so.1 (0x00007f187754f000)
> 
> However, this time it looks like everything is linked correctly:
> 
> $ mpicc --show examples/ring_c.c
> gcc examples/ring_c.c 
> -I/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/include
>  -pthread -L/opt/mx2g/lib -L/opt/libfabric-1.0.0/lib -Wl,-rpath 
> -Wl,/opt/mx2g/lib -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath 
> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib 
> -Wl,--enable-new-dtags 
> -L/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib 
> -lmpi
> 
> $ chrpath --list examples/ring_c
> examples/ring_c: RPATH=/opt/mx2g/lib:/opt/libfabric-1.0.0/lib:/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
> 
> 
> Looking a bit further I find that none of the MPI, OPAL or ORTE libs was 
> built with the MX libdir in its rpath, though MPI and OPAL have libfabric:
> 
> $ chrpath --list INST/lib/libmpi.so
> INST/lib/libmpi.so: RPATH=/opt/libfabric-1.0.0/lib:/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
> $ chrpath --list INST/lib/libopen-pal.so
> INST/lib/libopen-pal.so: RPATH=::/opt/libfabric-1.1.0/lib
> $ chrpath --list INST/lib/libopen-rte.so
> INST/lib/libopen-rte.so: RPATH=/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib
> 
> 
> Extracted from the "make V=1" output, here are the (shortened) link commands 
> for libmpi.so:
> 
> /bin/sh ../libtool  --tag=CC   --mode=link gcc -std=gnu99  -g 
> -finline-functions -fno-strict-aliasing -pthread -version-info 12:0:0   -o 
> libmpi.la -rpath 
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib  
> [.lo and .la files] -lrt -lm -lutil
> 
> libtool: link: gcc -std=gnu99 -shared  -fPIC -DPIC  [.o and .a files] 
> -Wl,--no-whole-archive  -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath 
> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/orte/.libs
>  -Wl,-rpath 
> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs
>  -Wl,-rpath -Wl,/opt/libfabric-1.0.0/lib -Wl,-rpath 
> -Wl,/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/INST/lib 
> -L/scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs
>  -L/opt/mx2g/lib -libverbs -lmyriexpress -L/opt/libfabric-1.0.0/lib 
> /opt/libfabric-1.0.0/lib/libfabric.so -lpthread 
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/orte/.libs/libopen-rte.so
>  
> /scratch/phargrov/OMPI/openmpi-1.10.0rc6-linux-x86_64-no-dlopen/BLD/opal/.libs/libopen-pal.so
>  -lnuma -ldl -lrt -lm -lutil  -pthread   -pthread -Wl,-soname 
> -Wl,libmpi.so.12 -o .libs/libmpi.so.12.0.0
> 
> The appropriate "-L" and "-l" options are present for libmyriexpress, but 
> there is no corresponding "-Wl,-rpath -Wl,...".
> 
> In contrast, libfabric gets both "-L" and "-Wl,-rpath -Wl,...".
> Curiously, libfabric.so gets linked by full path, instead of "-lfabric".
> I am not sure if that difference is meaningful or not, but thought I would 
> mention it just in case it is.
> 
> -Paul
> 
> -- 
> Paul H. Hargrove                          phhargr...@lbl.gov 
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/08/17821.php
