For sanity's sake I also checked LD_LIBRARY_PATH; there doesn't seem to be anything suspicious there either.

login3% echo $LD_LIBRARY_PATH
/opt/apps/pgi/7.1/linux86-64/7.1-2/libso:/opt/gsi-openssh-4.1/lib:/opt/gsi-openssh-4.1/lib:/opt/apps/binutils-amd/070220/lib64

I am trying Jeff's suggestion to replace OMPI_COMPILE_IFELSE with OMPI_LINK_IFELSE. Will let you know.

Pak Lui wrote:
Jeff Squyres wrote:
Jon / Steve -- can you comment?

I tested with OFED 1.2.5 (which is what I assume you meant) and got:

checking for rdma_get_peer_addr... no

Because that function is not defined in OFED 1.2.5. Running with OFED 1.3 (where the function does exist), I get:

checking for rdma_get_peer_addr... yes

For me it seems to be running with 1.2.5.

login3% /opt/ofed/bin/ofed_info | head -1
OFED-1.2.5.5

There is no rdma_get_peer_addr or rdma_get_local_addr in these .so files, which is presumably where they would come from.

login3% ls librdmacm.so*
librdmacm.so  librdmacm.so.1  librdmacm.so.1.0.0  librdmacm.so.1.0.2

login3% nm librdmacm.so* | grep rdma_get_
0000000000003470 T rdma_get_cm_event
0000000000001a20 T rdma_get_devices
0000000000003470 T rdma_get_cm_event
0000000000001a20 T rdma_get_devices
0000000000003470 T rdma_get_cm_event
0000000000001a20 T rdma_get_devices
0000000000003470 T rdma_get_cm_event
0000000000001a20 T rdma_get_devices

And I don't see rdma_get_peer_addr in /opt/ofed/include/rdma/rdma_cma.h either, so I don't know how configure actually finds the interface (it's not an inline function there either).

Outside of all the configure complexity, can you write a simple program that calls that function and have it compile and link properly?

These are the references to rdma_get_peer_addr in the config.log:
   47858 configure:120941: checking for rdma_get_peer_addr
   47859 configure:120966: pgcc -c -g -D_REENTRANT -I/opt/ofed/include conftest.c >&5
   47860 PGC-W-0155-Pointer value created from a nonlong integral type (conftest.c: 412)
   47861 PGC/x86-64 Linux 7.1-2: compilation completed with warnings
   47862 configure:120972: $? = 0
   47863 configure:120987: result: yes
...
   48355 configure:123600: checking for rdma_get_peer_addr
   48356 configure:123625: pgcc -c -g -D_REENTRANT -I/opt/ofed/include conftest.c >&5
   48357 PGC-W-0155-Pointer value created from a nonlong integral type (conftest.c: 423)
   48358 PGC/x86-64 Linux 7.1-2: compilation completed with warnings
   48359 configure:123631: $? = 0
   48360 configure:123646: result: yes

Here's my program; I'm not sure it's doing it correctly. I am no m4 expert, so how do I run ompi_check_openib.m4 independently and see the conftest.c?

login3% cat mytest.c
#include "rdma/rdma_cma.h"
int main (void) {
     void *ret = (void*) rdma_get_peer_addr((struct rdma_cm_id*)0);
     return 0;
}

It gives me a warning if I just compile to an object file, which is what I see in the config.log.

login3% pgcc -c -g   -D_REENTRANT  -I/opt/ofed/include mytest.c
PGC-W-0155-Pointer value created from a nonlong integral type  (mytest.c: 3)
PGC/x86-64 Linux 7.1-2: compilation completed with warnings
login3% echo $?
0

But trying to create an executable gives me an error.

login3% pgcc -g -D_REENTRANT -I/opt/ofed/include mytest.c -o mytest
PGC-W-0155-Pointer value created from a nonlong integral type  (mytest.c: 3)
PGC/x86-64 Linux 7.1-2: compilation completed with warnings
/tmp/pgccjF6BryhFmWS.o: In function `main':
/share/home/00951/paklui/ompi-trunk5/config-data1-debug/mytest.c:3: undefined reference to `rdma_get_peer_addr'

Hmm, any clues, comments?

I suppose we could change the AC_COMPILE_IFELSE in config/ompi_check_openib.m4 to OMPI_LINK_IFELSE, but I'm a little confused as to why it would compile successfully if the symbol rdma_get_peer_addr is not declared anywhere (which it shouldn't be in OFED 1.2 or 1.2.5, AFAIK)...



On May 3, 2008, at 10:56 AM, Pak Lui wrote:

Sure Jeff, see attached.

Jeff Squyres wrote:
(moving to devel so that others are aware)
Crud. Can you send me your config.log? I don't know why it's able to find rdma_get_peer_addr() in configure, but then later not able to find it during the build - I'd like to see what happened during configure.
On May 2, 2008, at 7:09 PM, Pak Lui wrote:
Hi Jeff,

It seems that the cpc3 merge causes my Ranger build to break. I believe it is using OFED 1.2 but I don't know how to check. It passes the ompi_check_openib.m4 check that you added for rdma_get_peer_addr. Is there a missing openib/OFED-related #include somewhere?


 1236 checking rdma/rdma_cma.h usability... yes
 1237 checking rdma/rdma_cma.h presence... yes
 1238 checking for rdma/rdma_cma.h... yes
 1239 checking for rdma_create_id in -lrdmacm... yes
 1240 checking for rdma_get_peer_addr... yes


pgCC -DHAVE_CONFIG_H -I. -I../../../../ompi/tools/ompi_info -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../opal/mca/paffinity/linux/plpa/src/libplpa -DOMPI_CONFIGURE_USER="\"paklui\"" -DOMPI_CONFIGURE_HOST="\"login4.ranger.tacc.utexas.edu\"" -DOMPI_CONFIGURE_DATE="\"Fri May 2 17:07:01 CDT 2008\"" -DOMPI_BUILD_USER="\"$USER\"" -DOMPI_BUILD_HOST="\"`hostname`\"" -DOMPI_BUILD_DATE="\"`date`\"" -DOMPI_BUILD_CFLAGS="\"-O -DNDEBUG \"" -DOMPI_BUILD_CPPFLAGS="\"-I../../../.. -I../../.. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -D_REENTRANT\"" -DOMPI_BUILD_CXXFLAGS="\"-O -DNDEBUG \"" -DOMPI_BUILD_CXXCPPFLAGS="\"-I../../../.. -I../../.. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -D_REENTRANT\"" -DOMPI_BUILD_FFLAGS="\"\"" -DOMPI_BUILD_FCFLAGS="\"\"" -DOMPI_BUILD_LDFLAGS="\" \"" -DOMPI_BUILD_LIBS="\"-lnsl -lutil -lpthread\"" -DOMPI_CC_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgcc \"" -DOMPI_CXX_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgCC\"" -DOMPI_F77_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgf77\"" -DOMPI_F90_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgf95\"" -DOMPI_F90_BUILD_SIZE="\"small\"" -I../../../.. -I../../.. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -D_REENTRANT -O -DNDEBUG -c -o version.o ../../../../ompi/tools/ompi_info/version.cc
/bin/sh ../../../libtool --tag=CXX --mode=link pgCC -O -DNDEBUG -o ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la -lnsl -lutil -lpthread
libtool: link: pgCC -O -DNDEBUG -o .libs/ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/.libs/libmpi.so -L/opt/ofed/lib64 -libcm -lrdmacm -libverbs -lrt /share/home/00951/paklui/ompi-trunk5/config-data1/orte/.libs/libopen-rte.so /share/home/00951/paklui/ompi-trunk5/config-data1/opal/.libs/libopen-pal.so -lnuma -ldl -lnsl -lutil -lpthread -Wl,--rpath -Wl,/share/home/00951/paklui/ompi-trunk5/shared-install1/lib

[1] Exit 2 make install >& make.install.log.0
../../../ompi/.libs/libmpi.so: undefined reference to `rdma_get_peer_addr'
../../../ompi/.libs/libmpi.so: undefined reference to `rdma_get_local_addr'
make[2]: *** [ompi_info] Error 2
make[2]: Leaving directory `/share/home/00951/paklui/ompi-trunk5/config-data1/ompi/tools/ompi_info'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/share/home/00951/paklui/ompi-trunk5/config-data1/ompi'
make: *** [install-recursive] Error 1




--

- Pak Lui
pak....@sun.com
<config.log.bz2><mime-attachment.txt>




