Re: [OMPI devel] Build failure on FreeBSD 7

2008-05-04 Thread Jeff Squyres

On May 3, 2008, at 7:32 PM, Brad Penoff wrote:


The small commit that Karol originally suggested was just pushed to
ompi-trunk.  It simply adds the header files that FreeBSD (6.2, 6.3,
and 7) needs in order to compile.


Good.


This didn't fix the hang in the kevent call mentioned in this
thread; however, setting the environment variable EVENT_NOKQUEUE does
work around it.  I'm not sure that is the solution we want for all
FreeBSD platforms in the long term (requiring the user to set
particular environment variables for particular platforms), but for
now at least I can run the MTT tests that I need to (once it gets into
a nightly build).


Unfortunately, I think you're the only one who cares about FreeBSD, so  
it's likely going to be up to you to get it working.  :-\  I'm not  
being snide; I'm just saying that it's likely that no one else cares  
about FreeBSD, so no one else will spend cycles on a fix for it -- the  
only thing that people will care about is how the fix affects the rest  
of the code base.


I agree that making people setenv EVENT_NOKQUEUE before running on  
FreeBSD is not desirable.  I'm not too much of a fan of your patch,  
though -- is there a better way?  E.g., can you extend the test in  
ompi/config/ompi_setup_libevent.c to reliably detect whether kevent  
works on FreeBSD or not?  I'm assuming that the test should return  
"no, kevent is not supported" on FreeBSD, as opposed to the "yes, it  
works" that it must be returning today.
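One way such a test could avoid hanging configure itself is to give kevent() a timeout and check that it reports a readable pipe. The sketch below assumes that approach; the function name kevent_probe() is hypothetical, it is not the test currently in the tree, and whether it actually catches the FreeBSD hang is untested. On platforms without <sys/event.h> it simply answers "no".

```c
/*
 * Hypothetical probe (not the existing configure test): does kevent()
 * report a readable pipe within a bounded time?  Returns 1 if it does,
 * 0 if it does not or if this platform has no <sys/event.h>.
 */
#if defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__OpenBSD__) || defined(__APPLE__)
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

int kevent_probe(void)
{
    int fds[2], kq, n, ok = 0;
    struct kevent change, result;
    struct timespec timeout = { 2, 0 };   /* wait at most 2 seconds */

    if (pipe(fds) != 0)
        return 0;
    if ((kq = kqueue()) < 0)
        goto out_pipe;

    /* Make the read end readable, then register it and poll once. */
    if (write(fds[1], "x", 1) != 1)
        goto out_kq;
    EV_SET(&change, fds[0], EVFILT_READ, EV_ADD, 0, 0, NULL);
    n = kevent(kq, &change, 1, &result, 1, &timeout);

    /* Expect exactly one read event for our fd, without EV_ERROR. */
    ok = (n == 1 && result.ident == (uintptr_t) fds[0] &&
          result.filter == EVFILT_READ && !(result.flags & EV_ERROR));

out_kq:
    close(kq);
out_pipe:
    close(fds[0]);
    close(fds[1]);
    return ok;
}
#else
int kevent_probe(void)
{
    return 0;   /* no kqueue on this platform: report "not supported" */
}
#endif
```

A configure test would run this as a small program and map a return of 1 to "kqueue works". Because of the timeout, a kevent() that never delivers the event turns into a "no" answer rather than a hung configure.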


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] MCA component open

2008-05-04 Thread Jeff Squyres
FWIW, Josh implemented "MCA-NULL" in https://svn.open-mpi.org/trac/ompi/changeset/18364.

I'm not sure how I feel about this solution.  On the one hand, it's  
kind of a hack-ish way of solving the immediate issue.  On the other  
hand, it's really a larger issue of explicitly *not* setting an MCA  
param (or knowing what source an MCA value originated from, depending  
on how you look at it), something that we've never taken the time to  
address properly.  If we continue to not solve the larger issue, it's  
going to come up again someday and someone will add yet another  
workaround.


In both dimensions:

- I'm not entirely sure I understand the specific ORTE issue.  Is it
that you want one "plm" MCA param value for mpirun and another value
for other processes (i.e., the orteds)?  Or, more specifically, you
want plm X in mpirun, and *no* PLMs in the orteds?


- Would adding an enum indicating where an MCA value was retrieved  
from help this situation?  E.g., MCA_PARAM_ENVIRONMENT,  
MCA_PARAM_FILE, MCA_PARAM_DEFAULT?
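A minimal sketch of the enum idea, assuming a lookup that checks the environment (OMPI_MCA_<name>) first, then the param file, then the default. Only the three MCA_PARAM_* names come from the message above; the type name, mca_param_lookup(), and its arguments are hypothetical and are not the existing MCA param API.

```c
#define _POSIX_C_SOURCE 200112L   /* setenv()/unsetenv() in the usage example */
#include <stdio.h>
#include <stdlib.h>

/* Where a value came from (names from the proposal; type is hypothetical). */
typedef enum {
    MCA_PARAM_DEFAULT,
    MCA_PARAM_FILE,
    MCA_PARAM_ENVIRONMENT
} mca_param_source_t;

/*
 * Hypothetical lookup: environment beats file, file beats default.
 * The source is reported separately, so a caller can tell "the user
 * chose this" apart from "nobody said anything".
 */
static const char *mca_param_lookup(const char *name, const char *file_value,
                                    const char *default_value,
                                    mca_param_source_t *source)
{
    char envname[256];
    const char *value;

    snprintf(envname, sizeof(envname), "OMPI_MCA_%s", name);
    value = getenv(envname);
    if (value != NULL) {
        *source = MCA_PARAM_ENVIRONMENT;
        return value;
    }
    if (file_value != NULL) {
        *source = MCA_PARAM_FILE;
        return file_value;
    }
    *source = MCA_PARAM_DEFAULT;
    return default_value;   /* may be NULL: nothing was set anywhere */
}
```

With the source known, a framework open routine could treat MCA_PARAM_DEFAULT as "no restriction" and anything else as an explicit choice by the user.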



On May 3, 2008, at 12:02 PM, George Bosilca wrote:



The problem: The orteds open all plm components before discarding
most of them, all this in a context where "--mca plm rsh" was present
on the mpirun command line.


The non-problem: In the context of the mpirun process, only the rsh
plm is opened, as mpirun is the only process that gets the "--mca
plm rsh" information.  As this specific argument is not included in
the list of arguments we forward to the orted processes, there is no
way that the orteds can abide by the imposed restriction.  Note that
if the restriction is set in the config file, then even the orteds
respect it.  So far the only problem I can see here is that the
orteds are opening a framework that they are not supposed to (at
least not in most cases).


When we implemented the MCA filtering stuff, we proposed another
optimization: a default component for all special frameworks (i.e.,
frameworks that are used or not depending on the type of process)
that would be statically linked into the library (and therefore would
not generate any NFS traffic).  Its only job was to execute the
selection logic when any of its functions were called; in other
words, an on-demand component loading feature.  From there, a real
component would be selected, and all further calls would be directed
to the selected component.  I clearly remember that Ralph was
completely against this feature, for two reasons: 1) all components
in the ORTE framework had to be loaded anyway, and they would do the
"if(!hnp) return NULL"; 2) he proposed implementing the null
component instead.


I was, and still am, against 1), so I guess that any effort toward
implementing a null or none component will have my support.
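A rough sketch of the on-demand default component described above, assuming a framework exposes its module as a table of function pointers. Every name here (plm_module_t, plm_current, run_selection(), rsh_launch()) is hypothetical, and this is not ORTE's actual PLM interface: the statically linked default module runs selection on first use, then overwrites itself with the selected module so later calls go straight through.

```c
#include <stddef.h>

/* Hypothetical module table for one framework. */
typedef struct {
    const char *name;
    int (*launch)(int nprocs);
} plm_module_t;

/* Stand-in for a real, dynamically opened component. */
static int rsh_launch(int nprocs) { return nprocs; }
static const plm_module_t rsh_module = { "rsh", rsh_launch };

static int selections = 0;                 /* how often selection ran */
static const plm_module_t *selected = NULL;

/* The expensive part (open/query components, pick one); trivial here. */
static const plm_module_t *run_selection(void)
{
    ++selections;
    return &rsh_module;
}

static int default_launch(int nprocs);

/* Statically linked default module: nothing is opened until first use. */
static plm_module_t plm_current = { "default", default_launch };

static int default_launch(int nprocs)
{
    if (selected == NULL) {
        selected = run_selection();
        plm_current = *selected;   /* later calls bypass the stub */
    }
    return selected->launch(nprocs);
}
```

A process that never calls into the framework (e.g., an orted that never launches anything) never runs selection and never opens a component.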


 george.

On May 2, 2008, at 4:40 PM, Josh Hursey wrote:

We could also call it 'null' for the empty set of components? Or
maybe OMPI-NULL.

Naming aside, do others think this is a useful feature to
implement?

-- Josh

On May 2, 2008, at 10:51 AM, Ralph Castain wrote:


I would think that adding a special keyword would be the correct
method. I would suggest something with an "ompi" in it, perhaps
capitalized so there is no confusion... something like "OMPI-NONE"?


On 5/2/08 8:37 AM, "Josh Hursey"  wrote:

I don't believe we have the logic in place to tell mca_component_open
'do not open anything'. (I could be wrong though.)

Adding such an option might be useful, but we would have to consider
how that option should be specified by the user. Currently, if you do
not set a value (leave empty space in mca-params.conf), the MCA
system takes this to indicate that all components are eligible for
selection. If you specify any components, then only those components
should be opened. We could add a special keyword (such as 'none') to
indicate 'open nothing'.
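A minimal sketch of those include-list semantics, assuming a comma-separated list of component names. The keyword spelling is a placeholder (the thread floats 'none', 'null', "OMPI-NONE", "OMPI-NULL", and "MCA-NULL"), and component_is_eligible() is a hypothetical helper, not the real mca_component_open logic.

```c
#include <stdbool.h>
#include <string.h>

/* Placeholder spelling for the "open nothing" keyword. */
#define OPEN_NOTHING_KEYWORD "none"

/*
 * Hypothetical helper: should `component` be opened, given the user's
 * value for the framework param (e.g. the "plm" param)?
 *   NULL or ""  -> every component is eligible (today's behavior)
 *   "none"      -> open nothing
 *   "a,b,c"     -> open only the listed components
 */
static bool component_is_eligible(const char *param, const char *component)
{
    size_t len = strlen(component);
    const char *p = param;

    if (param == NULL || param[0] == '\0')
        return true;
    if (strcmp(param, OPEN_NOTHING_KEYWORD) == 0)
        return false;

    /* Walk the comma-separated list looking for an exact name match. */
    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t n = comma ? (size_t) (comma - p) : strlen(p);

        if (n == len && strncmp(p, component, len) == 0)
            return true;
        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return false;
}
```

Under this rule, Ralph's slurm case would hand the orteds the keyword as their plm value, and no plm component (slurm included) would be opened there.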

What do people think about that?

-- Josh


On May 2, 2008, at 10:22 AM, Ralph Castain wrote:

I see what the problem is. In the case of slurm, I don't want -any-
components to be opened, even though I am going to call plm
open/select. I have to leave that logic in place for those
environments that -do- want to specify some backend secondary
launcher.

So the question is: how do I tell mca_component_open "do not open
anything"?

If we don't have a mechanism for doing that, can we create one?


On 5/2/08 8:02 AM, "Ralph Castain"  wrote:

Well, I have a current version of the trunk. I add an MCA param to the
environment indicating that only rsh is to be used by the orted. Yet I
get an output from every orted indicating that slurm (misspelled!) is
available for selection.

This tells me that the slurm component is being opened, even though
the param is set.

I can check again to ensure that the param is set...


On 5/2/08 7:53 AM, "Jeff Squyres"  wrote:


(moving to devel list for wider audience)

Hmm.  I thought the UTK stuff from a while ago supposedly changed
this behavi

Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-04 Thread Jeff Squyres

Jon / Steve -- can you comment?

I tested with OFED 1.2.5 (which is what I assume you meant) and got:

checking for rdma_get_peer_addr... no

Because that function is not defined in OFED 1.2.5.  Running with OFED  
1.3 (where the function does exist), I get:


checking for rdma_get_peer_addr... yes

Outside of all the configure complexity, can you write a simple  
program that calls that function and have it compile and link properly?


I suppose we could change the AC_COMPILE_IFELSE in
config/ompi_check_openib.m4 to OMPI_LINK_IFELSE, but I'm a little
confused as to why it would compile successfully if the symbol
rdma_get_peer_addr is not declared anywhere (which it shouldn't be in
OFED 1.2 or 1.2.5, AFAIK)...




On May 3, 2008, at 10:56 AM, Pak Lui wrote:


Sure Jeff, see attached.

Jeff Squyres wrote:

(moving to devel so that others are aware)
Crud.  Can you send me your config.log?  I don't know why it's able
to find rdma_get_peer_addr() in configure, but then later not able
to find it during the build - I'd like to see what happened during
configure.

On May 2, 2008, at 7:09 PM, Pak Lui wrote:

Hi Jeff,

It seems that the cpc3 merge caused my Ranger build to break. I
believe it is using OFED 1.2, but I don't know how to check. It
passes the ompi_check_openib.m4 test that you added for
rdma_get_peer_addr. Is there a missing #include related to
openib/OFED somewhere?



 1236 checking rdma/rdma_cma.h usability... yes
 1237 checking rdma/rdma_cma.h presence... yes
 1238 checking for rdma/rdma_cma.h... yes
 1239 checking for rdma_create_id in -lrdmacm... yes
 1240 checking for rdma_get_peer_addr... yes


pgCC -DHAVE_CONFIG_H -I. -I../../../../ompi/tools/ompi_info -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../opal/mca/paffinity/linux/plpa/src/libplpa -DOMPI_CONFIGURE_USER="\"paklui\"" -DOMPI_CONFIGURE_HOST="\"login4.ranger.tacc.utexas.edu\"" -DOMPI_CONFIGURE_DATE="\"Fri May  2 17:07:01 CDT 2008\"" -DOMPI_BUILD_USER="\"$USER\"" -DOMPI_BUILD_HOST="\"`hostname`\"" -DOMPI_BUILD_DATE="\"`date`\"" -DOMPI_BUILD_CFLAGS="\"-O -DNDEBUG \"" -DOMPI_BUILD_CPPFLAGS="\"-I../../../.. -I../../.. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include  -D_REENTRANT\"" -DOMPI_BUILD_CXXFLAGS="\"-O -DNDEBUG  \"" -DOMPI_BUILD_CXXCPPFLAGS="\"-I../../../.. -I../../.. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include  -D_REENTRANT\"" -DOMPI_BUILD_FFLAGS="\"\"" -DOMPI_BUILD_FCFLAGS="\"\"" -DOMPI_BUILD_LDFLAGS="\" \"" -DOMPI_BUILD_LIBS="\"-lnsl -lutil  -lpthread\"" -DOMPI_CC_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgcc\"" -DOMPI_CXX_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgCC\"" -DOMPI_F77_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgf77\"" -DOMPI_F90_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgf95\"" -DOMPI_F90_BUILD_SIZE="\"small\"" -I../../../.. -I../../.. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include  -D_REENTRANT  -O -DNDEBUG   -c -o version.o ../../../../ompi/tools/ompi_info/version.cc
/bin/sh ../../../libtool --tag=CXX   --mode=link pgCC  -O -DNDEBUG  -o ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la -lnsl -lutil  -lpthread
libtool: link: pgCC -O -DNDEBUG -o .libs/ompi_info components.o ompi_info.o output.o param.o version.o  ../../../ompi/.libs/libmpi.so -L/opt/ofed/lib64 -libcm -lrdmacm -libverbs -lrt /share/home/00951/paklui/ompi-trunk5/config-data1/orte/.libs/libopen-rte.so /share/home/00951/paklui/ompi-trunk5/config-data1/opal/.libs/libopen-pal.so -lnuma -ldl -lnsl -lutil -lpthread -Wl,--rpath -Wl,/share/home/00951/paklui/ompi-trunk5/shared-install1/lib


[1]    Exit 2    make install >& make.install.log.0
../../../ompi/.libs/libmpi.so: undefined reference to `rdma_get_peer_addr'
../../../ompi/.libs/libmpi.so: undefined reference to `rdma_get_local_addr'

make[2]: *** [ompi_info] Error 2
make[2]: Leaving directory `/share/home/00951/paklui/ompi-trunk5/config-data1/ompi/tools/ompi_info'

make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/share/home/00951/paklui/ompi-trunk5/config-data1/ompi'

make: *** [install-recursive] Error 1




--

- Pak Lui
pak@sun.com



--


- Pak Lui
pak@sun.com




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-04 Thread Steve Wise
This probably has to do with the fact that rdma_get_peer_addr() is a
static inline function in /usr/include/rdma/rdma_cma.h.  So if you
don't include that file in the test program, you won't get
rdma_get_peer_addr(), even if you link with librdmacm.so.
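To illustrate that point with stand-in types (struct demo_cm_id and the demo_* accessors are hypothetical; the real struct rdma_cm_id is larger and nests these addresses differently): an accessor defined static inline in a header is compiled into each caller, so no exported symbol with that name appears in librdmacm.so. That also means nm on the library would not list it even in an OFED release that provides it, and a configure test only sees the function if it includes the header.

```c
#include <sys/socket.h>

/* Hypothetical, trimmed stand-in for struct rdma_cm_id. */
struct demo_cm_id {
    struct sockaddr_storage src_addr;
    struct sockaddr_storage dst_addr;
};

/*
 * Header-only accessors in the style described above.  Being static
 * inline, each translation unit that includes the header gets its own
 * copy (or none, once inlined), so no global symbol with these names is
 * exported from any shared library.
 */
static inline struct sockaddr *demo_get_peer_addr(struct demo_cm_id *id)
{
    return (struct sockaddr *) &id->dst_addr;
}

static inline struct sockaddr *demo_get_local_addr(struct demo_cm_id *id)
{
    return (struct sockaddr *) &id->src_addr;
}
```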


Steve.









Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-04 Thread Pak Lui

Jeff Squyres wrote:

Jon / Steve -- can you comment?

I tested with OFED 1.2.5 (which is what I assume you meant) and got:

checking for rdma_get_peer_addr... no

Because that function is not defined in OFED 1.2.5.  Running with OFED  
1.3 (where the function does exist), I get:


checking for rdma_get_peer_addr... yes


For me it seems to be running with 1.2.5.

login3% /opt/ofed/bin/ofed_info | head -1
OFED-1.2.5.5

There is no rdma_get_peer_addr or rdma_get_local_addr in these .so's,
assuming that is where they would come from.


login3% ls librdmacm.so*
librdmacm.so  librdmacm.so.1  librdmacm.so.1.0.0  librdmacm.so.1.0.2

login3% nm librdmacm.so* | grep rdma_get_
3470 T rdma_get_cm_event
1a20 T rdma_get_devices
3470 T rdma_get_cm_event
1a20 T rdma_get_devices
3470 T rdma_get_cm_event
1a20 T rdma_get_devices
3470 T rdma_get_cm_event
1a20 T rdma_get_devices

And I don't see rdma_get_peer_addr in
/opt/ofed/include/rdma/rdma_cma.h either (not even as an inline), so I
don't know how configure actually knows about the interface.




Outside of all the configure complexity, can you write a simple  
program that calls that function and have it compile and link properly?


These are the references to rdma_get_peer_addr in the config.log:
  47858 configure:120941: checking for rdma_get_peer_addr
  47859 configure:120966: pgcc -c -g   -D_REENTRANT -I/opt/ofed/include conftest.c >&5
  47860 PGC-W-0155-Pointer value created from a nonlong integral type (conftest.c: 412)
  47861 PGC/x86-64 Linux 7.1-2: compilation completed with warnings
  47862 configure:120972: $? = 0
  47863 configure:120987: result: yes
...
  48355 configure:123600: checking for rdma_get_peer_addr
  48356 configure:123625: pgcc -c -g   -D_REENTRANT -I/opt/ofed/include conftest.c >&5
  48357 PGC-W-0155-Pointer value created from a nonlong integral type (conftest.c: 423)
  48358 PGC/x86-64 Linux 7.1-2: compilation completed with warnings
  48359 configure:123631: $? = 0
  48360 configure:123646: result: yes

Here's my program; I'm not sure it's doing the right thing. I am no m4
expert, so how do I run ompi_check_openib.m4 independently and see
the conftest.c?


login3% cat mytest.c
#include "rdma/rdma_cma.h"
int main (void) {
void *ret = (void*) rdma_get_peer_addr((struct rdma_cm_id*)0);
return 0;
}

It gives me a warning if I just compile it to an object file, which is
what I see in the config.log.


login3% pgcc -c -g   -D_REENTRANT  -I/opt/ofed/include mytest.c
PGC-W-0155-Pointer value created from a nonlong integral type  (mytest.c: 3)
PGC/x86-64 Linux 7.1-2: compilation completed with warnings
login3% echo $?
0

But trying to link it into an executable gives me the error.

login3% pgcc -g -D_REENTRANT -I/opt/ofed/include mytest.c -o mytest
PGC-W-0155-Pointer value created from a nonlong integral type  (mytest.c: 3)
PGC/x86-64 Linux 7.1-2: compilation completed with warnings
/tmp/pgccjF6BryhFmWS.o: In function `main':
/share/home/00951/paklui/ompi-trunk5/config-data1-debug/mytest.c:3: undefined reference to `rdma_get_peer_addr'


Hmm, any clues, comments?




Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-04 Thread Pak Lui
For sanity's sake I also checked LD_LIBRARY_PATH; there doesn't seem
to be anything suspicious there either...


login3% echo $LD_LIBRARY_PATH
/opt/apps/pgi/7.1/linux86-64/7.1-2/libso:/opt/gsi-openssh-4.1/lib:/opt/gsi-openssh-4.1/lib:/opt/apps/binutils-amd/070220/lib64

I am trying Jeff's suggestion to replace OMPI_COMPILE_IFELSE with
OMPI_LINK_IFELSE. Will let you know.



Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-04 Thread Pak Lui
Hmm, so either setting up a totally new workspace or switching to
OMPI_LINK_IFELSE gets me the right configure check result. I think the
latter is what fixes my problem. I assume make all will work now; I'll
let you know otherwise...


  48773 configure:123602: checking for rdma_get_peer_addr
  48774 configure:123627: pgcc -o conftest -g   -D_REENTRANT -I/opt/ofed/include-L/opt/ofed/lib64 conftest.c -lnsl -lutil  -lpthread -libverbs  >&5
  48775 conftest.c:
  48776 PGC-W-0155-Pointer value created from a nonlong integral type (conftest.c: 423)
  48777 PGC/x86-64 Linux 7.1-2: compilation completed with warnings
  48778 conftest.o: In function `main':
  48779 /share/home/00951/paklui/ompi-trunk5/config-data2-debug/conftest.c:423: undefined reference to `rdma_get_peer_addr'
  48780 configure:123633: $? = 2
  48781 configure: failed program was:
  48782 | /* confdefs.h.  */
  48783 | #define PACKAGE_NAME "Open MPI"
...
  49196 | #define HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE 1
  49197 | #define HAVE_RDMA_RDMA_CMA_H 1
  49198 | /* end confdefs.h.  */
  49199 | #include "rdma/rdma_cma.h"
  49200 |
  49201 | int
  49202 | main ()
  49203 | {
  49204 | void *ret = (void*) rdma_get_peer_addr((struct rdma_cm_id*)0);
  49205 |   ;
  49206 |   return 0;
  49207 | }
  49208 configure:123650: result: no


Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-04 Thread Jeff Squyres (jsquyres)
As Steve mentioned, it's inline.  But I don't understand how that would even 
compile if it's not in rdma_cma.h.  The link test will catch it, but I'm still a little 
uneasy not understanding why it passes the compile...
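One likely explanation, assuming pgcc applies C89 rules here: a call to an undeclared function is implicitly declared as returning int, so the -c step succeeds, and casting that int to void * is exactly what PGC-W-0155 ("Pointer value created from a nonlong integral type") complains about. Only the link step then notices the missing symbol. The sketch below uses hypothetical stand-in types in place of <rdma/rdma_cma.h> to show a compile-only form that cannot be fooled that way: naming the function through a pointer with its full prototype, without calling it.

```c
#include <sys/socket.h>

/* Hypothetical stand-ins for what <rdma/rdma_cma.h> would provide. */
struct demo_cm_id {
    struct sockaddr_storage dst_addr;
};

static inline struct sockaddr *demo_get_peer_addr(struct demo_cm_id *id)
{
    return (struct sockaddr *) &id->dst_addr;
}

/*
 * Fragile compile-only check (the shape seen in config.log):
 *
 *     void *ret = (void *) demo_get_peer_addr((struct demo_cm_id *) 0);
 *
 * If the header does not declare the function, C89 rules treat the call
 * as an implicit "int demo_get_peer_addr()", so "cc -c" still succeeds
 * and only the (void *) cast draws a warning.
 *
 * Sturdier compile-only check: take the function's address through a
 * pointer with the full prototype.  An undeclared identifier that is not
 * being called is always a compile error.
 */
typedef struct sockaddr *(*peer_addr_fn)(struct demo_cm_id *);
static peer_addr_fn const probe = demo_get_peer_addr;
```

This catches a missing declaration (the OFED 1.2.5 case). It would not catch an extern function that is declared but missing from the library, so for ordinary library symbols a link test is still the safer check.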

-jms
Sent from my PDA.  No type good.

-----Original Message-----
From:    Pak Lui [mailto:pak@sun.com]
Sent:    Sunday, May 04, 2008 11:44 AM Eastern Standard Time
To:      Open MPI Developers
Subject: Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

Jeff Squyres wrote:
> Jon / Steve -- can you comment?
> 
> I tested with OFED 1.2.5 (which is what I assume you meant) and got:
> 
> checking for rdma_get_peer_addr... no
> 
> Because that function is not defined in OFED 1.2.5.  Running with OFED  
> 1.3 (where the function does exist), I get:
> 
> checking for rdma_get_peer_addr... yes

For me it seems to be running with 1.2.5.

login3% /opt/ofed/bin/ofed_info | head -1
OFED-1.2.5.5

There is no rdma_get_peer_addr or rdma_get_local_addr in these .so's, assuming 
they come from there.

login3% ls librdmacm.so*
librdmacm.so  librdmacm.so.1  librdmacm.so.1.0.0  librdmacm.so.1.0.2

login3% nm librdmacm.so* | grep rdma_get_
3470 T rdma_get_cm_event
1a20 T rdma_get_devices
3470 T rdma_get_cm_event
1a20 T rdma_get_devices
3470 T rdma_get_cm_event
1a20 T rdma_get_devices
3470 T rdma_get_cm_event
1a20 T rdma_get_devices
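The `nm librdmacm.so* | grep rdma_get_` check above can also be done from C. The sketch below is not part of Open MPI or OFED; the helper name `lib_has_symbol` is hypothetical. It uses the POSIX `dlopen()`/`dlsym()` calls to ask whether a shared library exports a given symbol, and it assumes a dynamically linked Linux/ELF system (older glibc versions may need `-ldl` at link time). Note that this is a runtime probe, a different technique from the compile or link test that configure runs.

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper: report whether the shared library `libname`
 * exports `symbol`.  Returns 1 if found, 0 if not, and -1 if the
 * library cannot be opened.  Passing NULL for `libname` searches the
 * running program and the libraries it loaded at startup.
 */
int lib_has_symbol(const char *libname, const char *symbol)
{
    void *handle;
    int found;

    handle = dlopen(libname, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL)
        return -1;                  /* library not found or not loadable */

    dlerror();                      /* clear any stale error state */
    (void) dlsym(handle, symbol);
    found = (dlerror() == NULL);    /* non-NULL error means lookup failed */

    dlclose(handle);
    return found;
}
```

Given the path to the librdmacm.so listed above, `lib_has_symbol(path, "rdma_get_peer_addr")` would be expected to return 0 on that OFED 1.2.5 installation, if the `nm` output is complete.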

And I don't see rdma_get_peer_addr declared in 
/opt/ofed/include/rdma/rdma_cma.h either, so I don't know how the compiler actually 
knows about the interface there (and it's not inline).

> 
> Outside of all the configure complexity, can you write a simple  
> program that calls that function and have it compile and link properly?

These are the references to rdma_get_peer_addr in the config.log:
   47858 configure:120941: checking for rdma_get_peer_addr
   47859 configure:120966: pgcc -c -g   -D_REENTRANT -I/opt/ofed/include conftest.c >&5
   47860 PGC-W-0155-Pointer value created from a nonlong integral type (conftest.c: 412)
   47861 PGC/x86-64 Linux 7.1-2: compilation completed with warnings
   47862 configure:120972: $? = 0
   47863 configure:120987: result: yes
...
   48355 configure:123600: checking for rdma_get_peer_addr
   48356 configure:123625: pgcc -c -g   -D_REENTRANT -I/opt/ofed/include conftest.c >&5
   48357 PGC-W-0155-Pointer value created from a nonlong integral type (conftest.c: 423)
   48358 PGC/x86-64 Linux 7.1-2: compilation completed with warnings
   48359 configure:123631: $? = 0
   48360 configure:123646: result: yes

Here's my program; I'm not sure it's doing it correctly. I am no m4 
expert, so how do I run ompi_check_openib.m4 independently and see 
the conftest.c?

login3% cat mytest.c
#include "rdma/rdma_cma.h"
int main (void) {
 void *ret = (void*) rdma_get_peer_addr((struct rdma_cm_id*)0);
 return 0;
}

It gives me only a warning if I just compile to an object file, which is what I 
see in the config.log.

login3% pgcc -c -g   -D_REENTRANT  -I/opt/ofed/include mytest.c
PGC-W-0155-Pointer value created from a nonlong integral type  (mytest.c: 3)
PGC/x86-64 Linux 7.1-2: compilation completed with warnings
login3% echo $?
0

But trying to link an executable gives me the error:

login3% pgcc -g -D_REENTRANT -I/opt/ofed/include mytest.c -o mytest
PGC-W-0155-Pointer value created from a nonlong integral type  (mytest.c: 3)
PGC/x86-64 Linux 7.1-2: compilation completed with warnings
/tmp/pgccjF6BryhFmWS.o: In function `main':
/share/home/00951/paklui/ompi-trunk5/config-data1-debug/mytest.c:3: undefined reference to `rdma_get_peer_addr'

Hmm, any clues, comments?

> 
> I suppose we could change the AC_COMPILE_IFELSE in
> config/ompi_check_openib.m4 to OMPI_LINK_IFELSE, but I'm a little confused
> as to why it would compile successfully if the symbol rdma_get_peer_addr
> is not declared anywhere (which it shouldn't be in OFED 1.2 or 1.2.5,
> AFAIK)...
> 
> 
> 
> On May 3, 2008, at 10:56 AM, Pak Lui wrote:
> 
>> Sure Jeff, see attached.
>>
>> Jeff Squyres wrote:
>>> (moving to devel so that others are aware)
>>> Crud.  Can you send me your config.log?  I don't know why it's able
>>> to find rdma_get_peer_addr() in configure, but then later not able
>>> to find it during the build - I'd like to see what happened
>>> during configure.
>>> On May 2, 2008, at 7:09 PM, Pak Lui wrote:
>>>> Hi Jeff,
>>>>
>>>> It seems that the cpc3 merge causes my Ranger build to break. I
>>>> believe it is using OFED 1.2 but I don't know how to check. It
>>>> passes the rdma_get_peer_addr check that you added to
>>>> ompi_check_openib.m4. Is there a missing OpenIB/OFED-related
>>>> #include somewhere?
>>>>
>>>>   1236 checking rdma/rdma_cma.h usability... yes
>>>>   1237 checking rdma/rdma_cma.h presence... yes
>>>>   1238 checking for rdma/rdma_cma.h... yes
>>>>   1239 checking

Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-04 Thread Brian Barrett
I think I might see the issue.  Jeff, I'm assuming you're using a  
developer build of Open MPI with GNU, Intel, or Pathscale compilers,  
right?  At least someone below was using PGI.  The first three  
compilers on a developer build have the magic pixie dust arguments  
added that make calling an undeclared function an error.  PGI, Sun  
Workshop, and non-developer builds don't have that pixie dust.  So  
it's not an error to call an undeclared function in those cases, and  
AC_COMPILE_IFELSE won't error out.  AC_LINK_IFELSE should always be  
used to check for functions for precisely that reason.


Brian

On May 4, 2008, at 11:41 AM, Jeff Squyres (jsquyres) wrote:

> As Steve mentioned, it's inline.  But I don't understand how that
> would even compile if it's not in rdma_cma.h.  A link test will catch it,
> but I'm still a little uneasy not understanding why it passes the
> compile...
>
> -jms
> Sent from my PDA.  No type good.


Re: [OMPI devel] OMPI Mercurial read-only mirror

2008-05-04 Thread Roland Dreier
 > >  > Can I make a /tmp branch from the hg read-only branch that is not tied 
 > >  > to the svn /tmp branches?

 > > Why do you want to do that?
 > >
 > > Mercurial is a fully distributed system, so you could just start
 > > committing to one of your local copies of the repository, and I can't
 > > see anything missing that a /tmp branch would give you.

 > Same reason you do an SVN tmp branch.  So others (outside of my 
 > employer's WAN) can actually clone the branch and try it out before you 
 > push it back to the repository.

Mercurial is a fully distributed system.  So instead of thinking of a /tmp
branch, you should think of publishing your repository, which has your
commits in it.  As I understand it, open-mpi.org is not set up for
publishing other repositories yet, but it is quite easy to set up a
Mercurial server; there are also several places that will host one for
you: http://www.selenic.com/mercurial/wiki/index.cgi/MercurialHosting

 - R.


Re: [OMPI devel] Build failure on FreeBSD 7

2008-05-04 Thread Paul H. Hargrove

Jeff Squyres wrote:
> On May 3, 2008, at 7:32 PM, Brad Penoff wrote:
>
>> The small commit that Karol originally suggested was just pushed to
>> ompi-trunk.  This just simply adds the appropriate header files for
>> FreeBSD (6.2, 6.3 and 7) to be able to compile.
>
> Good.
>
>> This didn't fix the hanging on the kevent call mentioned in this
>> thread, however, setting the environment variable EVENT_NOKQUEUE did
>> find a work-around.  I'm not sure if that is the solution we want for
>> all FreeBSD platforms in the long term (requiring the user to set
>> particular environment variables for particular platforms), but for
>> now at least I can run the MTT tests that I need to (once it gets in a
>> nightly build).
>
> Unfortunately, I think you're the only one who cares about FreeBSD, so
> it's likely going to be up to you to get it working.  :-\  I'm not
> being snide; I'm just saying that it's likely that no one else cares
> about FreeBSD, so no one else will spend cycles on a fix for it -- the
> only thing that people will care about is how the fix affects the rest
> of the code base.
>
> I agree that making people setenv EVENT_NOKQUEUE before running on
> FreeBSD is not desirable.  I'm not too much of a fan of your patch,
> though -- is there a better way?  E.g., can you extend the test in
> ompi/config/ompi_setup_libevent.c to reliably detect whether kevent
> works on FreeBSD or not?  I'm assuming that the test should return
> "no, kevent is not supported" on FreeBSD, as opposed to the "yes, it
> works" that it must be returning today.

In the end I don't care if OMPI runs on FreeBSD or not, but I see that I 
might be able to help a little here.

I have been doing some Xen testing that is entirely unrelated to OMPI, 
but happens to leave me with installations of FreeBSD 6.2 for i386 and 
FreeBSD 7.0 (both i386 and amd64).  I will not commit to building and 
testing an entire OMPI, but I can offer to try out any configure test 
that Brad devises, just to be sure the coverage is more than one 
installation.


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
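Jeff's suggestion above, a test that decides whether kevent actually works, might look roughly like the C sketch below. It is an illustration under stated assumptions, not the existing Open MPI or libevent check: the function name `kqueue_works` and its timeout parameter are hypothetical, and whether a probe this simple reproduces the kevent hang Brad saw is unknown. It registers a pipe's read end with a kqueue, writes one byte, and waits with a timeout, so a hang turns into a "no" answer instead of blocking forever. As an assumption, it treats only platforms defining `__FreeBSD__` or `__APPLE__` as having `<sys/event.h>`; elsewhere it reports 0.

```c
#include <stddef.h>

#if defined(__FreeBSD__) || defined(__APPLE__)
#include <stdint.h>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if kevent() reports a readable pipe
 * within timeout_ms milliseconds, 0 otherwise.  The timeout turns a
 * hang into "not working" rather than blocking the caller forever.
 */
int kqueue_works(long timeout_ms)
{
    int fds[2];
    int kq, ok = 0;
    struct kevent change, result;
    struct timespec ts;

    if (pipe(fds) != 0)
        return 0;

    kq = kqueue();
    if (kq < 0)
        goto done;

    /* Register interest in the read end becoming readable. */
    EV_SET(&change, fds[0], EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, NULL);
    if (kevent(kq, &change, 1, NULL, 0, NULL) < 0)
        goto done;

    /* Make the read end readable. */
    if (write(fds[1], "x", 1) != 1)
        goto done;

    /* Wait for the event, but never longer than the timeout. */
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;
    if (kevent(kq, NULL, 0, &result, 1, &ts) == 1 &&
        result.ident == (uintptr_t) fds[0] &&
        result.filter == EVFILT_READ)
        ok = 1;

done:
    if (kq >= 0)
        close(kq);
    close(fds[0]);
    close(fds[1]);
    return ok;
}

#else

/* No kqueue on this platform (e.g. Linux), so report "not supported". */
int kqueue_works(long timeout_ms)
{
    (void) timeout_ms;
    return 0;
}

#endif
```

Wrapped in a `main()` that returns `kqueue_works(1000) ? 0 : 1`, the exit status could serve as the yes/no answer for a configure-time run test.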




Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-04 Thread Jeff Squyres (jsquyres)
Coolio.  Pak - go ahead and commit if you haven't already done so.

-jms
Sent from my PDA.  No type good.

-----Original Message-----
From:   Brian Barrett [mailto:brbar...@open-mpi.org]
Sent:   Sunday, May 04, 2008 02:14 PM Eastern Standard Time
To:     Open MPI Developers
Subject: Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

I think I might see the issue.  Jeff, I'm assuming you're using a  
developer build of Open MPI with GNU, Intel, or Pathscale compilers,  
right?  At least someone below was using PGI.  The first three  
compilers on a developer build have the magic pixie dust arguments  
added that make calling an undeclared function an error.  PGI, Sun  
Workshop, and non-developer builds don't have that pixie dust.  So  
it's not an error to call an undeclared function in those cases, and  
AC_COMPILE_IFELSE won't error out.  AC_LINK_IFELSE should always be  
used to check for functions for precisely that reason.

Brian
