Paul,
The openib BTL needs to be reworked.
Currently, we have:
#define HAVE_XRC (1 == OMPI_HAVE_CONNECTX_XRC)
/* and it is impossible to have !OMPI_HAVE_CONNECTX_XRC &&
OMPI_HAVE_CONNECTX_XRC_DOMAINS */
But with your patch this combination becomes possible, and HAVE_XRC would then be zero,
which is incorrect.
I will fix that too.
Cheers,
Gilles
On 7/10/2015 1:16 PM, Paul Hargrove wrote:
Gilles,
I've made another observation about what I believe is an error in the
XRC configure probe.
If I am following the code below correctly, then *both* ConnectX and
ConnectIB depend on ibv_create_xrc_rcv_qp being defined.
However, that function is marked as deprecated (presumably in favor of
ibv_cmd_open_xrcd).
So, when a later revision of libibverbs removes the deprecated
function, *neither* the old nor the new interface will be detected as
supported!
# ibv_create_xrc_rcv_qp was added in OFED 1.3
# ibv_cmd_open_xrcd (aka XRC Domains) was added in OFED 3.12
if test "$enable_connectx_xrc" = "yes"; then
    $1_have_xrc=1
    AC_CHECK_FUNCS([ibv_create_xrc_rcv_qp],
                   [], [$1_have_xrc=0])
    AC_CHECK_DECLS([IBV_SRQT_XRC],
                   [], [$1_have_xrc=0],
                   [#include <infiniband/verbs.h>])
fi
if test "$enable_connectx_xrc" = "yes" \
    && test $$1_have_xrc -eq 1; then
    AC_CHECK_FUNCS([ibv_cmd_open_xrcd], [$1_have_xrc_domains=1])
fi
While I am not certain if a probe for IBV_SRQT_XRC is really
necessary, my suggested replacement for the logic above is:
# ibv_create_xrc_rcv_qp was added in OFED 1.3
# ibv_cmd_open_xrcd (aka XRC Domains) was added in OFED 3.12
if test "$enable_connectx_xrc" = "yes"; then
    $1_have_xrc=1
    AC_CHECK_FUNCS([ibv_cmd_open_xrcd],
                   [AC_CHECK_DECLS([IBV_SRQT_XRC],
                                   [$1_have_xrc_domains=1],
                                   [$1_have_xrc=0],
                                   [#include <infiniband/verbs.h>])])
    AC_CHECK_FUNCS([ibv_create_xrc_rcv_qp],
                   [$1_have_xrc=1])
fi
In summary:
$1_have_xrc_domains = HAVE_IBV_CMD_OPEN_XRCD && HAVE_IBV_SRQT_XRC
$1_have_xrc = $1_have_xrc_domains || HAVE_IBV_CREATE_XRC_RCV_QP
In my testing this worked as expected on both an old stack (ConnectX
XRC support only) and a new one (both ConnectX and ConnectIB XRC support).
-Paul
On Thu, Jul 9, 2015 at 7:06 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
Gilles,
The patch didn't apply to the 1.8.7rc1 tarball,
so I made the change manually and ran autogen.pl.
The result is that one fewer configure test runs, but "ConnectX
XRC support" is still disabled.
Diffing the configure output:
checking for ibv_resize_cq... yes
checking for struct ibv_device.transport_type... yes
checking for ibv_create_xrc_rcv_qp... yes
-checking for ibv_cmd_open_xrcd... no
checking whether IBV_SRQT_XRC is declared... no
checking infiniband/complib/cl_types_osd.h usability... no
checking infiniband/complib/cl_types_osd.h presence... no
You will note that "IBV_SRQT_XRC" did not appear when I grepped
for XRC in /usr/include/infiniband/verbs.h (in a previous message).
I am not sure, but I suspect that identifier is related to
"ConnectIB XRC support" (not ConnectX).
If you look back at the 1.8.4 release you will find only a check
for ibv_create_xrc_rcv_qp.
-Paul
On Thu, Jul 9, 2015 at 6:17 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
Thanks Paul,
I just found another bug
(and I am to blame for it).
Attached is a patch.
Basically, XRC was incorrectly disabled on "older" OFED stacks.
Cheers,
Gilles
On 7/10/2015 10:06 AM, Paul Hargrove wrote:
Gilles,
A bzip2-compressed config.log is attached.
I am unsure how to determine the OFED version, because the
admins have prevented normal users from reading the RPM database.
Perhaps the following helps:
$ nm /usr/lib64/libibverbs.a | grep -i xrc
00000000000000e0 T ibv_cmd_close_xrc_domain
0000000000000230 T ibv_cmd_create_xrc_rcv_qp
00000000000003b0 T ibv_cmd_create_xrc_srq
0000000000000a40 T ibv_cmd_modify_xrc_rcv_qp
0000000000000150 T ibv_cmd_open_xrc_domain
0000000000001e30 T ibv_cmd_query_xrc_rcv_qp
0000000000000070 T ibv_cmd_reg_xrc_rcv_qp
0000000000000000 T ibv_cmd_unreg_xrc_rcv_qp
00000000000002b0 T ibv_close_xrc_domain
00000000000002d0 T ibv_create_xrc_rcv_qp
00000000000007a0 T ibv_create_xrc_srq
0000000000000310 T ibv_modify_xrc_rcv_qp
0000000000000280 T ibv_open_xrc_domain
0000000000000340 T ibv_query_xrc_rcv_qp
0000000000000370 T ibv_reg_xrc_rcv_qp
0000000000000390 T ibv_unreg_xrc_rcv_qp
$ grep XRC /usr/include/infiniband/verbs.h
IBV_DEVICE_XRC = 1 << 20
IBV_XRC_QP_EVENT_FLAG = 0x80000000,
IBV_QPT_XRC,
[matches in comments have been removed].
When tonight's master tarball is posted (perhaps 10 minutes
from now) I will test it and report what I find.
-Paul
On Thu, Jul 9, 2015 at 5:17 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
Paul,
Can you please compress and post your config.log?
What OFED version are you running?
On master, that fix did the trick on the Mellanox test
cluster (recent OFED version) but did not
enable XRC on the LANL test clusters (my best bet is an old
OFED library).
Thanks
Gilles
On 7/10/2015 9:08 AM, Paul Hargrove wrote:
Preliminary report:
1) I find that "ConnectX XRC support" is still not
detected, whereas it was in 1.8.4 and earlier:
$ grep 'ConnectX XRC support' openmpi-1.*-icc-14/LOG/configure.log | sort -u
openmpi-1.8-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
openmpi-1.8.1-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
openmpi-1.8.2-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
openmpi-1.8.3-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
openmpi-1.8.4-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
openmpi-1.8.5-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... no
openmpi-1.8.6-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... no
openmpi-1.8.7rc1-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... no
2) I noticed a cosmetic "glitch" in the configure output:
checking for working epoll library interface... checking if epoll can build... yes
yes
This just means AC_MSG_{CHECKING,RESULT} macros are
nested when they shouldn't be.
There is nothing to suggest that the results of the
configure probes are incorrect.
-Paul
On Thu, Jul 9, 2015 at 1:03 PM, Ralph Castain <r...@open-mpi.org> wrote:
In the usual place:
http://www.open-mpi.org/software/ompi/v1.8/
Please test and let me know of any issues that
surface. My intent is to release this next week.
Ralph
_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription:
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2015/07/17604.php
--
Paul H. Hargrove phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Link to this post:
http://www.open-mpi.org/community/lists/devel/2015/07/17606.php
Link to this post:
http://www.open-mpi.org/community/lists/devel/2015/07/17607.php
Link to this post:
http://www.open-mpi.org/community/lists/devel/2015/07/17608.php
Link to this post:
http://www.open-mpi.org/community/lists/devel/2015/07/17609.php
Link to this post:
http://www.open-mpi.org/community/lists/devel/2015/07/17611.php