Re: [OMPI devel] 1.8.7 rc1 out for review

2015-07-10 Thread Paul Hargrove
Gilles,

I've made another observation about what I believe is an error in the XRC
configure probe.

If I am following the code below correctly, then *both* ConnectX and
ConnectIB depend on ibv_create_xrc_rcv_qp being defined.
However, that function is marked as deprecated (presumably in favor of
ibv_cmd_open_xrcd).
So, when a later revision of libibverbs removes the deprecated function,
*neither* the old nor the new interface will be detected as supported!

   # ibv_create_xrc_rcv_qp was added in OFED 1.3
   # ibv_cmd_open_xrcd (aka XRC Domains) was added in OFED 3.12
   if test "$enable_connectx_xrc" = "yes"; then
       $1_have_xrc=1
       AC_CHECK_FUNCS([ibv_create_xrc_rcv_qp],
                      [], [$1_have_xrc=0])
       AC_CHECK_DECLS([IBV_SRQT_XRC],
                      [], [$1_have_xrc=0],
                      [#include ])
   fi
   if test "$enable_connectx_xrc" = "yes" \
      && test $$1_have_xrc -eq 1; then
       AC_CHECK_FUNCS([ibv_cmd_open_xrcd], [$1_have_xrc_domains=1])
   fi

While I am not certain if a probe for IBV_SRQT_XRC is really necessary, my
suggested replacement for the logic above is:

   # ibv_create_xrc_rcv_qp was added in OFED 1.3
   # ibv_cmd_open_xrcd (aka XRC Domains) was added in OFED 3.12
   if test "$enable_connectx_xrc" = "yes"; then
       $1_have_xrc=1
       AC_CHECK_FUNCS([ibv_cmd_open_xrcd],
                      [AC_CHECK_DECLS([IBV_SRQT_XRC],
                                      [$1_have_xrc_domains=1],
                                      [$1_have_xrc=0],
                                      [#include ])])
       AC_CHECK_FUNCS([ibv_create_xrc_rcv_qp],
                      [$1_have_xrc=1])
   fi

In summary
  $1_have_xrc_domains = HAVE_IBV_CMD_OPEN_XRCD && HAVE_IBV_SRQT_XRC
  $1_have_xrc = $1_have_xrc_domains || HAVE_IBV_CREATE_XRC_RCV_QP
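
The two summary lines above can be read as a small truth table. The following is only a sketch in plain shell, not the configure code itself: the function name xrc_flags and its three 0/1 arguments are hypothetical stand-ins for the results of the three configure probes. Both flags start at 0 here, so neither is set when no XRC function is found.

```shell
# Hypothetical helper: takes three 0/1 flags standing in for the configure
# results HAVE_IBV_CMD_OPEN_XRCD, HAVE_IBV_SRQT_XRC and
# HAVE_IBV_CREATE_XRC_RCV_QP, and prints "have_xrc have_xrc_domains".
xrc_flags() {
    have_open_xrcd=$1
    have_srqt_xrc=$2
    have_create_rcv_qp=$3

    have_xrc=0
    have_xrc_domains=0

    # have_xrc_domains = HAVE_IBV_CMD_OPEN_XRCD && HAVE_IBV_SRQT_XRC
    if test "$have_open_xrcd" -eq 1 && test "$have_srqt_xrc" -eq 1; then
        have_xrc_domains=1
    fi

    # have_xrc = have_xrc_domains || HAVE_IBV_CREATE_XRC_RCV_QP
    if test "$have_xrc_domains" -eq 1 || test "$have_create_rcv_qp" -eq 1; then
        have_xrc=1
    fi

    echo "$have_xrc $have_xrc_domains"
}

xrc_flags 0 0 1   # only the deprecated function is present: prints "1 0"
xrc_flags 1 1 0   # deprecated function removed, XRC domains present: prints "1 1"
xrc_flags 0 0 0   # no XRC interface at all: prints "0 0"
```

The second call is the case the message worries about: a libibverbs that has dropped ibv_create_xrc_rcv_qp should still be reported as XRC-capable.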

This worked as expected on both old (only ConnectX XRC support) and new
(both ConnectX and ConnectIB XRC support) systems in my testing.

-Paul

On Thu, Jul 9, 2015 at 7:06 PM, Paul Hargrove  wrote:

> Gilles,
>
> The patch didn't apply to the 1.8.7rc1 tarball.
> So, I made the change manually and ran autogen.pl.
>
> The result is that one fewer configure test runs, but "ConnectX XRC
> support" is still disabled:
>
> Diffing the configure output:
>  checking for ibv_resize_cq... yes
>  checking for struct ibv_device.transport_type... yes
>  checking for ibv_create_xrc_rcv_qp... yes
> -checking for ibv_cmd_open_xrcd... no
>  checking whether IBV_SRQT_XRC is declared... no
>  checking infiniband/complib/cl_types_osd.h usability... no
>  checking infiniband/complib/cl_types_osd.h presence... no
>
>
> You will note that "IBV_SRQT_XRC" did not appear when I grepped for XRC in
> /usr/include/infiniband/verbs.h (in a previous message).
> I am not sure, but suspect that identifier is related to "ConnectIB XRC
> support" (not ConnectX).
> If you look back at the 1.8.4 release you will find only a check for
> ibv_create_xrc_rcv_qp.
>
> -Paul
>
> On Thu, Jul 9, 2015 at 6:17 PM, Gilles Gouaillardet 
> wrote:
>
>>  Thanks Paul,
>>
>> I just found another bug ...
>> (and I should be blamed for it)
>>
>> A patch is attached.
>>
>> Basically, XRC was incorrectly disabled on "older" OFED stacks.
>>
>> Cheers,
>>
>> Gilles
>>
>>
>>
>> On 7/10/2015 10:06 AM, Paul Hargrove wrote:
>>
>> Gilles,
>>
>>  A bzip2-compressed config.log is attached.
>>
>>  I am unsure how to determine the OFED version, because the admins have
>> prevented normal users from reading the RPM database.
>> Perhaps the following helps:
>>
>>  $ nm /usr/lib64/libibverbs.a | grep -i xrc
>> 00e0 T ibv_cmd_close_xrc_domain
>> 0230 T ibv_cmd_create_xrc_rcv_qp
>> 03b0 T ibv_cmd_create_xrc_srq
>> 0a40 T ibv_cmd_modify_xrc_rcv_qp
>> 0150 T ibv_cmd_open_xrc_domain
>> 1e30 T ibv_cmd_query_xrc_rcv_qp
>> 0070 T ibv_cmd_reg_xrc_rcv_qp
>>  T ibv_cmd_unreg_xrc_rcv_qp
>> 02b0 T ibv_close_xrc_domain
>> 02d0 T ibv_create_xrc_rcv_qp
>> 07a0 T ibv_create_xrc_srq
>> 0310 T ibv_modify_xrc_rcv_qp
>> 0280 T ibv_open_xrc_domain
>> 0340 T ibv_query_xrc_rcv_qp
>> 0370 T ibv_reg_xrc_rcv_qp
>> 0390 T ibv_unreg_xrc_rcv_qp
>>
>>  $ grep XRC /usr/include/infiniband/verbs.h
>> IBV_DEVICE_XRC  = 1 << 20
>> IBV_XRC_QP_EVENT_FLAG = 0x8000,
>> IBV_QPT_XRC,
>> [matches in comments have been removed].
>>
>>  When tonight's master tarball is posted (perhaps 10 minutes from now) I
>> will test it and report what I find.
>>
>>  -Paul
>>
>>
>> On Thu, Jul 9, 2015 at 5:17 PM, Gilles Gouaillardet 
>> wrote:
>>
>>>  Paul,
>>>
>>> can you please compress and post your config.log ?

Re: [OMPI devel] 1.8.7 rc1 out for review

2015-07-10 Thread Gilles Gouaillardet

Paul,

I just applied the patch on the tarball, and it worked for me.
Anyway, the IBV_SRQT_XRC test was misplaced (and I just read that you
already found that out ...)

We need it for XRC_DOMAINS and *not* for XRC.

The newly attached patch will (hopefully) fix this.

Cheers,

Gilles

On Thu, Jul 9, 2015 at 5:17 PM, Gilles Gouaillardet wrote:

Paul,

Can you please compress and post your config.log?
What OFED version are you running?

On master, that fix did the trick on the Mellanox test cluster
(recent OFED version) but did not enable XRC on the LANL test
clusters (my best bet is an old OFED library).

Thanks

Gilles


On 7/10/2015 9:08 AM, Paul Hargrove wrote:

Preliminary report:

1) I find that "ConnectX XRC support" is still not detected
as it was in 1.8.4 and earlier:

$ grep 'ConnectX XRC support' openmpi-1.*-icc-14/LOG/configure.log | sort -u
openmpi-1.8-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
openmpi-1.8.1-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
openmpi-1.8.2-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
openmpi-1.8.3-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
openmpi-1.8.4-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
openmpi-1.8.5-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... no
openmpi-1.8.6-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... no
openmpi-1.8.7rc1-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... no



2) I noticed a cosmetic "glitch" in the configure output:

checking for working epoll library interface... checking if epoll can build... yes
yes

This just means AC_MSG_{CHECKING,RESULT} macros are nested
when they shouldn't be.
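
To illustrate why nesting produces that output, here is a rough shell sketch. The functions msg_checking and msg_result are hypothetical stand-ins that only mimic how AC_MSG_CHECKING prints a message without a trailing newline and AC_MSG_RESULT completes the line; they are not the Autoconf macros themselves.

```shell
# Hypothetical stand-ins for AC_MSG_CHECKING / AC_MSG_RESULT.
msg_checking() { printf 'checking %s... ' "$1"; }
msg_result()   { printf '%s\n' "$1"; }

# Nested: the inner check starts before the outer one has printed its result,
# so both messages land on one line and the outer result is left on its own.
nested() {
    msg_checking "for working epoll library interface"
    msg_checking "if epoll can build"
    msg_result yes
    msg_result yes
}

# Sequential: the inner check finishes first, then the outer message is
# opened and closed, giving one complete line per check.
sequential() {
    msg_checking "if epoll can build"
    msg_result yes
    msg_checking "for working epoll library interface"
    msg_result yes
}

nested
sequential
```

Running the inner test to completion before opening the outer message is one way to avoid the glitch; whether that ordering suits the actual epoll probe is an assumption.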

Re: [OMPI devel] 1.8.7 rc1 out for review

2015-07-10 Thread Gilles Gouaillardet

Paul,

The openib btl needs to be reworked ...

currently, we have
#define HAVE_XRC (1 == OMPI_HAVE_CONNECTX_XRC)
/* and it is impossible to have !OMPI_HAVE_CONNECTX_XRC && 
OMPI_HAVE_CONNECTX_XRC_DOMAINS */


but with your patch, this becomes possible and HAVE_XRC would be zero, 
which is incorrect


i will fix that too

Cheers,

Gilles



Re: [OMPI devel] 1.8.7 rc1 out for review

2015-07-10 Thread Paul Hargrove
Gilles,

I see now that your patches aren't applying because they have DOS line
endings (CRLF vs LF).
I need to strip those to get the patches applied.
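
One common way to strip the carriage returns is sketched below. The helper name strip_cr and the file names in the usage comment are made up for illustration; the thread does not say which tool was actually used.

```shell
# Hypothetical helper: write the file named by $1 to stdout with every
# carriage-return character removed, turning CRLF line endings into LF.
strip_cr() {
    tr -d '\r' < "$1"
}

# Example usage with made-up file names:
#   strip_cr xrc.patch > xrc-lf.patch
#   patch -p1 < xrc-lf.patch
```

Note that tr removes every CR in the file, not only those at line ends, which is normally harmless for a text patch.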

I will report results with your patch on both my "old" and "new" systems as
soon as possible.

-Paul


Re: [OMPI devel] 1.8.7 rc1 out for review

2015-07-10 Thread Paul Hargrove
Gilles,

If I am correctly understanding the purpose of deprecating the old
interfaces then the openib btl will *eventually* need to support
!OMPI_HAVE_CONNECTX_XRC && OMPI_HAVE_CONNECTX_XRC_DOMAINS

However, I am not in a position to decide if that is really required for
1.8.x or not.

-Paul




-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] 1.8.7 rc1 out for review

2015-07-10 Thread Paul Hargrove
Gilles,

Applying both of your patches to the 1.8.7rc1 tarball appears to give the
desired results on both of the systems I tested.

-Paul


[OMPI devel] 1.8.7rc1 testing results

2015-07-10 Thread Paul Hargrove
Except for some slow QEMU-emulated ARM and MIPS systems my tests of
1.8.7rc1 are complete.
I am without SPARC and IA64 platforms this time around.

The only "new" (non-cosmetic) problem I observed was the failure to detect
"ConnectX XRC support".
It looks like Gilles and I iterated on that issue until we have something
that works now.

-Paul



Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-10 Thread Jeff Squyres (jsquyres)
On Jul 10, 2015, at 2:12 AM, Paul Hargrove  wrote:
> 
> The only "new" (non-cosmetic) problem I observed was the failure to detect 
> "ConnectX XRC support".
> It looks like Gilles and I iterated on that issue until we have something 
> that works now.

'fraid not.  :-(

Per https://github.com/open-mpi/ompi-release/pull/384#issuecomment-120412836, 
the latest commit breaks on RHEL 6.5 systems that do not have MOFED installed.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] Github acting very slow

2015-07-10 Thread Jeff Squyres (jsquyres)
FYI: According to https://status.github.com/messages, as of about 10 mins ago, 
Github is investigating slowness with some git repos.

The Open MPI organization repos (the ompi repo, at least) seem to be among 
these "slow" repos -- I'm getting response times on the web UI on the order of a 
minute or two.  And I haven't seen the webhook-driven email from my latest 
commit yet (which was a few minutes ago).




Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-10 Thread Gilles Gouaillardet
Sorry about that, and thanks for reverting the commit.

Paul mentioned a patch I sent to the ml, and that worked for him.
The commit was supposed to be a more robust version.
For example, in RHEL 7, the deprecated functions have been removed, but
XRC domains work fine.
Currently, XRC is not supported as it should be.

It seems RHEL 6.5 has the deprecated function, but the header files are
missing it, among other things.

I will fix that and post a PR so you can test it on RHEL 6.5 before
I commit it.

I noticed there is no infiniband/verbs.h on a LANL test cluster (the
non-Cray one).
Is it possible to have it installed?

Cheers,

Gilles

>


Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-10 Thread Ralph Castain
Given that 1.8.4 was working correctly, why don’t we just revert the config in 
question back to the 1.8.4 version? Why was it changed in the first place? Does 
anyone know what problem someone was trying to solve?





Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-10 Thread Gilles Gouaillardet
Ralph,

(Some) things got broken when adding support for XRC domains / OFED 3.12.
In 1.8.4 there is no XRC support with OFED 3.12.
As far as I am concerned, reverting the openib btl to 1.8.4 is not a
good option.

Cheers,

Gilles

>


Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-10 Thread Jeff Squyres (jsquyres)
Yes, I seem to recall that this issue came up before... ah, here it is:

commit 04bec4475e5a962432b73dd6254f62bb263703ab
Author: Jeff Squyres 
Date:   Fri Jan 16 18:13:31 2015 -0800

openib: check more thoroughly for XRC

Some systems have XRC symbols in their libibverbs libraries, but do
not have the appropriate XRC bits in their devel headers (cough cough
RHEL 6.5 libibverbs-rocee-*.x86-64.rpm cough cough).

So expand the XRC config checks to ensure that we can actually find
one of the XRC constants that we need to compile XRC code before
ruling that we can actually build XRC support.



> On Jul 10, 2015, at 10:33 AM, Gilles Gouaillardet 
>  wrote:
> 
> Sorry about that, and thanks for reverting the commit.
> 
> Paul mentioned a patch I sent to the ml, and that worked for him.
> The commit was supposed to be a more robust version.
> For example, on RHEL 7 the deprecated function has been removed, but XRC
> domains work fine.
> Currently, XRC is not supported there as it should be.
> 
> It seems RHEL 6.5 has the deprecated function in the library, but the header
> files are missing it, among other things.
> 
> I will fix that and post a PR so you can test it on RHEL 6.5 before I
> commit it.
> 
> I noticed there is no infiniband/verbs.h on a LANL test cluster (the non-Cray
> one).
> Is it possible to have it installed?
> 
> Cheers,
> 
> Gilles
> 
> On Friday, July 10, 2015, Jeff Squyres (jsquyres)  wrote:
> On Jul 10, 2015, at 2:12 AM, Paul Hargrove  wrote:
> >
> > The only "new" (non-cosmetic) problem I observed was the failure to detect 
> > "ConnectX XRC support".
> > It looks like Gilles and I iterated on that issue until we have something 
> > that works now.
> 
> 'fraid not.  :-(
> 
> Per https://github.com/open-mpi/ompi-release/pull/384#issuecomment-120412836, 
> the latest commit breaks on RHEL 6.5 systems that do not have MOFED installed.
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] Open MPI 1.8.6 memory leak

2015-07-10 Thread Nick Papior
Just want to confirm.

I also see this memory leak on 1.8.6; using 1.8.7rc1 fixes it.
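
A minimal sketch of one way to check this, assuming a ps that accepts
"-o rss=" (procps and BSD ps do) and a POSIX awk: sample the resident set
size of one rank at a fixed interval (the RSS column in the ps output quoted
below) and compare the first and last samples. The helper names sample_rss
and rss_growth are hypothetical and are not part of Open MPI or LAMMPS.

```shell
#!/bin/sh
# sample_rss PID [INTERVAL] [COUNT]  (hypothetical helper)
# Prints "elapsed_seconds rss_kib" COUNT times, or until the process exits.
sample_rss() {
    pid=$1
    interval=${2:-30}
    count=${3:-12}
    i=0
    while [ "$i" -lt "$count" ]; do
        # "rss=" suppresses the header so only the number is printed.
        rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
        if [ -z "$rss" ]; then
            break                      # process has exited
        fi
        echo "$((i * interval)) $rss"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# rss_growth  (hypothetical helper)
# Reads "elapsed rss" lines on stdin; prints "first last delta".
rss_growth() {
    awk 'NR == 1 { first = $2 }
         { last = $2 }
         END { if (NR > 0) print first, last, last - first }'
}
```

For example, "sample_rss <pid of one lmp_mpi rank> 30 12 | rss_growth" covers
about five and a half minutes. A delta that stays small suggests stable
memory use; a delta that keeps growing with longer runs is consistent with a
leak.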


2015-07-02 4:02 GMT+02:00 Gilles Gouaillardet :

> Nathan,
>
> The root cause is that your fixes were not backported to the v1.8 (or the
> v1.10) branch.
>
> I made PR https://github.com/open-mpi/ompi-release/pull/357 to fix this.
>
> Could you please review it?
>
> Since there are quite a lot of differences between v1.8 and master, the
> backport was not trivial.
> I left some #if 0 blocks in the code since I do not know whether something
> needs to be done about RDMA fragments.
>
> Cheers,
>
> Gilles
>
>
> On 7/2/2015 6:04 AM, Nathan Hjelm wrote:
>
> Don't see the leak on master with OS X using the leaks command. Will see
> what valgrind finds on linux.
>
> -Nathan
>
> On Wed, Jul 01, 2015 at 08:48:57PM +, Rolf vandeVaart wrote:
>
> There have been two reports on the user list about memory leaks.  I have
> reproduced this leak with LAMMPS.  Note that this has nothing to do with
> CUDA-aware features.  The steps that Stefan has provided make it easy to
> reproduce.
>
> Here are some more specific steps to reproduce, derived from Stefan's.
>
> 1. Clone LAMMPS (git clone git://git.lammps.org/lammps-ro.git lammps).
> 2. cd src/ and compile with Open MPI 1.8.6.  To do this, set your path to
>    Open MPI and type "make mpi".
> 3. Run the example listed in lammps/examples/melt.  To do this, first copy
>    "lmp_mpi" from the src directory into the melt directory.  Then you need
>    to modify the in.melt file so that it will run for a while.  Change
>    "run 25" to "run25"
> 4. Run with: mpirun -np 2 lmp_mpi < in.melt
>
> For reference, here is the memory consumption for both 1.8.5 and 1.8.6.
> 1.8.5 stays very stable, whereas 1.8.6 almost triples after 6 minutes of
> running.
>
> Open MPI 1.8.5
>
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> 32341    26907 59.0  0.0 329672 14584 pts/16   Rl   16:24   0:00 ./lmp_mpi_185_nocuda
> 32341    26908 60.0  0.0 329672 14676 pts/16   Rl   16:24   0:00 ./lmp_mpi_185_nocuda
> 32341    26907 98.3  0.0 329672 14932 pts/16   Rl   16:24   0:30 ./lmp_mpi_185_nocuda
> 32341    26908 98.5  0.0 329672 14932 pts/16   Rl   16:24   0:30 ./lmp_mpi_185_nocuda
> 32341    26907 98.9  0.0 329672 14960 pts/16   Rl   16:24   1:00 ./lmp_mpi_185_nocuda
> 32341    26908 99.1  0.0 329672 14952 pts/16   Rl   16:24   1:00 ./lmp_mpi_185_nocuda
> 32341    26907 99.1  0.0 329672 14960 pts/16   Rl   16:24   1:30 ./lmp_mpi_185_nocuda
> 32341    26908 99.3  0.0 329672 14952 pts/16   Rl   16:24   1:30 ./lmp_mpi_185_nocuda
> 32341    26907 99.2  0.0 329672 14960 pts/16   Rl   16:24   2:00 ./lmp_mpi_185_nocuda
> 32341    26908 99.4  0.0 329672 14952 pts/16   Rl   16:24   2:00 ./lmp_mpi_185_nocuda
> 32341    26907 99.3  0.0 329672 14960 pts/16   Rl   16:24   2:30 ./lmp_mpi_185_nocuda
> 32341    26908 99.5  0.0 329672 14952 pts/16   Rl   16:24   2:30 ./lmp_mpi_185_nocuda
> 32341    26907 99.4  0.0 329672 14960 pts/16   Rl   16:24   2:59 ./lmp_mpi_185_nocuda
> 32341    26908 99.5  0.0 329672 14952 pts/16   Rl   16:24   3:00 ./lmp_mpi_185_nocuda
> 32341    26907 99.4  0.0 329672 14960 pts/16   Rl   16:24   3:29 ./lmp_mpi_185_nocuda
> 32341    26908 99.6  0.0 329672 14956 pts/16   Rl   16:24   3:30 ./lmp_mpi_185_nocuda
> 32341    26907 99.4  0.0 329672 14960 pts/16   Rl   16:24   3:59 ./lmp_mpi_185_nocuda
> 32341    26908 99.6  0.0 329672 14956 pts/16   Rl   16:24   4:00 ./lmp_mpi_185_nocuda
> 32341    26907 99.4  0.0 329672 14960 pts/16   Rl   16:24   4:29 ./lmp_mpi_185_nocuda
> 32341    26908 99.6  0.0 329672 14956 pts/16   Rl   16:24   4:30 ./lmp_mpi_185_nocuda
> 32341    26907 99.5  0.0 329672 14960 pts/16   Rl   16:24   4:59 ./lmp_mpi_185_nocuda
> 32341    26908 99.6  0.0 329672 14956 pts/16   Rl   16:24   5:00 ./lmp_mpi_185_nocuda
> 32341    26907 99.5  0.0 329672 14960 pts/16   Rl   16:24   5:29 ./lmp_mpi

Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-10 Thread Paul Hargrove
The timing on this is less than ideal for me.

To accommodate work on some high-voltage switching equipment, our building
will be without power over the weekend.
The system I use to autogen will be OFF from around 3pm today until perhaps
3pm on Monday.
I will also be busy with shutting down our group's systems gracefully today
and bringing them back on Monday.

The test platforms where I have reproduced the failures are NOT going to be
off-line.
So, I will be able to test only *tarballs* (but not patches to .m4 files)
until probably Monday evening.

Gilles,

I think it reasonable to suspect the lib could hold a stub that returns
ENOSYS for the deprecated function.
I suspect that checking for ibv_create_xrc_rcv_qp+IBV_QPT_XRC should work
for the rhel6.5 failure case described previously.
That way the checks for the two flavors both look for a function in the lib
and a constant in the header.
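
The check suggested above could be sketched as a stand-alone shell probe that
mirrors what a configure test does: compile and link one small program that
needs both the constant from the header and the function from the library.
This is only an illustration under assumptions, not the actual Open MPI
configure logic: the helper name probe_legacy_xrc is hypothetical, the header
is assumed to be infiniband/verbs.h, and the library is assumed to link with
-libverbs.

```shell
#!/bin/sh
# probe_legacy_xrc  (hypothetical helper)
# Succeeds only if the devel header provides IBV_QPT_XRC *and* declares
# ibv_create_xrc_rcv_qp, and the library exports that function.
probe_legacy_xrc() {
    cc=${CC:-cc}
    dir=$(mktemp -d) || return 2
    cat > "$dir/conftest.c" <<'EOF'
#include <infiniband/verbs.h>
int main(void)
{
    /* The constant must come from the devel header. */
    int t = IBV_QPT_XRC;
    /* Taking the address forces a link-time reference to the symbol. */
    void *f = (void *) ibv_create_xrc_rcv_qp;
    return (f != 0 && t == IBV_QPT_XRC) ? 0 : 1;
}
EOF
    # $cc is left unquoted on purpose so CC may carry flags (e.g. "gcc -m64").
    $cc "$dir/conftest.c" -libverbs -o "$dir/conftest" >/dev/null 2>&1
    rc=$?
    rm -rf "$dir"
    return $rc
}

if probe_legacy_xrc; then
    echo "legacy XRC (function + constant): yes"
else
    echo "legacy XRC (function + constant): no"
fi
```

On a system like the RHEL 6.5 case, where the library has the symbol but the
header lacks the XRC bits, the compile step should fail and the probe should
report "no" instead of breaking the build later.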


-Paul

On Fri, Jul 10, 2015 at 8:21 AM, Jeff Squyres (jsquyres)  wrote:

> Yes, I seem to recall that this issue came up before... ah, here it is:
>
> commit 04bec4475e5a962432b73dd6254f62bb263703ab
> Author: Jeff Squyres 
> Date:   Fri Jan 16 18:13:31 2015 -0800
>
> openib: check more thoroughly for XRC
>
> Some systems have XRC symbols in their libibverbs libraries, but do
> not have the appropriate XRC bits in their devel headers (cough cough
> RHEL 6.5 libibverbs-rocee-*.x86-64.rpm cough cough).
>
> So expand the XRC config checks to ensure that we can actually find
> one of the XRC constants that we need to compile XRC code before
> ruling that we can actually build XRC support.



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-10 Thread Paul Hargrove
Update:

I have updated the autotools on my laptop to the point that I can autogen
now.
So, if necessary, I can once again test patches to the .m4 files (rather
than needing full tarballs).

-Paul

On Fri, Jul 10, 2015 at 12:22 PM, Paul Hargrove  wrote:

> The timing on this is less than ideal for me.
>
> To accommodate work on some high-voltage switching equipment, our building
> will be without power over the weekend.
> The system I use to autogen will be OFF from around 3pm today until
> perhaps 3pm on Monday.
> I will also be busy with shutting down our group's systems gracefully
> today and bringing them back on Monday.
>
> The test platforms where I have reproduced the failures is NOT going to be
> off-line.
> So, I will be able to test only *tarballs* (but not patches to .m4 files)
> until probably Monday evening.
>
> Gilles,
>
> I think it reasonable to suspect the lib could hold a stub that returns
> ENOSYS for the deprecated function.
> I suspect that checking for ibv_create_xrc_rcv_qp+IBV_QPT_XRC should work
> for the rhel6.5 failure case described previously.
> That way the checks for the two flavors both look for a function in the
> lib and a constant in the header.
>
>
> -Paul



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] mpiexec without -hosts option

2015-07-10 Thread Jeff Squyres (jsquyres)
Victor --

I'm sorry that this message slipped by.

Did you figure out that you were running MPICH, and not Open MPI?  (I say that 
because I see the executable "mpichversion" in your output.)
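
One quick way to tell the two apart, sketched under the assumption that the
implementation's own tools are on PATH: Open MPI installs ompi_info, while
MPICH installs mpichversion. The function name which_mpi is hypothetical, and
if both are installed this only reports the first one found.

```shell
#!/bin/sh
# which_mpi  (hypothetical helper)
# Guess the MPI implementation from the helper tools found on PATH.
which_mpi() {
    if command -v ompi_info >/dev/null 2>&1; then
        echo "Open MPI"
    elif command -v mpichversion >/dev/null 2>&1; then
        echo "MPICH"
    else
        echo "unknown"
    fi
}

which_mpi
```

On the system shown in the quoted output below, where tab completion lists
mpichversion and no ompi_info, this would be expected to print "MPICH".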


> On Jun 30, 2015, at 10:09 PM, Victor Rodriguez  wrote:
> 
> Hi team
> (if this is not the proper channel, please point me to the right one)
> 
> I am trying to implement MPI on Yocto (first try):
> 
> http://cgit.openembedded.org/meta-openembedded/commit/meta-oe/recipes-devtools/mpich/mpich_3.1.1.bb?id=824b6de96ddfe791a0013d96a84ad49de8e04d38
> 
> I tested it by running mpbench:
> http://icl.cs.utk.edu/projects/llcbench/mpbench.html
> 
> on a MinnowBoard MAX, and it works just great :)
> 
> The problem is that I did the test with just one board. When I tried
> to run it on more than one platform, I decided to use the -hosts
> option (or the -H hostfile), but these are the only options that I have:
> 
> root@qemux86:~# mpiexec --version
> invalid mpiexec argument --version
> Usage: mpiexec -usize  -maxtime  -exitinfo -l\
>   -n  -soft  -host  \
>   -wdir  -path  \
>   -file  -configfile  \
>   -genvnone -genvlist  -genv name value\
>   -envnone -envlist  -env name value\
>   execname \
>   [ : -n  ... execname ]
> 
> I should have realized something was wrong because there was no mpirun;
> I just have:
> 
> root@qemux86:~# mpi
> mpic++        mpichversion  mpiexec
> mpicc         mpicxx        mpivars
> 
> My configuration is something like :
> 
> "--disable-fortran \
> --disable-rpath \
> --with-pm=gforker"
> 
> The first one is just because I don't want Fortran, and the last two are
> because of QA problems with Yocto.
> 
> I wonder whether any part of my configuration is wrong, or what I am
> doing wrong :( because I really need to run on multiple systems :(
> 
> Best regards
> 
> Victor Rodriguez


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/