Paul,

The openib BTL needs to be reworked...

Currently, we have:
#define HAVE_XRC (1 == OMPI_HAVE_CONNECTX_XRC)
/* and it is impossible to have !OMPI_HAVE_CONNECTX_XRC && OMPI_HAVE_CONNECTX_XRC_DOMAINS */

But with your patch, this becomes possible, and HAVE_XRC would then be zero, which is incorrect.

I will fix that too.
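Here is a minimal sketch of one way to make the macro agree with the configure results. It assumes the intended meaning is "XRC is usable if either the legacy API or the XRC Domains API was detected". The flag values are hypothetical stand-ins for the problematic combination described above. `HAVE_XRC_EITHER`, `have_xrc_current()` and `have_xrc_either()` are illustrative names, not existing Open MPI symbols.

```c
#include <stdbool.h>

/* Hypothetical configure results modeling the case described above:
   only the XRC Domains API was detected. In a real build these values
   come from the generated config header instead. */
#ifndef OMPI_HAVE_CONNECTX_XRC
#define OMPI_HAVE_CONNECTX_XRC 0
#endif
#ifndef OMPI_HAVE_CONNECTX_XRC_DOMAINS
#define OMPI_HAVE_CONNECTX_XRC_DOMAINS 1
#endif

/* Current definition: only the legacy ConnectX XRC flag is consulted. */
#define HAVE_XRC_CURRENT (1 == OMPI_HAVE_CONNECTX_XRC)

/* Hypothetical fix: XRC counts as available if either flavor was found. */
#define HAVE_XRC_EITHER \
    (1 == OMPI_HAVE_CONNECTX_XRC || 1 == OMPI_HAVE_CONNECTX_XRC_DOMAINS)

/* Wrappers so the two definitions can be compared at run time. */
bool have_xrc_current(void) { return HAVE_XRC_CURRENT; }
bool have_xrc_either(void)  { return HAVE_XRC_EITHER; }
```

With these stand-in values, `have_xrc_current()` returns false and `have_xrc_either()` returns true.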

Cheers,

Gilles


On 7/10/2015 1:16 PM, Paul Hargrove wrote:
Gilles,

I've made another observation about what I believe is an error in the XRC configure probe.

If I am following the code below correctly, then *both* ConnectX and ConnectIB depend on ibv_create_xrc_rcv_qp being defined. However, that function is marked as deprecated (presumably in favor of ibv_cmd_open_xrcd). So, when a later revision of libibverbs removes the deprecated function, *neither* the old nor the new interface will be detected as supported!

           # ibv_create_xrc_rcv_qp was added in OFED 1.3
           # ibv_cmd_open_xrcd (aka XRC Domains) was added in OFED 3.12
           if test "$enable_connectx_xrc" = "yes"; then
               $1_have_xrc=1
               AC_CHECK_FUNCS([ibv_create_xrc_rcv_qp],
                              [], [$1_have_xrc=0])
               AC_CHECK_DECLS([IBV_SRQT_XRC],
                              [], [$1_have_xrc=0],
                              [#include <infiniband/verbs.h>])
           fi
           if test "$enable_connectx_xrc" = "yes" \
                   && test $$1_have_xrc -eq 1; then
               AC_CHECK_FUNCS([ibv_cmd_open_xrcd], [$1_have_xrc_domains=1])
           fi
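The failure mode can be made concrete with a small C model of the probe above. It is a sketch under two assumptions: `--enable-connectx-xrc=yes`, and `$1_have_xrc_domains` starting at 0, since its default is not shown in the snippet. The struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of what configure found on a given system. */
struct xrc_probe {
    bool create_xrc_rcv_qp;  /* legacy ConnectX XRC function (OFED 1.3+) */
    bool srqt_xrc_declared;  /* IBV_SRQT_XRC is declared in verbs.h */
    bool cmd_open_xrcd;      /* XRC Domains function (OFED 3.12+) */
};

struct xrc_result {
    int have_xrc;
    int have_xrc_domains;
};

/* Mirrors the current m4 logic under the assumptions stated above. */
struct xrc_result current_probe(struct xrc_probe p)
{
    struct xrc_result r = { 1, 0 };
    if (!p.create_xrc_rcv_qp) {
        r.have_xrc = 0;
    }
    if (!p.srqt_xrc_declared) {
        r.have_xrc = 0;
    }
    /* The XRC Domains check only runs if the legacy checks passed. */
    if (r.have_xrc == 1 && p.cmd_open_xrcd) {
        r.have_xrc_domains = 1;
    }
    return r;
}
```

Consider a library that has dropped `ibv_create_xrc_rcv_qp` but provides both `ibv_cmd_open_xrcd` and `IBV_SRQT_XRC`, i.e. `{ false, true, true }`. For that input the model yields 0 for both flags, which is the outcome predicted above. The input `{ true, false, false }` matches the configure output quoted in this thread (ibv_create_xrc_rcv_qp found, IBV_SRQT_XRC not declared) and also gives `have_xrc == 0`.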

While I am not certain whether a probe for IBV_SRQT_XRC is really necessary, my suggested replacement for the logic above is:

           # ibv_create_xrc_rcv_qp was added in OFED 1.3
           # ibv_cmd_open_xrcd (aka XRC Domains) was added in OFED 3.12
           if test "$enable_connectx_xrc" = "yes"; then
               $1_have_xrc=1
               AC_CHECK_FUNCS([ibv_cmd_open_xrcd],
                              [AC_CHECK_DECLS([IBV_SRQT_XRC],
                                              [$1_have_xrc_domains=1],
                                              [$1_have_xrc=0],
                                              [#include <infiniband/verbs.h>])])
               AC_CHECK_FUNCS([ibv_create_xrc_rcv_qp],
                              [$1_have_xrc=1])
           fi

In summary:
  $1_have_xrc_domains = HAVE_IBV_CMD_OPEN_XRCD && HAVE_IBV_SRQT_XRC
  $1_have_xrc = $1_have_xrc_domains || HAVE_IBV_CREATE_XRC_RCV_QP
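The two summary equations can be written as a small C model for checking a few inputs by hand. This is a sketch of the equations only, not of the m4 code. The struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical probe results; comments name the corresponding macros. */
struct xrc_probe {
    bool create_xrc_rcv_qp;  /* HAVE_IBV_CREATE_XRC_RCV_QP */
    bool srqt_xrc_declared;  /* HAVE_IBV_SRQT_XRC */
    bool cmd_open_xrcd;      /* HAVE_IBV_CMD_OPEN_XRCD */
};

struct xrc_result {
    int have_xrc;
    int have_xrc_domains;
};

/* Direct transcription of the two summary equations. */
struct xrc_result proposed_probe(struct xrc_probe p)
{
    struct xrc_result r;
    r.have_xrc_domains = p.cmd_open_xrcd && p.srqt_xrc_declared;
    r.have_xrc = r.have_xrc_domains || p.create_xrc_rcv_qp;
    return r;
}
```

Under these equations, a library that provides only the XRC Domains API still ends up with both flags set. A legacy-only library still gets `have_xrc == 1`.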

In my testing, this worked as expected on both an old system (ConnectX XRC support only) and a new one (both ConnectX and ConnectIB XRC support).
-Paul

On Thu, Jul 9, 2015 at 7:06 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

    Gilles,

    The patch didn't apply to the 1.8.7rc1 tarball.
    So, I made the change manually and ran autogen.pl.

    The result is that one fewer configure test runs, but "ConnectX
    XRC support" is still disabled:

    Diffing the configure output:
     checking for ibv_resize_cq... yes
     checking for struct ibv_device.transport_type... yes
     checking for ibv_create_xrc_rcv_qp... yes
    -checking for ibv_cmd_open_xrcd... no
     checking whether IBV_SRQT_XRC is declared... no
     checking infiniband/complib/cl_types_osd.h usability... no
     checking infiniband/complib/cl_types_osd.h presence... no


    You will note that "IBV_SRQT_XRC" did not appear when I grepped
    for XRC in /usr/include/infiniband/verbs.h (in a previous message).
    I am not sure, but I suspect that identifier is related to
    "ConnectIB XRC support" (not ConnectX).
    If you look back at the 1.8.4 release you will find only a check
    for ibv_create_xrc_rcv_qp.

    -Paul

    On Thu, Jul 9, 2015 at 6:17 PM, Gilles Gouaillardet
    <gil...@rist.or.jp> wrote:

        Thanks Paul,

        I just found another bug...
        (and I am to blame for it)

        Attached is a patch.

        Basically, XRC was incorrectly disabled on "older" OFED stacks.

        Cheers,

        Gilles



        On 7/10/2015 10:06 AM, Paul Hargrove wrote:
        Gilles,

        A bzip2-compressed config.log is attached.

        I am unsure how to determine the OFED version, because the
        admins have prevented normal users from reading the RPM database.
        Perhaps the following helps:

        $ nm /usr/lib64/libibverbs.a | grep -i xrc
        00000000000000e0 T ibv_cmd_close_xrc_domain
        0000000000000230 T ibv_cmd_create_xrc_rcv_qp
        00000000000003b0 T ibv_cmd_create_xrc_srq
        0000000000000a40 T ibv_cmd_modify_xrc_rcv_qp
        0000000000000150 T ibv_cmd_open_xrc_domain
        0000000000001e30 T ibv_cmd_query_xrc_rcv_qp
        0000000000000070 T ibv_cmd_reg_xrc_rcv_qp
        0000000000000000 T ibv_cmd_unreg_xrc_rcv_qp
        00000000000002b0 T ibv_close_xrc_domain
        00000000000002d0 T ibv_create_xrc_rcv_qp
        00000000000007a0 T ibv_create_xrc_srq
        0000000000000310 T ibv_modify_xrc_rcv_qp
        0000000000000280 T ibv_open_xrc_domain
        0000000000000340 T ibv_query_xrc_rcv_qp
        0000000000000370 T ibv_reg_xrc_rcv_qp
        0000000000000390 T ibv_unreg_xrc_rcv_qp

        $ grep XRC /usr/include/infiniband/verbs.h
                IBV_DEVICE_XRC        = 1 << 20
                IBV_XRC_QP_EVENT_FLAG = 0x80000000,
                IBV_QPT_XRC,
        [matches in comments have been removed].
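One detail worth noting in this listing is that it contains `ibv_cmd_open_xrc_domain`, part of the older XRC API, but not `ibv_cmd_open_xrcd`, which is the name the XRC Domains probe looks for. A hypothetical helper that scans symbol names for exact matches makes the distinction explicit. It is an illustration only and assumes the symbol names have already been extracted from the `nm` output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct xrc_symbols {
    bool legacy_xrc;   /* ibv_create_xrc_rcv_qp present */
    bool xrc_domains;  /* ibv_cmd_open_xrcd present */
};

/* Hypothetical helper: exact-match the two symbols that the configure
   probes test for. A substring search for "open_xrc" would wrongly
   treat ibv_cmd_open_xrc_domain as the XRC Domains entry point. */
struct xrc_symbols classify_xrc_symbols(const char *const *names, size_t n)
{
    struct xrc_symbols s = { false, false };
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i], "ibv_create_xrc_rcv_qp") == 0) {
            s.legacy_xrc = true;
        } else if (strcmp(names[i], "ibv_cmd_open_xrcd") == 0) {
            s.xrc_domains = true;
        }
    }
    return s;
}
```

Fed a few of the names from the listing above, the helper reports the legacy API as present and the XRC Domains API as absent.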

        When tonight's master tarball is posted (perhaps 10 minutes
        from now) I will test it and report what I find.

        -Paul


        On Thu, Jul 9, 2015 at 5:17 PM, Gilles Gouaillardet
        <gil...@rist.or.jp> wrote:

            Paul,

            Can you please compress and post your config.log?
            What OFED version are you running?

            On master, that fix did the trick on the Mellanox test
            cluster (recent OFED version), but did not enable XRC on
            the LANL test clusters (my best guess is an old OFED
            library).

            Thanks

            Gilles


            On 7/10/2015 9:08 AM, Paul Hargrove wrote:
            Preliminary report:

            1) I find that "ConnectX XRC support" is still not
            detected, although it was in 1.8.4 and earlier:

                $ grep 'ConnectX XRC support' openmpi-1.*-icc-14/LOG/configure.log | sort -u
                openmpi-1.8-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
                openmpi-1.8.1-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
                openmpi-1.8.2-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
                openmpi-1.8.3-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
                openmpi-1.8.4-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... yes
                openmpi-1.8.5-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... no
                openmpi-1.8.6-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... no
                openmpi-1.8.7rc1-linux-x86_64-icc-14/LOG/configure.log:checking if ConnectX XRC support is enabled... no



            2) I noticed a cosmetic "glitch" in the configure output:

                checking for working epoll library interface... checking if epoll can build... yes
                yes

            This just means the AC_MSG_{CHECKING,RESULT} macros are
            nested when they shouldn't be.
            There is nothing to suggest that the results of the
            configure probes are incorrect.
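How the interleaving arises can be shown with a tiny C model of the two macros. It assumes only the documented behavior that AC_MSG_CHECKING prints "checking ..." without a trailing newline and AC_MSG_RESULT prints the result followed by a newline. The function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Captured "terminal" output of the model. */
static char out[256];

/* Models AC_MSG_CHECKING: "checking <what>... " with no newline. */
static void msg_checking(const char *what)
{
    size_t len = strlen(out);
    snprintf(out + len, sizeof out - len, "checking %s... ", what);
}

/* Models AC_MSG_RESULT: the result followed by a newline. */
static void msg_result(const char *result)
{
    size_t len = strlen(out);
    snprintf(out + len, sizeof out - len, "%s\n", result);
}
```

Calling `msg_checking` twice before either `msg_result` reproduces the two-line output quoted above. Running each CHECKING/RESULT pair to completion before starting the next would instead produce one clean line per check.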


            -Paul

            On Thu, Jul 9, 2015 at 1:03 PM, Ralph Castain
            <r...@open-mpi.org> wrote:

                In the usual place:

                http://www.open-mpi.org/software/ompi/v1.8/

                Please test and let me know of any issues that
                surface. My intent is to release this next week.
                Ralph


                _______________________________________________
                devel mailing list
                de...@open-mpi.org <mailto:de...@open-mpi.org>
                Subscription:
                http://www.open-mpi.org/mailman/listinfo.cgi/devel
                Link to this post:
                http://www.open-mpi.org/community/lists/devel/2015/07/17604.php




            --
            Paul H. Hargrove                          phhargr...@lbl.gov
            Computer Languages & Systems Software (CLaSS) Group
            Computer Science Department               Tel: +1-510-495-2352
            Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900











