FYI, I have only tested with the map_on_demand=16 setting, leaving all other 
modparams at their defaults. Also, I haven't run servers on MOFED at all, just 
kernel IB. Finally, my most recent build was earlier than 2.11.54, so perhaps 
something new is going on.

ruth


On 9/4/18, 10:12 AM, "lustre-discuss on behalf of 
lustre-discuss-requ...@lists.lustre.org" 
<lustre-discuss-boun...@lists.lustre.org on behalf of 
lustre-discuss-requ...@lists.lustre.org> wrote:

    Today's Topics:
    
       1. lustre client not able to lctl ping or mount (Pak Lui)
       2. Re: lustre client not able to lctl ping or mount (Richard Henwood)
       3. Re: lustre client not able to lctl ping or mount (Pak Lui)
    
    
    ----------------------------------------------------------------------
    
    Message: 1
    Date: Tue, 4 Sep 2018 08:06:09 -0700
    From: Pak Lui <pak....@linaro.org>
    To: lustre-discuss@lists.lustre.org
    Subject: [lustre-discuss] lustre client not able to lctl ping or mount
    Message-ID:
        <CAMScT+X7cxqJETiifWfJ_8LLwenypg=kkb1unyzxpartvva...@mail.gmail.com>
    Content-Type: text/plain; charset="utf-8"
    
    Hi all,
    
    I am having an issue with the Lustre client pinging the server using o2ib. I
    want to find out if anyone has a suggestion on what could be the problem.
    Thanks in advance.
    
    lustre client pinging to server:
    
    [root@n0 ~]# lctl ping 192.168.13.8@o2ib
    failed to ping 192.168.13.8@o2ib: Input/output error <<<<<<<
    
    lustre client pinging to server over IPoIB works:
    
    [root@n0~]# ping -c 1 192.168.13.8
    PING 192.168.13.8 (192.168.13.8) 56(84) bytes of data.
    64 bytes from 192.168.13.8: icmp_seq=1 ttl=64 time=0.376 ms
    
    
    lustre client pinging to self or other client works:
    
    [root@n0 ~]# lctl ping 192.168.13.54@o2ib
    12345-0@lo
    12345-192.168.13.54@o2ib
    
    lustre client pinging to self or other client over IPoIB works:
    
    [root@n0~]# ping -c 1 192.168.13.54
    PING 192.168.13.54 (192.168.13.54) 56(84) bytes of data.
    64 bytes from 192.168.13.54: icmp_seq=1 ttl=64 time=0.017 ms
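
The four checks above can be folded into a small helper that runs an IP ping and an LNet ping against the same peer and says which layer is failing. This is only a sketch: `interpret` and `check_peer` are made-up names, the default network name `o2ib` is taken from the report, and the wording of the messages is a suggestion rather than anything Lustre prints.

```shell
#!/bin/sh
# Interpret a pair of exit statuses (0 = success): IP ping first, LNet ping second.
interpret() {
    ip_rc=$1 lnet_rc=$2
    if [ "$ip_rc" -ne 0 ]; then
        echo "IP layer unreachable: check IPoIB and routing first"
    elif [ "$lnet_rc" -ne 0 ]; then
        echo "IP works but LNet does not: look at the LNet/o2iblnd configuration and logs"
    else
        echo "both IP and LNet reachable"
    fi
}

# Run both checks against one peer. Needs ping and lctl on the node.
check_peer() {
    ip=$1 net=${2:-o2ib}
    ping -c 1 "$ip" >/dev/null 2>&1; ip_rc=$?
    lctl ping "$ip@$net" >/dev/null 2>&1; lnet_rc=$?
    interpret "$ip_rc" "$lnet_rc"
}

# Example, using the addresses from the report:
#   check_peer 192.168.13.8
#   check_peer 192.168.13.54
```

With the results shown above, the server address would fall into the second case (IP reachable, LNet not), which points at the LNet or o2iblnd layer rather than the fabric itself.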
    
    
    The lustre server and client have specified the modprobe for lnet:
    
    /etc/modprobe.conf
    options lnet networks=o2ib(ib0)
    
    
    The client reports some error when trying to ping or mount from the client
    to server:
    modprobe lustre lnet
    lctl ping 192.168.13.8@o2ib
    mount -v -t lustre 192.168.13.8@o2ib:/zfs /mnt/zfs
    
    [root@n0 ~]# dmesg|tail
    [589805.093447] Lustre: Lustre: Build Version: 2.11.54
    [589805.272652] LNet: Using FastReg for registration
    [589805.275954] LNet: Added LNI 192.168.13.54@o2ib [8/256/0/180]
    [589813.278370] LNet: 22357:0:(o2iblnd_cb.c:3320:kiblnd_check_conns()) Timed out tx for 192.168.13.186@o2ib: 589813 seconds
    [589835.518404] LustreError: 22463:0:(mgc_request.c:251:do_config_log_add()) MGC192.168.13.8@o2ib: failed processing log, type 1: rc = -5
    [589843.118385] LustreError: 22488:0:(mgc_request.c:601:do_requeue()) failed processing log: -5
    [589866.718389] LustreError: 15c-8: MGC192.168.13.8@o2ib: The configuration from log 'zfs-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
    [589866.741623] Lustre: Unmounted zfs-client
    [589867.278516] LustreError: 22463:0:(obd_mount.c:1599:lustre_fill_super()) Unable to mount  (-5)
    
    
    server reports some error during mounting:
    
    [root@license ~]# Sep  4 07:26:56 license kernel: LNet: 25518:0:(o2iblnd_cb.c:2475:kiblnd_passive_connect()) Can't accept conn from 192.168.13.54@o2ib (version 12): max_frags 16 incompatible without FMR pool (256 wanted)
    
    
    The lustre server setup:
    
    [root@license ~]# lfs df -h
    UUID                       bytes        Used   Available Use% Mounted on
    zfs-MDT0000_UUID          863.4M        7.5M      853.9M   1% /mnt/zfs[MDT:0]
    zfs-OST0000_UUID            1.7T       10.0G        1.7T   1% /mnt/zfs[OST:0]
    
    filesystem_summary:         1.7T       10.0G        1.7T   1% /mnt/zfs
    
    
    server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
    client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
    
    Regards,
    - Pak
    
    ------------------------------
    
    Message: 2
    Date: Tue, 4 Sep 2018 16:00:19 +0000
    From: Richard Henwood <richard.henw...@arm.com>
    To: "lustre-discuss@lists.lustre.org"
        <lustre-discuss@lists.lustre.org>, "pak....@linaro.org"
        <pak....@linaro.org>
    Subject: Re: [lustre-discuss] lustre client not able to lctl ping or
        mount
    Message-ID: <5f920989941b1007874e988bf748eb1a84a38068.ca...@arm.com>
    Content-Type: text/plain; charset="utf-8"
    
    It might be helpful to state the Lustre software versions that you have
    used.
    
    Also, given this is an Arm client (with presumably 64K page size)
    connecting to an x86 server (with presumably 4K page size), have you added
    the map_on_demand=16 incantation to the server? I don't have direct
    experience of this, but I heard it was needed in some Arm configurations
    (depending on server/client version):
    
    https://jira.whamcloud.com/browse/LU-10775
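
The numbers in the server's "max_frags 16 ... (256 wanted)" message are consistent with that page-size difference, if one assumes a fixed maximum transfer size split into page-sized fragments. The 1 MiB default below is an assumption made for illustration (nothing in this thread states it), and `frags_for_page_size` is a made-up helper name. A minimal sketch of the arithmetic:

```shell
#!/bin/sh
# Number of page-sized fragments needed to cover one maximum-size transfer.
# The second argument defaults to 1 MiB, which is an assumed value.
frags_for_page_size() {
    page_size=$1
    max_payload=${2:-1048576}
    echo $(( max_payload / page_size ))
}

frags_for_page_size 4096     # 4K pages, as presumed for the x86 server
frags_for_page_size 65536    # 64K pages, as presumed for the Arm client

# Same calculation for the page size of the node this runs on.
frags_for_page_size "$(getconf PAGESIZE)"
```

Under that assumption the first two calls give 256 and 16, matching the two values in the server log. Running `getconf PAGESIZE` on both nodes would confirm whether the presumed page sizes are the real ones.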
    
    Maybe James can advise?
    
    best regards,
    Richard
    
    --
    richard.henw...@arm.com
    Server Software Eco-System
    Tel: +1 512 410 9612
    
    ------------------------------
    
    Message: 3
    Date: Tue, 4 Sep 2018 09:12:03 -0700
    From: Pak Lui <pak....@linaro.org>
    To: Richard Henwood <richard.henw...@arm.com>
    Cc: "lustre-discuss@lists.lustre.org"
        <lustre-discuss@lists.lustre.org>
    Subject: Re: [lustre-discuss] lustre client not able to lctl ping or
        mount
    Message-ID:
        <camsct+wpamcuthczipocxkqouksowprl8n928lwrr9f45xm...@mail.gmail.com>
    Content-Type: text/plain; charset="utf-8"
    
    Richard, James,
    
    I have tried adding "map_on_demand=16" to "/etc/modprobe.d/ko2iblnd.conf",
    as was suggested. I also tried "map_on_demand=0", as suggested here:
    http://wiki.lustre.org/Optimizing_o2iblnd_Performance
    
    /etc/modprobe.d/ko2iblnd.conf
    
    alias ko2iblnd-opa ko2iblnd
    # tried, as suggested in http://wiki.lustre.org/Optimizing_o2iblnd_Performance
    #options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 ntx=2048 map_on_demand=0 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
    options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 ntx=2048 map_on_demand=16 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
    install ko2iblnd /usr/sbin/ko2iblnd-probe
    install ko2iblnd /usr/sbin/ko2iblnd-probe
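
Since the options in this file are set under the alias name ko2iblnd-opa and the module is loaded through the install line, it may be worth comparing what the file says with what the loaded module reports. The sketch below assumes that ko2iblnd exposes its parameters under /sys/module/ko2iblnd/parameters/, as kernel modules commonly do, and that each options line sits on one physical line. `conf_value` and `running_value` are made-up helper names.

```shell
#!/bin/sh
# Print the value of KEY from the last active "options MODULE ..." line in FILE.
# Commented-out lines are skipped because their first field is not "options".
conf_value() {
    file=$1 module=$2 key=$3
    awk -v m="$module" -v k="$key" '
        $1 == "options" && $2 == m {
            for (i = 3; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == k) val = kv[2]
            }
        }
        END { if (val != "") print val }
    ' "$file"
}

# Print the value the loaded module reports, or nothing if it is unavailable.
# The sysfs root can be overridden so the function can be tried without Lustre.
running_value() {
    module=$1 key=$2 root=${3:-/sys/module}
    path="$root/$module/parameters/$key"
    [ -r "$path" ] && cat "$path"
    return 0
}

conf=/etc/modprobe.d/ko2iblnd.conf
if [ -r "$conf" ]; then
    echo "configured: $(conf_value "$conf" ko2iblnd-opa map_on_demand)"
    echo "running:    $(running_value ko2iblnd map_on_demand)"
fi
```

If the two values differ after the modules have been reloaded, the options line is probably not the one being applied to the loaded module.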
    
    
    As for the Lustre software versions that I am using:
    
    > server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-
    > 2.0.7.0, lustre 2.11.54
    > client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-
    > 2.0.7.0 , lustre 2.11.54
    
    As for the IB hardware, it is Mellanox ConnectX-5 Socket Direct. Only one
    IPoIB interface (ib0, on mlx5_0) is configured.
    
    Thanks,
    - Pak
    
    -- 
    Regards,
    - Pak
    
    ------------------------------
    
    Subject: Digest Footer
    
    _______________________________________________
    lustre-discuss mailing list
    lustre-discuss@lists.lustre.org
    http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
    
    
    ------------------------------
    
    End of lustre-discuss Digest, Vol 150, Issue 3
    **********************************************
    
