FYI, my testing has used only the map_on_demand=16 setting, with all other modparams at their defaults. Also, I haven't run servers on MOFED at all, just kernel IB. Finally, my last build was earlier than 2.11.54, so perhaps something new is going on.
ruth

On 9/4/18, 10:12 AM, "lustre-discuss on behalf of lustre-discuss-requ...@lists.lustre.org" <lustre-discuss-boun...@lists.lustre.org on behalf of lustre-discuss-requ...@lists.lustre.org> wrote:

Today's Topics:

   1. lustre client not able to lctl ping or mount (Pak Lui)
   2. Re: lustre client not able to lctl ping or mount (Richard Henwood)
   3. Re: lustre client not able to lctl ping or mount (Pak Lui)

----------------------------------------------------------------------

Message: 1
Date: Tue, 4 Sep 2018 08:06:09 -0700
From: Pak Lui <pak....@linaro.org>
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] lustre client not able to lctl ping or mount
Message-ID: <CAMScT+X7cxqJETiifWfJ_8LLwenypg=kkb1unyzxpartvva...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi all,

I am having an issue with the Lustre client pinging the server using o2ib. I want to find out if anyone has a suggestion on what could be the problem. Thanks in advance.

lustre client pinging to server:

[root@n0 ~]# lctl ping 192.168.13.8@o2ib
failed to ping 192.168.13.8@o2ib: Input/output error <<<<<<<

lustre client pinging to server over IPoIB works:

[root@n0~]# ping -c 1 192.168.13.8
PING 192.168.13.8 (192.168.13.8) 56(84) bytes of data.
64 bytes from 192.168.13.8: icmp_seq=1 ttl=64 time=0.376 ms

lustre client pinging to self or other client works:

[root@n0 ~]# lctl ping 192.168.13.54@o2ib
12345-0@lo
12345-192.168.13.54@o2ib

lustre client pinging to self or other client over IPoIB works:

[root@n0~]# ping -c 1 192.168.13.54
PING 192.168.13.54 (192.168.13.54) 56(84) bytes of data.
64 bytes from 192.168.13.54: icmp_seq=1 ttl=64 time=0.017 ms

The lustre server and client both specify the modprobe options for lnet:

/etc/modprobe.conf
options lnet networks=o2ib(ib0)

The client reports some errors when trying to ping or mount from the client to the server:

modprobe lustre lnet
lctl ping 192.168.13.8@o2ib
mount -v -t lustre 192.168.13.8@o2ib:/zfs /mnt/zfs

[root@n0 ~]# dmesg|tail
[589805.093447] Lustre: Lustre: Build Version: 2.11.54
[589805.272652] LNet: Using FastReg for registration
[589805.275954] LNet: Added LNI 192.168.13.54@o2ib [8/256/0/180]
[589813.278370] LNet: 22357:0:(o2iblnd_cb.c:3320:kiblnd_check_conns()) Timed out tx for 192.168.13.186@o2ib: 589813 seconds
[589835.518404] LustreError: 22463:0:(mgc_request.c:251:do_config_log_add()) MGC192.168.13.8@o2ib: failed processing log, type 1: rc = -5
[589843.118385] LustreError: 22488:0:(mgc_request.c:601:do_requeue()) failed processing log: -5
[589866.718389] LustreError: 15c-8: MGC192.168.13.8@o2ib: The configuration from log 'zfs-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[589866.741623] Lustre: Unmounted zfs-client
[589867.278516] LustreError: 22463:0:(obd_mount.c:1599:lustre_fill_super()) Unable to mount (-5)

server reports some error during mounting:

[root@license ~]# Sep 4 07:26:56 license kernel: LNet: 25518:0:(o2iblnd_cb.c:2475:kiblnd_passive_connect()) Can't accept conn from 192.168.13.54@o2ib (version 12): max_frags 16 incompatible without FMR pool (256 wanted)

The lustre server setup:

[root@license ~]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
zfs-MDT0000_UUID          863.4M        7.5M      853.9M   1% /mnt/zfs[MDT:0]
zfs-OST0000_UUID            1.7T       10.0G        1.7T   1% /mnt/zfs[OST:0]

filesystem_summary:         1.7T       10.0G        1.7T   1% /mnt/zfs

server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54

Regards,
- Pak

------------------------------

Message: 2
Date: Tue, 4 Sep 2018 16:00:19 +0000
From: Richard Henwood <richard.henw...@arm.com>
To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>, "pak....@linaro.org" <pak....@linaro.org>
Subject: Re: [lustre-discuss] lustre client not able to lctl ping or mount
Message-ID: <5f920989941b1007874e988bf748eb1a84a38068.ca...@arm.com>
Content-Type: text/plain; charset="utf-8"

On Tue, 2018-09-04 at 08:06 -0700, Pak Lui wrote:
> [...]
> server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
> client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54

It might be helpful to state the Lustre software versions that you have used.

Also, given this is an Arm client (with presumably 64K page size) connecting to an x86 server (with presumably 4K page size), have you added the map_on_demand=16 incantation to the server? I don't have direct experience of this, but heard it was needed in some Arm configurations (depending on server/client version):

https://jira.whamcloud.com/browse/LU-10775

Maybe James can advise?

best regards,
Richard

--
richard.henw...@arm.com
Server Software Eco-System
Tel: +1 512 410 9612
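
The two numbers in the server's log line (max_frags 16, 256 wanted) are at least consistent with the page-size guess in Message 2. The sketch below works through the arithmetic. It assumes a 1 MiB maximum bulk transfer described with one fragment per page; neither assumption is stated in the thread, and the function name is hypothetical, so treat this as an illustration of why the two sides might disagree rather than a description of what o2iblnd actually computes.

```python
# Sketch only: how many page-sized fragments cover one bulk transfer.
# ASSUMPTIONS (not from the thread): a 1 MiB maximum transfer, and one
# fragment per page. frags_per_transfer is a hypothetical helper name.

MAX_BULK_BYTES = 1 << 20  # assumed 1 MiB maximum bulk transfer


def frags_per_transfer(page_size: int, transfer_bytes: int = MAX_BULK_BYTES) -> int:
    """Return the number of page-sized fragments needed for transfer_bytes."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Ceiling division, so a partial trailing page still counts as a fragment.
    return -(-transfer_bytes // page_size)


if __name__ == "__main__":
    nodes = (
        ("x86_64 server, presumably 4K pages", 4096),
        ("aarch64 client, presumably 64K pages", 65536),
    )
    for label, page_size in nodes:
        print(f"{label}: {frags_per_transfer(page_size)} fragments")
```

Under those assumptions the 4K-page side arrives at 256 fragments and the 64K-page side at 16, which matches the pair of values in the kiblnd_passive_connect() message.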
------------------------------

Message: 3
Date: Tue, 4 Sep 2018 09:12:03 -0700
From: Pak Lui <pak....@linaro.org>
To: Richard Henwood <richard.henw...@arm.com>
Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] lustre client not able to lctl ping or mount
Message-ID: <camsct+wpamcuthczipocxkqouksowprl8n928lwrr9f45xm...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Richard, James,

I have tried adding "map_on_demand=16" to "/etc/modprobe.d/ko2iblnd.conf" as suggested. I also tried "map_on_demand=0", as suggested here: http://wiki.lustre.org/Optimizing_o2iblnd_Performance

/etc/modprobe.d/ko2iblnd.conf

alias ko2iblnd-opa ko2iblnd
# tried, as suggested in http://wiki.lustre.org/Optimizing_o2iblnd_Performance
#options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 ntx=2048 map_on_demand=0 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 ntx=2048 map_on_demand=16 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
install ko2iblnd /usr/sbin/ko2iblnd-probe

As for the Lustre software versions that I am using:

> server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
> client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54

As for the IB hardware, it is Mellanox ConnectX-5 Socket Direct. Only 1 IPoIB for mlx5_0 (for the ib0 interface) is configured.

Thanks,
- Pak

------------------------------

Subject: Digest Footer

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

------------------------------

End of lustre-discuss Digest, Vol 150, Issue 3
**********************************************