Richard, James,

I have tried adding "map_on_demand=16" to "/etc/modprobe.d/ko2iblnd.conf", as suggested. I also tried "map_on_demand=0", as suggested here: http://wiki.lustre.org/Optimizing_o2iblnd_Performance

/etc/modprobe.d/ko2iblnd.conf
alias ko2iblnd-opa ko2iblnd
# tried, as suggested in http://wiki.lustre.org/Optimizing_o2iblnd_Performance
#options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 ntx=2048 map_on_demand=0 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 ntx=2048 map_on_demand=16 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
install ko2iblnd /usr/sbin/ko2iblnd-probe

As for the Lustre software versions that I am using:
> server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-
> 2.0.7.0, lustre 2.11.54
> client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-
> 2.0.7.0 , lustre 2.11.54

As for the IB hardware, it is Mellanox ConnectX-5 Socket Direct. Only 1 IPoIB for mlx5_0 (for the ib0 interface) is configured.

Thanks,
- Pak

On Tue, Sep 4, 2018 at 9:00 AM, Richard Henwood <[email protected]> wrote:

> On Tue, 2018-09-04 at 08:06 -0700, Pak Lui wrote:
> > Hi all,
> >
> > I am having issue with the Lustre client pinging the server using
> > o2ib.I want to find out if anyone has a suggestion on what could be
> > the problem. Thanks in advance.
> >
> > lustre client pinging to server:
> > > [root@n0 ~]# lctl ping 192.168.13.8@o2ib
> > > failed to ping 192.168.13.8@o2ib: Input/output error <<<<<<<
> >
> > lustre client pinging to server over IPoIB works:
> > > [root@n0~]# ping -c 1 192.168.13.8
> > > PING 192.168.13.8 (192.168.13.8) 56(84) bytes of data.
> > > 64 bytes from 192.168.13.8: icmp_seq=1 ttl=64 time=0.376 ms
> >
> > lustre client pinging to self or other client works:
> > > [root@n0 ~]# lctl ping 192.168.13.54@o2ib
> > > 12345-0@lo
> > > 12345-192.168.13.54@o2ib
> >
> > lustre client pinging to self or otover IPoIB works:
> > > [root@n0~]# ping -c 1 192.168.13.54
> > > PING 192.168.13.54 (192.168.13.54) 56(84) bytes of data.
> > > 64 bytes from 192.168.13.54: icmp_seq=1 ttl=64 time=0.017 ms
> >
> > The lustre server and client have specified the modprobe for lnet:
> > > /etc/modprobe.conf
> > > options lnet networks=o2ib(ib0)
> >
> > The client reports some error when trying to ping or mount from the
> > client to server:
> > modprobe lustre lnet
> > lctl ping 192.168.13.8@o2ib
> > mount -v -t lustre 192.168.13.8@o2ib:/zfs /mnt/zfs
> >
> > > [root@n0 ~]# dmesg|tail
> > > [589805.093447] Lustre: Lustre: Build Version: 2.11.54
> > > [589805.272652] LNet: Using FastReg for registration
> > > [589805.275954] LNet: Added LNI 192.168.13.54@o2ib [8/256/0/180]
> > > [589813.278370] LNet:
> > > 22357:0:(o2iblnd_cb.c:3320:kiblnd_check_conns()) Timed out tx for 1
> > > 92.168.13.186@o2ib: 589813 seconds
> > > [589835.518404] LustreError:
> > > 22463:0:(mgc_request.c:251:do_config_log_add()) MGC192.168.13.8@o2i
> > > b: failed processing log, type 1: rc = -5
> > > [589843.118385] LustreError:
> > > 22488:0:(mgc_request.c:601:do_requeue()) failed processing log: -5
> > > [589866.718389] LustreError: 15c-8: MGC192.168.13.8@o2ib: The
> > > configuration from log 'zfs-client' failed (-5). This may be the
> > > result of communication errors between this node and the MGS, a bad
> > > configuration, or other errors. See the syslog for more
> > > information.
> > > [589866.741623] Lustre: Unmounted zfs-client
> > > [589867.278516] LustreError:
> > > 22463:0:(obd_mount.c:1599:lustre_fill_super()) Unable to mount (-
> > > 5)
> >
> > server reports some error during mounting:
> > > [root@license ~]# Sep 4 07:26:56 license kernel: LNet:
> > > 25518:0:(o2iblnd_cb.c:2475:kiblnd_passive_connect()) Can't accept
> > > conn from 192.168.13.54@o2ib (version 12): max_frags 16
> > > incompatible without FMR pool (256 wanted)
> >
> > The lustre server setup:
> > > [root@license ~]# lfs df -h
> > > UUID                    bytes    Used  Available  Use%
> > > Mounted on
> > > zfs-MDT0000_UUID       863.4M    7.5M     853.9M    1%
> > > /mnt/zfs[MDT:0]
> > > zfs-OST0000_UUID         1.7T   10.0G       1.7T    1%
> > > /mnt/zfs[OST:0]
> > >
> > > filesystem_summary:      1.7T   10.0G       1.7T    1%
> > > /mnt/zfs
> >
> > server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-
> > 2.0.7.0, lustre 2.11.54
> > client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-
> > 2.0.7.0 , lustre 2.11.54
> >
>
> It might be helpful to state the Lustre software versions that you have
> used.
>
> Also, given this is an Arm client with (with presumably 64K pg size),
> connecting to a x86 server (with presumably 4K pg size), have you added
> the map_on_demand=16 incantation to the server? I don't have direct
> experience of this, but heard it was needed in some Arm configurations
> (depending on server/client version):
>
> https://jira.whamcloud.com/browse/LU-10775
>
> May be James can advise?
>
> best regards,
> Richard
>
> --
> [email protected]
> Server Software Eco-System
> Tel: +1 512 410 9612
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
>

--
Regards,
- Pak
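
One way to confirm which o2iblnd settings are actually in effect on each node is to read them back from sysfs after the module loads. The sketch below is a minimal example, not a tested procedure for this setup: the function name show_o2iblnd_params is hypothetical, and it assumes the loaded ko2iblnd module exposes its parameters under /sys/module/ko2iblnd/parameters (the usual location for readable kernel module parameters).

```shell
#!/bin/sh
# Print each ko2iblnd module parameter as name=value.
# Assumption: the module exposes readable parameters under sysfs.
# The directory can be overridden with the first argument.
show_o2iblnd_params() {
    param_dir="${1:-/sys/module/ko2iblnd/parameters}"
    if [ ! -d "$param_dir" ]; then
        echo "no parameter directory at $param_dir (is ko2iblnd loaded?)" >&2
        return 1
    fi
    for f in "$param_dir"/*; do
        # Skip unreadable entries (and the literal glob if the dir is empty).
        [ -r "$f" ] || continue
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

Running `show_o2iblnd_params | grep map_on_demand` on both the client and the server would show whether the two sides ended up with the same value. The server message quoted above (max_frags 16 incompatible, 256 wanted) suggests they may not have.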
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
