Hi~
I want to set up a dual-node Lustre server environment that uses RDMA between the nodes to improve server response performance. After installing Lustre and the corresponding RDMA-capable drivers, I ran into a problem while deploying the Lustre file system. When mounting an MDT on the second node, the following error occurred:

    [root@172-0-37-83 ~]# mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde619060000aa /mnt/tfs/mgs2
    mount.lustre: mount /dev/mapper/mpathcj at /mnt/tfs/mgs2 failed: Connection timed out

Log information:

    Jun 6 04:44:25 localhost kernel: LNetError: 23212:0:(o2iblnd.c:819:kiblnd_create_conn()) cmid HCA(mlx5_0), kib_dev(ens14f0np0) need failover
    Jun 6 04:44:31 localhost kernel: LNetError: 23213:0:(o2iblnd.c:819:kiblnd_create_conn()) cmid HCA(mlx5_0), kib_dev(ens14f0np0) need failover

I found a similar issue in the community, but the mount still failed after I tried reloading the module:

    [LU-7124] MLX5: Limit hit in cap.max_send_wr - Whamcloud Community JIRA

What is causing this, and what changes are needed to solve the problem?

-- Shuobin

The following is my configuration and formatting process.

Formatting (node1, node2):

    mkfs.lustre --fsname=ltfs1 --mgs --mdt --index=0 --servicenode=192.168.19.14@o2ib1 --servicenode=192.168.19.15@o2ib1 --reformat --mkfsoptions "-E stride=32" /dev/disk/by-id/scsi-3600b3420371420b645dde4066c0000a8

    mkfs.lustre --fsname=ltfs1 --mdt --index=1 --mgsnode=192.168.19.14@o2ib1 --mgsnode=192.168.19.15@o2ib1 --failnode=192.168.19.15@o2ib1 --reformat --mkfsoptions "-E stride=32" /dev/disk/by-id/scsi-3600b3420371420b645dde5093e0000a9

    mkfs.lustre --fsname=ltfs1 --mdt --index=2 --mgsnode=192.168.19.15@o2ib1 --mgsnode=192.168.19.14@o2ib1 --failnode=192.168.19.14@o2ib1 --reformat --mkfsoptions "-E stride=32" /dev/disk/by-id/scsi-3600b3420371420b645dde619060000aa

    mkfs.lustre --fsname=ltfs1 --mdt --index=3 --mgsnode=192.168.19.15@o2ib1 --mgsnode=192.168.19.14@o2ib1 --failnode=192.168.19.14@o2ib1 --reformat --mkfsoptions "-E stride=32" /dev/disk/by-id/scsi-3600b3420371420b645dde7367f0000ab

Mounting on node1:

    mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde4066c0000a8 /mnt/tfs/mgs
    mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde5093e0000a9 /mnt/tfs/mgs1

Mounting on node2:

    mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde619060000aa /mnt/tfs/mgs2
    mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde7367f0000ab /mnt/tfs/mgs3

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org