Hi,


I want to set up a dual-node Lustre server environment that uses RDMA between
the nodes to improve server response performance.


After installing Lustre and the corresponding RDMA-capable drivers, I ran into
an issue while deploying the Lustre file system.


When mounting an MDT on the second node, the following error occurred:


[root@172-0-37-83 ~]# mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde619060000aa /mnt/tfs/mgs2
mount.lustre: mount /dev/mapper/mpathcj at /mnt/tfs/mgs2 failed: Connection timed out

Log information:
Jun  6 04:44:25 localhost kernel: LNetError: 23212:0:(o2iblnd.c:819:kiblnd_create_conn()) cmid HCA(mlx5_0), kib_dev(ens14f0np0) need failover
Jun  6 04:44:31 localhost kernel: LNetError: 23213:0:(o2iblnd.c:819:kiblnd_create_conn()) cmid HCA(mlx5_0), kib_dev(ens14f0np0) need failover


I found a similar issue in the community, but the mount still failed after I
tried reloading the module:
[LU-7124] MLX5: Limit hit in cap.max_send_wr - Whamcloud Community JIRA


May I ask what is causing this and what changes are needed to solve the problem?


-- Shuobin


The following is my configuration and formatting process:
 



mkfs.lustre --fsname=ltfs1 --mgs --mdt --index=0 \
    --servicenode=192.168.19.14@o2ib1 --servicenode=192.168.19.15@o2ib1 \
    --reformat --mkfsoptions "-E stride=32" \
    /dev/disk/by-id/scsi-3600b3420371420b645dde4066c0000a8

mkfs.lustre --fsname=ltfs1 --mdt --index=1 \
    --mgsnode=192.168.19.14@o2ib1 --mgsnode=192.168.19.15@o2ib1 \
    --failnode=192.168.19.15@o2ib1 \
    --reformat --mkfsoptions "-E stride=32" \
    /dev/disk/by-id/scsi-3600b3420371420b645dde5093e0000a9

mkfs.lustre --fsname=ltfs1 --mdt --index=2 \
    --mgsnode=192.168.19.15@o2ib1 --mgsnode=192.168.19.14@o2ib1 \
    --failnode=192.168.19.14@o2ib1 \
    --reformat --mkfsoptions "-E stride=32" \
    /dev/disk/by-id/scsi-3600b3420371420b645dde619060000aa

mkfs.lustre --fsname=ltfs1 --mdt --index=3 \
    --mgsnode=192.168.19.15@o2ib1 --mgsnode=192.168.19.14@o2ib1 \
    --failnode=192.168.19.14@o2ib1 \
    --reformat --mkfsoptions "-E stride=32" \
    /dev/disk/by-id/scsi-3600b3420371420b645dde7367f0000ab

node1:

mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde4066c0000a8 /mnt/tfs/mgs

mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde5093e0000a9 /mnt/tfs/mgs1

node2:

mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde619060000aa /mnt/tfs/mgs2

mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde7367f0000ab /mnt/tfs/mgs3


















_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
