Thanks Robin, and sorry for my late reply.

[root@bigdata-dlp-server00 ~]# salt "ml-storage-ser2[0-9].nmg01" cmd.run "lctl list_nids"
ml-storage-ser28.nmg01: (node28)
    10.82.143.202@o2ib1
    10.83.162.19@tcp1
ml-storage-ser25.nmg01: (node25)
    10.83.162.16@tcp1
    10.82.143.199@o2ib1
ml-storage-ser20.nmg01: (node20)
    10.82.143.194@o2ib1
    10.83.162.11@tcp1
ml-storage-ser24.nmg01: (node24)
    10.82.143.198@o2ib1
    10.83.162.15@tcp1
ml-storage-ser29.nmg01: (node29)
    10.83.162.20@tcp1
    10.82.143.203@o2ib1
ml-storage-ser22.nmg01: (node22)
    10.82.143.196@o2ib1
    10.83.162.13@tcp1
ml-storage-ser27.nmg01: (node27)
    10.83.162.18@tcp1
    10.82.143.201@o2ib1
ml-storage-ser23.nmg01: (node23)
    10.83.162.14@tcp1
    10.82.143.197@o2ib1
ml-storage-ser26.nmg01: (node26)
    10.82.143.200@o2ib1
    10.83.162.17@tcp1
ml-storage-ser21.nmg01: (node21)
    10.83.162.12@tcp1
    10.82.143.195@o2ib1
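
For anyone who wants to compare these NIDs mechanically, here is a minimal sketch, assuming Python 3 is available on the admin host. The helper names group_nids and same_subnet are hypothetical, and the prefix lengths are pure assumptions, since the real netmask of the o2ib1 interfaces is not shown in this thread. It groups "lctl list_nids" output by LNet network and checks whether two addresses would share a subnet under a given prefix length.

```python
import ipaddress
import re
from collections import defaultdict

# Matches a bare IPv4 NID such as "10.82.143.202@o2ib1".
NID_RE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})@(\w+)$")


def group_nids(lines):
    """Group the NIDs found in `lctl list_nids` output by LNet network name.

    Lines that are not bare NIDs (shell prompts, salt host headers,
    blank lines) are skipped.
    """
    by_net = defaultdict(list)
    for line in lines:
        match = NID_RE.match(line.strip())
        if match:
            addr, net = match.groups()
            by_net[net].append(addr)
    return dict(by_net)


def same_subnet(addr_a, addr_b, prefix_len):
    """Return True if both addresses fall inside the same /prefix_len network.

    prefix_len is an assumption supplied by the caller; the actual
    netmask must be read from the interface configuration.
    """
    network = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr_b) in network


if __name__ == "__main__":
    # NIDs copied from this thread: client ml-gpu-ser200 and server node28.
    client = group_nids(["10.82.141.208@o2ib1", "10.83.152.55@tcp1"])
    node28 = group_nids(["10.82.143.202@o2ib1", "10.83.162.19@tcp1"])
    for prefix_len in (24, 22):  # hypothetical prefix lengths
        shared = same_subnet(client["o2ib1"][0], node28["o2ib1"][0], prefix_len)
        print(f"/{prefix_len}: same subnet = {shared}")
```

The result depends entirely on the assumed prefix length, so the real value from the interface configuration on both hosts should be substituted before drawing any conclusion.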

root@ml-gpu-ser200.nmg01:~$ lctl list_nids
10.82.141.208@o2ib1
10.83.152.55@tcp1
root@ml-gpu-ser200.nmg01:~$ lctl ping node28@o2ib1
failed to ping 10.82.143.202@o2ib1: Input/output error
root@ml-gpu-ser200.nmg01:~$

I have created the file /etc/modprobe.d/lustre.conf with the following content on all MDT, OST, and client nodes:

root@ml-gpu-ser200.nmg01:~$ cat /etc/modprobe.d/lustre.conf
options lnet networks="o2ib1(eth3.2)"

and I ran

lnetctl lnet configure --all

to make my static LNet configuration take effect, but I still can't ping node28 from my client ml-gpu-ser200.nmg01. I can mount and access Lustre on client ml-gpu-ser200.nmg01, and I can lctl ping node28@o2ib successfully from other MDT or OST nodes, for example:

root@ml-storage-ser26.nmg01:/home/odin/sunyuyusun$ lctl ping node28@o2ib1
12345-0@lo
12345-10.82.143.202@o2ib1
12345-10.83.162.19@tcp1
root@ml-storage-ser26.nmg01:/home/odin/sunyuyusun$ lctl ping node20@o2ib1
12345-0@lo
12345-10.82.143.194@o2ib1
12345-10.83.162.11@tcp1
root@ml-storage-ser26.nmg01:/home/odin/sunyuyusun$ lctl ping node21@o2ib1
12345-0@lo
12345-10.83.162.12@tcp1
12345-10.82.143.195@o2ib1
root@ml-storage-ser26.nmg01:/home/odin/sunyuyusun$ lctl ping node22@o2ib1
12345-0@lo
12345-10.82.143.196@o2ib1
12345-10.83.162.13@tcp1

So what LNet configuration should I set to solve this problem?

Thanks very much.

Yours,
Yu

On Tue, Jun 26, 2018 at 10:48 PM, Robin Humble <rjh+lus...@cita.utoronto.ca> wrote:

> On Tue, Jun 26, 2018 at 04:05:14PM +0800, yu sun wrote:
> >hi all:
> > I want to build a lustre storage system, and I found not all of the
> >machine in the same sub-network, and they cant lctl ping with each other.
> >the details list as below:
> >
> >root@ml-storage-ser30.nmg01:~$ lctl list_nids
> >10.82.145.2@o2ib
> >root@ml-storage-ser30.nmg01:~$ lctl ping node28@o2ib
> >failed to ping 10.82.143.202@o2ib: Input/output error
> >root@ml-storage-ser30.nmg01:~$
>
> what does 'lctl list_nids' say on node28?
> also disable iptables everywhere.
>
> cheers,
> robin
>
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org