Thanks Robin, sorry for my late reply.

[root@bigdata-dlp-server00 ~]# salt "ml-storage-ser2[0-9].nmg01" cmd.run "lctl list_nids"
ml-storage-ser28.nmg01: (node28)
    10.82.143.202@o2ib1
    10.83.162.19@tcp1
ml-storage-ser25.nmg01: (node25)
    10.83.162.16@tcp1
    10.82.143.199@o2ib1
ml-storage-ser20.nmg01: (node20)
    10.82.143.194@o2ib1
    10.83.162.11@tcp1
ml-storage-ser24.nmg01: (node24)
    10.82.143.198@o2ib1
    10.83.162.15@tcp1
ml-storage-ser29.nmg01: (node29)
    10.83.162.20@tcp1
    10.82.143.203@o2ib1
ml-storage-ser22.nmg01: (node22)
    10.82.143.196@o2ib1
    10.83.162.13@tcp1
ml-storage-ser27.nmg01: (node27)
    10.83.162.18@tcp1
    10.82.143.201@o2ib1
ml-storage-ser23.nmg01: (node23)
    10.83.162.14@tcp1
    10.82.143.197@o2ib1
ml-storage-ser26.nmg01: (node26)
    10.82.143.200@o2ib1
    10.83.162.17@tcp1
ml-storage-ser21.nmg01: (node21)
    10.83.162.12@tcp1
    10.82.143.195@o2ib1

root@ml-gpu-ser200.nmg01:~$ lctl list_nids
10.82.141.208@o2ib1
10.83.152.55@tcp1
root@ml-gpu-ser200.nmg01:~$ lctl ping node28@o2ib1
failed to ping 10.82.143.202@o2ib1: Input/output error
root@ml-gpu-ser200.nmg01:~$
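
The NID lists above show the client on 10.82.141.x and the storage nodes
on 10.82.143.x for o2ib1, so they may sit in different IP subnets. The
actual netmask on the interfaces is not shown in this thread, so the
Python sketch below only illustrates the check under assumed prefix
lengths. The helper names parse_nid and same_subnet are hypothetical and
are not part of Lustre.

```python
import ipaddress


def parse_nid(nid):
    """Split an LNet NID such as '10.82.143.202@o2ib1' into (ip, network)."""
    addr, _, net = nid.partition("@")
    return ipaddress.ip_address(addr), net


def same_subnet(nid_a, nid_b, prefix_len):
    """True if both NIDs name the same LNet network and share an IP subnet."""
    ip_a, net_a = parse_nid(nid_a)
    ip_b, net_b = parse_nid(nid_b)
    if net_a != net_b:
        return False
    subnet = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ip_b in subnet


client = "10.82.141.208@o2ib1"  # ml-gpu-ser200
node28 = "10.82.143.202@o2ib1"  # ml-storage-ser28
node26 = "10.82.143.200@o2ib1"  # ml-storage-ser26

# The prefix lengths are assumptions; check the real mask with "ip addr".
for prefix_len in (24, 22, 16):
    print(prefix_len,
          same_subnet(client, node28, prefix_len),
          same_subnet(node26, node28, prefix_len))
```

With a /24 mask the client and node28 would be in different subnets while
node26 and node28 share one; with a /22 or wider mask all three would be
in the same subnet.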

I have created the file /etc/modprobe.d/lustre.conf with the following
content on all MDT, OST, and client nodes:
root@ml-gpu-ser200.nmg01:~$ cat /etc/modprobe.d/lustre.conf
options lnet networks="o2ib1(eth3.2)"
Then I ran "lnetctl lnet configure --all" to make my static LNet
configuration take effect, but I still can't ping node28 from my
client ml-gpu-ser200.nmg01. I can, however, mount and access Lustre on
that client.
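
Note that the file lists only o2ib1, while "lctl list_nids" on the client
also reports a tcp1 NID, which may mean the running LNet configuration
does not come only from this file. A rough Python sketch of that
comparison follows. The helpers parse_networks and nets_in_use are
hypothetical, and the parser only handles the simple net(interface) form
used here.

```python
import re


def parse_networks(option_line):
    """Return {network: interface} from an 'options lnet networks=...' line."""
    m = re.search(r'networks="([^"]*)"', option_line)
    if m is None:
        return {}
    # Only entries written as net(interface) are recognised in this sketch.
    return dict(re.findall(r'(\w+)\(([^)]*)\)', m.group(1)))


def nets_in_use(nids):
    """Return the set of LNet network names that appear in a list of NIDs."""
    return {nid.split("@", 1)[1] for nid in nids}


# Line copied from /etc/modprobe.d/lustre.conf on the client.
configured = parse_networks('options lnet networks="o2ib1(eth3.2)"')
# NIDs copied from "lctl list_nids" on the client.
running = nets_in_use(["10.82.141.208@o2ib1", "10.83.152.55@tcp1"])

unconfigured = running - set(configured)
print(sorted(unconfigured))  # prints ['tcp1']
```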

I can also lctl ping node28@o2ib1 successfully from the other MDT and OST
nodes, for example:
root@ml-storage-ser26.nmg01:/home/odin/sunyuyusun$ lctl ping node28@o2ib1
12345-0@lo
12345-10.82.143.202@o2ib1
12345-10.83.162.19@tcp1
root@ml-storage-ser26.nmg01:/home/odin/sunyuyusun$ lctl ping node20@o2ib1
12345-0@lo
12345-10.82.143.194@o2ib1
12345-10.83.162.11@tcp1
root@ml-storage-ser26.nmg01:/home/odin/sunyuyusun$ lctl ping node21@o2ib1
12345-0@lo
12345-10.83.162.12@tcp1
12345-10.82.143.195@o2ib1
root@ml-storage-ser26.nmg01:/home/odin/sunyuyusun$ lctl ping node22@o2ib1
12345-0@lo
12345-10.82.143.196@o2ib1
12345-10.83.162.13@tcp1

So what LNet configuration should I set to solve this problem?

Thanks very much.
Yours
Yu

Robin Humble <rjh+lus...@cita.utoronto.ca> wrote on Tue, Jun 26, 2018 at 10:48 PM:

> On Tue, Jun 26, 2018 at 04:05:14PM +0800, yu sun wrote:
> >hi all:
> >     I want to build a Lustre storage system, but I found that not all of
> >the machines are in the same subnet, and they can't lctl ping each other.
> >The details are listed below:
> >
> >root@ml-storage-ser30.nmg01:~$ lctl list_nids
> >10.82.145.2@o2ib
> >root@ml-storage-ser30.nmg01:~$ lctl ping node28@o2ib
> >failed to ping 10.82.143.202@o2ib: Input/output error
> >root@ml-storage-ser30.nmg01:~$
>
> what does 'lctl list_nids' say on node28?
> also disable iptables everywhere.
>
> cheers,
> robin
>
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
