Re: [lustre-discuss] strange errors on Lustre servers

2018-08-12 Thread Zeeshan Ali Shah
Yup, thanks a lot.


Zee

On Sun, Aug 12, 2018 at 11:16 AM Lixin Liu wrote:

> Hi Zeeshan,
>
> Thanks for the hint.
>
> OPA works fine, but I then found that someone had brought up a misconfigured
> node with a conflicting IP address on its OPA interface. Fixing that solved
> the problem.
>
> Lixin.
>
> From: Zeeshan Ali Shah
> Date: Saturday, August 11, 2018 at 11:20 PM
> To: Lixin Liu
> Cc: "lustre-discuss@lists.lustre.org"
> Subject: Re: [lustre-discuss] strange errors on Lustre servers
>
> What is the output of opainfo?
>
> Sent from my iPhone
>
> On 12 Aug 2018, at 04:04, Lixin Liu wrote:
>
> Hi,
>
>
>
> I am getting these errors on all our MDS and OSS servers (Lustre 2.10.1):
>
> Aug 11 11:45:52 ndc-oss5b kernel: LNet: 24727:0:(o2iblnd_cb.c:2410:kiblnd_passive_connect()) Conn stale 172.19.142.119@o2ib version 12/12 incarnation 1533927051163335/1533998625080752
> Aug 11 11:55:52 ndc-oss5b kernel: LNet: 105990:0:(o2iblnd_cb.c:2410:kiblnd_passive_connect()) Conn stale 172.19.142.119@o2ib version 12/12 incarnation 1533927051163335/1533998625080752
> Aug 11 12:05:52 ndc-oss5b kernel: LNet: 105990:0:(o2iblnd_cb.c:2410:kiblnd_passive_connect()) Conn stale 172.19.142.119@o2ib version 12/12 incarnation 1533927051163335/1533998625080752
> Aug 11 12:15:52 ndc-oss5b kernel: LNet: 105990:0:(o2iblnd_cb.c:2410:kiblnd_passive_connect()) Conn stale 172.19.142.119@o2ib version 12/12 incarnation 1533927051163335/1533998625080752
> Aug 11 12:25:52 ndc-oss5b kernel: LNet: 105990:0:(o2iblnd_cb.c:2410:kiblnd_passive_connect()) Conn stale 172.19.142.119@o2ib version 12/12 incarnation 1533927051163335/1533998625080752
>
>
>
> This is a new node we brought online recently. Does this indicate a problem
> with the OPA interface on the node? The machine has an 8160F CPU (on-chip
> OPA interface).
>
>
>
> Thanks,
>
>
>
> Lixin Liu
>
> High Performance Computing
>
> Simon Fraser University
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>


Re: [lustre-discuss] strange errors on Lustre servers

2018-08-12 Thread Lixin Liu
Hi Zeeshan,

Thanks for the hint.

OPA works fine, but I then found that someone had brought up a misconfigured
node with a conflicting IP address on its OPA interface. Fixing that solved
the problem.

Lixin.


From: Zeeshan Ali Shah 
Date: Saturday, August 11, 2018 at 11:20 PM
To: Lixin Liu 
Cc: "lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] strange errors on Lustre servers

What is the output of opainfo?
Sent from my iPhone

On 12 Aug 2018, at 04:04, Lixin Liu <l...@sfu.ca> wrote:
Hi,

I am getting these errors on all our MDS and OSS servers (Lustre 2.10.1):

Aug 11 11:45:52 ndc-oss5b kernel: LNet: 24727:0:(o2iblnd_cb.c:2410:kiblnd_passive_connect()) Conn stale 172.19.142.119@o2ib version 12/12 incarnation 1533927051163335/1533998625080752
Aug 11 11:55:52 ndc-oss5b kernel: LNet: 105990:0:(o2iblnd_cb.c:2410:kiblnd_passive_connect()) Conn stale 172.19.142.119@o2ib version 12/12 incarnation 1533927051163335/1533998625080752
Aug 11 12:05:52 ndc-oss5b kernel: LNet: 105990:0:(o2iblnd_cb.c:2410:kiblnd_passive_connect()) Conn stale 172.19.142.119@o2ib version 12/12 incarnation 1533927051163335/1533998625080752
Aug 11 12:15:52 ndc-oss5b kernel: LNet: 105990:0:(o2iblnd_cb.c:2410:kiblnd_passive_connect()) Conn stale 172.19.142.119@o2ib version 12/12 incarnation 1533927051163335/1533998625080752
Aug 11 12:25:52 ndc-oss5b kernel: LNet: 105990:0:(o2iblnd_cb.c:2410:kiblnd_passive_connect()) Conn stale 172.19.142.119@o2ib version 12/12 incarnation 1533927051163335/1533998625080752
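
(For anyone hitting the same messages: a minimal Python sketch for pulling the
peer NID and the two incarnation values out of "Conn stale" lines like the ones
above, so it is easy to see which peer address keeps being rejected. The
function names are made up for illustration. The regular expression is based
only on the message text quoted in this thread, and decoding the incarnation as
microseconds since the Unix epoch is an assumption, not something stated here.
Strip any "> " quoting before feeding mail text to it.)

```python
import re
import sys
from collections import defaultdict
from datetime import datetime, timezone

# Matches the "Conn stale" message as it appears in this thread only;
# other Lustre versions may word the message differently.
STALE_RE = re.compile(
    r"Conn stale\s+(?P<nid>\S+@\S+)\s+version\s+\S+\s+"
    r"incarnation\s+(?P<first>\d+)/(?P<second>\d+)"
)


def find_stale_peers(log_text):
    """Return {nid: set of (first, second) incarnation pairs} (hypothetical helper)."""
    # Syslog or mail wrapping can split one message across several lines,
    # so collapse all whitespace runs to single spaces before matching.
    flat = " ".join(log_text.split())
    peers = defaultdict(set)
    for m in STALE_RE.finditer(flat):
        peers[m.group("nid")].add((int(m.group("first")), int(m.group("second"))))
    return dict(peers)


def as_utc(incarnation):
    """ASSUMPTION: the incarnation is microseconds since the Unix epoch."""
    return datetime.fromtimestamp(incarnation // 1_000_000, tz=timezone.utc)


if __name__ == "__main__":
    # Example: python stale_peers.py < /var/log/messages
    for nid, pairs in sorted(find_stale_peers(sys.stdin.read()).items()):
        for first, second in sorted(pairs):
            print(nid, as_utc(first).isoformat(), as_utc(second).isoformat())
```

If the two incarnation values for one NID never match, that would be consistent
with two different hosts answering on the same address, which is what the
duplicate IP turned out to be in this case.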

This is a new node we brought online recently. Does this indicate a problem
with the OPA interface on the node? The machine has an 8160F CPU (on-chip
OPA interface).

Thanks,

Lixin Liu
High Performance Computing
Simon Fraser University
