I think I tried that before and it didn't help, but I will try it again. Thanks for the suggestion.
-Aaron

----- Original Message -----
From: "Charles Taylor" <[EMAIL PROTECTED]>
To: "Aaron S. Knister" <[EMAIL PROTECTED]>
Cc: "lustre-discuss" <[EMAIL PROTECTED]>, "Thomas Wakefield" <[EMAIL PROTECTED]>
Sent: Tuesday, March 4, 2008 3:41:04 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

We've seen this before as well. Our experience is that the obd_timeout is far too small for large clusters (ours is 400+ nodes). The only way we avoid these errors is by setting it to 1000, which seems high to us but appears to work and puts an end to the transport endpoint shutdowns.

On the MDS:

    lctl conf_param srn.sys.timeout=1000

You may have to do this on the OSSs as well unless you restart them, but I could be wrong about that. You should check it everywhere with:

    cat /proc/sys/lustre/timeout

On Mar 4, 2008, at 3:31 PM, Aaron S. Knister wrote:

> This morning I've had both my infiniband and tcp lustre clients hiccup. They are evicted from the server, presumably as a result of their high load and consequent timeouts. My question is: why don't the clients reconnect? The infiniband and tcp clients both give the following message when I type "df": Cannot send after transport endpoint shutdown (-108). I've been battling with this on and off now for a few months. I've upgraded my infiniband switch firmware, and all the clients and servers are running the latest version of lustre and the lustre-patched kernel. Any ideas?
>
> -Aaron
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
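[Editor's note: the "check it everywhere" step in the thread above can be sketched as a small shell helper. This is an illustrative sketch, not from the original thread: the `check_timeout` function name is invented, and on a real Lustre node you would point it at /proc/sys/lustre/timeout, looping over hosts with ssh or pdsh as your cluster allows.]

```shell
# check_timeout FILE EXPECTED
# Compare the timeout value stored in FILE against EXPECTED and report.
# On a Lustre node, FILE would be /proc/sys/lustre/timeout (an assumption
# based on the thread above, not something this sketch verifies).
check_timeout() {
    val=$(cat "$1")
    if [ "$val" -eq "$2" ]; then
        echo "ok: timeout=$val"
    else
        echo "mismatch: timeout=$val, expected $2"
    fi
}

# Example invocation on a single node:
#   check_timeout /proc/sys/lustre/timeout 1000
# Across a cluster (hostnames are placeholders):
#   for h in mds1 oss1 oss2; do
#       echo -n "$h: "; ssh "$h" cat /proc/sys/lustre/timeout
#   done
```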
