On Tue, 2009-09-01 at 11:34 -0700, Don Thorp wrote:
> 
> New hardware that will support the workload is on the way, but are  
> there some changes I can make now to 1.6.6 that would increase  
> reliability, even at the expense of performance?

Going on what you have given us, my first suggestion would be to
increase your obd_timeout.  You should not need to go higher than about
300 seconds, and you should choose a value only just high enough to
stop the callback timeouts, because higher obd_timeout values mean
longer recoveries.
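
As a rough illustration of "only high enough", the stepping could be
sketched like this.  The helper name, the 50-second step and the idea of
re-checking the logs between steps are my assumptions, not anything
Lustre provides; on 1.6.x the current value is normally visible in
/proc/sys/lustre/timeout, but verify that on your own nodes.

```python
# Hypothetical helper, not part of Lustre: pick the next obd_timeout to
# try, given whether callback timeouts are still showing up in the logs.
def next_obd_timeout(current, saw_callback_timeouts, step=50, cap=300):
    """Return the next obd_timeout (seconds) to try.

    current               -- the value in use now
    saw_callback_timeouts -- True if callback timeouts still occur
    step                  -- assumed increment per attempt
    cap                   -- upper bound (about 300 s, as suggested above)
    """
    if not saw_callback_timeouts:
        # The timeouts have stopped: stay here, going higher only
        # lengthens recovery.
        return current
    # Still timing out: go up one step, but never past the cap.
    return min(current + step, cap)
```

You would apply each value, watch for callback timeouts under normal
load, and stop at the first value where they no longer appear.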

Additionally, you might look into tuning the number of OST threads on
your OSSes if you are driving your disks too hard.  OST thread count,
like obd_timeout, should be just high enough, and no higher, to reach
maximum throughput.  If you have not baselined your hardware with the
iokit, you can simply keep dropping the OST thread count until you
find that you are impacting throughput.  It's a bit more trial and error
than using the iokit, but if you are in production already, it's
probably the best you can do.
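
If you note the throughput you see at each thread count you try, picking
the final value can be sketched as below.  The function name, the 5%
tolerance and the sample numbers are made up for illustration; how you
measure throughput and how you set the thread count on your OSSes are
left out on purpose.

```python
# Hypothetical helper: choose the lowest OST thread count whose measured
# throughput is still within a tolerance of the best one observed.
def lowest_adequate_threads(measurements, tolerance=0.05):
    """measurements maps thread count -> observed throughput (e.g. MB/s)."""
    if not measurements:
        raise ValueError("need at least one measurement")
    best = max(measurements.values())
    # Anything at or above this still counts as "maximum throughput".
    floor = best * (1.0 - tolerance)
    return min(n for n, rate in measurements.items() if rate >= floor)


# Made-up sample numbers from a trial-and-error run:
samples = {512: 950, 256: 960, 128: 955, 64: 940, 32: 800}
chosen = lowest_adequate_threads(samples)
```

With those sample numbers the drop from 64 to 32 threads is where
throughput is clearly impacted, so 64 is the value selected.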

b.

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss