Hi Arne,

this could be memory pressure, with the kernel's OOM killer running and killing processes. How much memory does your server have?
Michael

On Friday, 09.10.2009, at 10:26 +0200, Arne Brutschy wrote:
> Hi everyone,
>
> 2 months ago, we switched our ~80 node cluster from NFS to Lustre: 1
> MDS, 4 OSTs, Lustre 1.6.7.2 on Rocks 4.2.1/CentOS 4.2/Linux
> 2.6.9-78.0.22.
>
> We were quite happy with Lustre's performance, especially because the
> bottlenecks caused by /home disk access were history.
>
> On Saturday, the cluster went down (= was inaccessible). After some
> investigation I found out that the reason seems to be an overloaded MDS.
> Over the following 4 days, this happened multiple times and could only
> be resolved by 1) killing all user jobs and 2) hard-resetting the MDS.
>
> The MDS did not respond to any command; when I managed to get a video
> signal (not often), the load was >170. Additionally, a kernel oops was
> displayed twice, but unfortunately I have no record of them.
>
> The clients showed the following error:
>
> > Oct 8 09:58:55 majorana kernel: LustreError:
> > 3787:0:(events.c:66:request_out_callback()) @@@ type 4, status -5
> > r...@f6222800 x8702488/t0 o250->m...@10.255.255.206@tcp:26/25 lens 304/456
> > e 0 to 1 dl 1254988740 ref 2 fl Rpc:N/0/0 rc 0/0
> > Oct 8 09:58:55 majorana kernel: LustreError:
> > 3787:0:(events.c:66:request_out_callback()) Skipped 33 previous similar
> > messages
>
> So, my question is: what could cause such a load? The cluster was not
> excessively used... Is this a bug, or is it a user's job that creates
> the load? How can I protect Lustre against this kind of failure?
>
> Thanks in advance,
> Arne

-- 
Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone: (+49) 351 463-34217
Fax: (+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW: http://www.tu-dresden.de/zih
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss