On Thu, Nov 22, 2007 at 01:53:04PM +0100, Jürgen Kabelitz wrote:
>
> Hi,
>
> We had the same problems with a cluster of 40 nodes. The motherboard has
> problems with heavy I/O. We have some test programs that use only the CPU
> and do little or no I/O. These programs run and run. But when you run a
> program like Gaussian with heavy I/O, this can happen.
> In the end we replaced the motherboard with the S2882.
> J. Kabelitz
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of stephen
> mulcahy
> Sent: Wednesday, 21 November 2007 18:28
> To: [email protected]
> Subject: [Beowulf] Tips for diagnosing intermittent problems on a small
> cluster
>
> Hi,
>
> As I mentioned in my previous posting, the 20-node Tyan S2891 dual
> Opteron dual-core Debian cluster (1 NFS-providing head node, 19 diskless
> compute nodes) is currently experiencing 2 intermittent problems which
> I'm trying to diagnose.
>
> After a few days of testing and digging through system logs I'm pretty
> much stumped as to what may be causing these. There are 2 separate
> problems; anyone's opinions on how to go about diagnosing them, or
> things I might have missed, would be most welcome.
>
> Problem #1
> Over the last 6 months, 3 different nodes have been found in a powered
> down state; the nodes seem to have powered off during a run of the
> model.
Same here, on a single machine with an earlier-model Tyan board: it happened
to us either after a very occasional kernel panic/exception or after 25-28
days of continuous running. I've got a 2885 here, if I can just find two
Opterons, memory and a case :-) I'll let you know if this one does it too.

There _may_ be some PSU involvement with ours: the machine and fans keep
running, but it stops accepting connections. You have to disconnect the power
for a few minutes before it will even boot properly again. Power-cycling from
the front panel doesn't always work.

Debian etch, stock Debian kernel (2.6.18-5 from memory).

Andy
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
