On Thu, Nov 22, 2007 at 01:53:04PM +0100, Jürgen Kabelitz wrote:
> 
> Hi,
> 
> We had the same problems with a cluster of 40 nodes. The motherboard has 
> problems with heavy I/O. We have some test programs that use only the CPU and 
> do little or no I/O. These programs run and run. But when you run a 
> program like Gaussian with heavy I/O, this can happen.
> In the end we replaced the motherboard with the S2882.
> J. Kabelitz
> 
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of stephen 
> mulcahy
> Sent: Wednesday, 21 November 2007 18:28
> To: [email protected]
> Subject: [Beowulf] Tips for diagnosing intermittent problems on a small 
> cluster
> 
> Hi,
> 
> As I mentioned in my previous posting, the 20 node Tyan S2891 Dual
> Opteron dual core Debian cluster (1 NFS providing head node, 19 diskless
> compute nodes) is currently experiencing 2 intermittent problems which
> I'm trying to diagnose.
> 
> After a few days of testing and digging through system logs I'm pretty
> much stumped as to what may be causing these. There are 2 separate
> problems - anyone's opinions on how to go about diagnosing these problems
> or things I might have missed would be most welcome.
> 
> Problem #1
> Over the last 6 months, 3 different nodes have been found in a powered
> down state - the nodes seem to have powered off during a run of the
> model. 

Same here on a single machine with an earlier model Tyan board - it 
happened to us either after a very occasional kernel panic/exception or 
after 25-28 days of continuous running. I've got a 2885 here, if I can 
just find two Opterons, memory and a case :-) I'll let you know if this 
one does it too. 

There _may_ be some PSU involvement with ours: the machine and fans are 
running but not accepting connections. You have to disconnect the power
for a few minutes for it to even boot again properly. Powercycling from 
the front panel doesn't always work.

Debian etch, stock Debian kernel (2.6.18-5 from memory).

Andy


_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
