Thomas Geibhardt wrote:
: cpu family      : 15
: model           : 33

This is the same model as our 875s. It should support 8 ranks of DDR400 (and 
not just DDR333). Your BIOS seems to be a little overcautious, in fact.

: It is very difficult to test that since we cannot trigger the 
: crashes reliably. The cluster is now running stable for more 
: than a week. If I slowed down the memory bus speed it would
: take months to get a statistically significant conclusion. On 
: the other hand, an average rate of 2 crashes per week is rather
: annoying.

We found that enabling L2 and main memory scrubbing greatly increases the 
probability of slightly dodgy DIMMs failing. We use 655 and 81.9 microseconds, 
respectively, for the L2 and main memory scrub intervals. We also enable memory 
scrub redirect (which seems to be necessary to avoid POST failures on cold boot) 
and chipkill.
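
On the question of how long a crash-free run has to be before it means 
anything, here is a rough sketch. It assumes crashes arrive as a Poisson 
process at the quoted average of 2 per week, which is only an assumption (real 
crashes may be bursty or load-dependent). The function names are made up for 
illustration.

```python
import math

def prob_no_crash(rate_per_week, weeks):
    # Probability of seeing zero crashes in `weeks` purely by chance,
    # assuming Poisson arrivals at `rate_per_week` (an assumption).
    return math.exp(-rate_per_week * weeks)

def weeks_needed(rate_per_week, alpha=0.05):
    # Crash-free weeks needed before the old rate can be rejected
    # at significance level `alpha`, under the same assumption.
    return -math.log(alpha) / rate_per_week

print(round(prob_no_crash(2.0, 1.0), 3))  # 0.135
print(round(weeks_needed(2.0), 2))        # 1.5
```

Under that assumption a single quiet week would happen by chance about 13.5% 
of the time even if nothing had changed. If a change only lowers the crash 
rate rather than eliminating it, telling the two rates apart takes 
correspondingly longer.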

Serguei

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
