> Have you checked in the baseboard management log to see if it is
> throwing an error?

Apparently the SC1435 does not have OpenManage. "Simple Computing" is too simple to warrant that, I was told. They do have DSET to look at the ESM logs, but not for CentOS or Fedora. Red Hat is their "validated" [sic] OS, and that's the only one they support. So I'm sort of stuck there.

> Also check on the temperature of the machines. We
> have had some pretty weird issues with RAM and CPU quirkiness when
> they reach a high internal temperature. If you can do some polling
> using ipmi on the nodes to record the current temp and fan data over
> time so that you could see what it was at just before a crash you
> might be able to point it to an environmental situation.

I'll try ipmi. I was trying lm_sensors, but apparently it does not have a driver for this chipset/motherboard combination. I'm not sure if it's an AMD Opteron-specific driver issue or a vendor-not-releasing-motherboard-specs issue (I've heard both versions on the net). Has anybody else had success using lm_sensors on the SC1435?

--
Rahul
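
For the ipmi polling suggested in the quoted text, a rough sketch of what such a logger might look like is below. It assumes ipmitool is installed and that the node's BMC exposes temperature and fan sensors through `ipmitool sdr type Temperature` and `ipmitool sdr type Fan`; I have not confirmed either on the SC1435. The function names, the log file name, and the 60-second interval are placeholders of mine, and the parser assumes the usual five-column, pipe-separated output of `ipmitool sdr type`.

```python
import os
import subprocess
import time
from datetime import datetime


def parse_sdr_line(line):
    """Split one 'ipmitool sdr type ...' row into (sensor, status, reading).

    Assumes the usual pipe-separated layout:
    name | sensor id | status | entity | reading
    Returns None for rows that do not match that layout.
    """
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 5:
        return None
    return fields[0], fields[2], fields[4]


def read_sensors(sensor_type):
    """Run ipmitool locally and return the parsed rows for one sensor type."""
    out = subprocess.run(
        ["ipmitool", "sdr", "type", sensor_type],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = (parse_sdr_line(line) for line in out.splitlines())
    return [row for row in rows if row is not None]


def poll(logfile="ipmi_sensors.log", interval=60):
    """Append timestamped temperature and fan readings until interrupted."""
    while True:
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(logfile, "a") as fh:
            for sensor_type in ("Temperature", "Fan"):
                for name, status, reading in read_sensors(sensor_type):
                    fh.write(f"{stamp}\t{sensor_type}\t{name}\t{status}\t{reading}\n")
            # Push the data to disk so the last readings survive a crash.
            fh.flush()
            os.fsync(fh.fileno())
        time.sleep(interval)
```

Calling `poll()` on a node would leave one tab-separated line per sensor per interval, so the last entries before a crash show the temperatures and fan speeds at that point. If the BMC is reachable over the network, running the same ipmitool commands from the head node with `-I lanplus -H <bmc-host> -U <user> -P <password>` may be safer, since the log then lives on a machine that does not go down with the node.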
