Re: FreeBSD Crash without Errors, Warnings, or Panics

Paul Saab Thu, 13 Apr 2006 17:05:32 -0700

There are serious race conditions with amr in 6.0 that can cause serious hangs. I suggest you take the amr driver from RELENG_6 and try that.


Matthew Hagerty wrote:

Greetings,
I'm running 6.0-RELEASE-p5 on a Toshiba-built server: dual Xeon Intel motherboard with an LSILogic MegaRAID (amr0) controller. This machine has been running for about 2 years now, and was very stable until I updated from 5.3 to 5.4, and now 6.0. The crashing seems to be totally random and I have had it crash in as little as 12 hours and as long as 143 days.
When the box goes down it does so in a strange way. First, it still responds to network probes like ping (usually); however, all console access is ignored. Also, some network ports still respond: for example, a telnet to port 22 to test SSH will yield an SSH banner, but trying to connect with SSH just hangs. Sometimes this is also true of the SMTP server, but not always. This also makes it impossible for me to use CARP to swap to the recently purchased spare machine, since the network interface is generally still responding so CARP does not detect a problem.
My biggest problem with this is that there are *never* any console messages or log entries in any logs, no warnings about disk failure, buffer exhaustion, system failures, etc. The machine simply seems to stop responding and the only way to correct the problem is a hard reboot.
A strange thing did happen yesterday though: I believe I caught the box on the verge of failure. I was SSH'd in and did a ps to check things out. There were about 100 of these entries:
55050 ?? D 0:00.00 postmaster: ipa ipa ::1(63061) startup (postgres)
The box runs a web-based app and connects to a local Postgres DB which seemed to be unable to start new connections being requested by the PHP scripts. At any rate, I stopped Apache and then tried to stop Postgres, which resulted in (or just happened to coincide with) the box locking up and no longer responding to my SSH commands or attempts to reconnect with SSH. I hardly think this is a Postgres problem, but even if it was, a userland app should *not* be able to bring down a box...
Can anyone shed some light on this, give me some options to try? What happened to kernel panics and such when there were serious errors going on? The only glimmer of information I have is that *one* time there was an error on the console about there not being any RAID controller available. I did purchase a spare controller and I'm about to swap it out and see if it helps, but for some reason I doubt it. If a controller like that was failing, I would certainly hope to see some serious error messages or panics going on.
I have been running FreeBSD since version 1.01 and have never had a box so unstable in the last 12 or so years, especially one that is supposed to be "server" quality instead of the make-shift ones I put together with desktop hardware. And last, I'm getting sick of my Linux admin friends telling me "told you so! should have run Linux...", please give me something to stick in their pie holes!
Thanks,
Matthew

_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
