Multiple errors on server -- Where do I start looking?

Ryan Merrell Mon, 06 Feb 2012 08:36:41 -0800

I've run into some error messages on my server that are beyond my ability to 
interpret, so I'm hoping some of you can help me out. I've already posted 
this on the forums at 
http://forums.freebsd.org/showthread.php?p=165258#post165258 but since this is 
affecting our business, I'm trying to reach a broader audience and 
hopefully get this resolved.


We have an Intel modular blade server. The chassis has two 3-disk RAID 5 
arrays. Volume 1 is what the OS (FreeBSD 7.2) is installed on and Volume 2 is 
mounted at /usr. These two volumes are da0 and da1.

I got email notifications saying the web host I run in a jail on this 
server was down. I tried to SSH into it, but that failed. Pinging it gave about 
a 50% return rate, so I logged in to the management blade and started a virtual 
KVM session to get into the blade. Once I was on the base host blade, I ran cat 
on dmesg.today and got a slew of errors. Here we go:
(da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
(da3:mpt0:0:6:1): Retrying Command (per Sense Data)
(da3:mpt0:0:6:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
(da3:mpt0:0:6:1): CAM Status: SCSI Status Error
(da3:mpt0:0:6:1): SCSI Status: Check Condition
(da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b
(da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
(da3:mpt0:0:6:1): Retrying Command (per Sense Data)
(da3:mpt0:0:6:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
(da3:mpt0:0:6:1): CAM Status: SCSI Status Error
(da3:mpt0:0:6:1): SCSI Status: Check Condition
(da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b
(da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
(da3:mpt0:0:6:1): Retries Exhausted

As mentioned before, our two volumes are da0 and da1. /dev lists da2 and da3 as 
well, but I have no idea what they are.  How do I figure out what da3 is and 
what do the above error messages say about it? Someone on the forum asked me if 
the two volumes are on the same controller and the answer is yes, they are.
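
As a rough aid to reading the da3 lines above, here is a minimal Python sketch 
that splits out the device path, the READ(10) CDB and the asc pair. It assumes 
the prefix is periph:controller:bus:target:lun and that the CDB and asc values 
are printed in hex. The function names are made up for illustration, and the 
sense table holds only the one code whose text appears in this log.

```python
import re

# Only the pair seen in this log; the text is copied from the log itself.
SENSE_TEXT = {
    (0x04, 0x0B): "Logical unit not accessible, target port in standby state",
}

# Assumed layout of the prefix: (periph:controller:bus:target:lun)
PATH_RE = re.compile(r"\((\w+):(\w+):(\d+):(\d+):(\d+)\)")
ASC_RE = re.compile(r"asc:([0-9a-fA-F]+),([0-9a-fA-F]+)")


def parse_path(line):
    """Return the device path fields from a CAM log line, or None."""
    m = PATH_RE.search(line)
    if not m:
        return None
    periph, controller, bus, target, lun = m.groups()
    return {"periph": periph, "controller": controller,
            "bus": int(bus), "target": int(target), "lun": int(lun)}


def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as ten space-separated hex bytes."""
    cdb = [int(b, 16) for b in cdb_hex.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    # Bytes 2-5 are the logical block address, bytes 7-8 the block count.
    lba = int.from_bytes(bytes(cdb[2:6]), "big")
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")
    return {"lba": lba, "blocks": blocks}


def decode_asc(line):
    """Return (asc, ascq) as integers from an 'asc:X,Y' fragment, or None."""
    m = ASC_RE.search(line)
    if not m:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


if __name__ == "__main__":
    print(parse_path("(da3:mpt0:0:6:1): Retries Exhausted"))
    print(decode_read10("28 0 0 0 0 0 0 0 1 0"))
    pair = decode_asc("(da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b")
    print(pair, SENSE_TEXT.get(pair))
```

Read this way, the failing command is a one-block read at LBA 0 on target 6, 
LUN 1 behind mpt0. On the running system, camcontrol devlist lists the devices 
CAM sees along with their bus, target and LUN.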


GEOM_LABEL: Label for provider da0s1a is ufsid/4aeb03874c64d9f1.
GEOM_LABEL: Label for provider da0s1d is ufsid/4aeb038ae8ae24cf.
GEOM_LABEL: Label for provider da0s1e is ufsid/4aeb0387d999941a.
GEOM_LABEL: Label for provider da0s1f is ufsid/4aeb038766c4c807.
Trying to mount root from ufs:/dev/da0s1a
GEOM_LABEL: Label ufsid/4aeb03874c64d9f1 removed.
GEOM_LABEL: Label for provider da0s1a is ufsid/4aeb03874c64d9f1.
GEOM_LABEL: Label ufsid/4aeb0387d999941a removed.
GEOM_LABEL: Label ufsid/4bd2077f23a6cc93 removed.
GEOM_LABEL: Label for provider da0s1e is ufsid/4aeb0387d999941a.
GEOM_LABEL: Label for provider da1s1 is ufsid/4bd2077f23a6cc93.
GEOM_LABEL: Label ufsid/4aeb038766c4c807 removed.
GEOM_LABEL: Label for provider da0s1f is ufsid/4aeb038766c4c807.
GEOM_LABEL: Label ufsid/4aeb038ae8ae24cf removed.
GEOM_LABEL: Label for provider da0s1d is ufsid/4aeb038ae8ae24cf.
GEOM_LABEL: Label ufsid/4aeb03874c64d9f1 removed.
GEOM_LABEL: Label ufsid/4aeb0387d999941a removed.
GEOM_LABEL: Label ufsid/4aeb038766c4c807 removed.
GEOM_LABEL: Label ufsid/4aeb038ae8ae24cf removed.
GEOM_LABEL: Label ufsid/4bd2077f23a6cc93 removed.

Was root unmounted? What's going on here? Obviously there's some issue with da0, 
which is mounted at /. The server has been up and running fine, so why am I 
seeing "Trying to mount root from ufs:/dev/da0s1a"?
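
To make the back-and-forth in the GEOM_LABEL lines easier to follow, a small 
Python sketch like the one below can replay them and report which labels are 
still present at the end. The function name is hypothetical and the regular 
expression is written only against the two message shapes seen above.

```python
import re

# Matches either "Label for provider <dev> is <label>." or "Label <label> removed."
EVENT_RE = re.compile(
    r"GEOM_LABEL: Label (?:for provider (\S+) is (\S+)\.|(\S+) removed\.)"
)


def replay_labels(log_text):
    """Replay label events in order; return {label: provider} still present."""
    active = {}
    for m in EVENT_RE.finditer(log_text):
        provider, created, removed = m.groups()
        if created:
            active[created] = provider
        else:
            # A removal for a label never seen being created is ignored.
            active.pop(removed, None)
    return active


if __name__ == "__main__":
    sample = (
        "GEOM_LABEL: Label for provider da0s1a is ufsid/4aeb03874c64d9f1.\n"
        "GEOM_LABEL: Label ufsid/4aeb03874c64d9f1 removed.\n"
        "GEOM_LABEL: Label for provider da1s1 is ufsid/4bd2077f23a6cc93.\n"
    )
    print(replay_labels(sample))
```

Fed the full block above, the replay ends with no labels present: each of the 
five ufsid labels has a final "removed" line after its last "Label for 
provider" line.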

pid 93248 (httpd), uid 80: exited on signal 10
pid 95624 (httpd), uid 80: exited on signal 10
pid 97956 (httpd), uid 80: exited on signal 10
pid 97935 (httpd), uid 80: exited on signal 10
pid 96603 (httpd), uid 80: exited on signal 10
pid 93210 (httpd), uid 80: exited on signal 10
pid 98246 (httpd), uid 80: exited on signal 10

This is apparently what's killing our web server. Apache receives a signal 10 
(SIGBUS) and quits. Everything I've read says it's an issue with Apache trying 
to access RAM that it shouldn't or that doesn't exist. Is there something in the 
da0 or da3 errors above that would cause a SIGBUS on httpd?
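
For tallying these lines across a longer log, a minimal Python sketch follows. 
The signal names are hard-coded for FreeBSD, where signal 10 is SIGBUS and 11 
is SIGSEGV; the table is deliberately tiny and the function name is made up 
for illustration.

```python
import re
from collections import Counter

# FreeBSD numbering; on Linux, signal 10 is SIGUSR1, so do not rely on the
# signal module of whatever machine the log is being read on.
FREEBSD_SIGNALS = {10: "SIGBUS", 11: "SIGSEGV"}

LINE_RE = re.compile(
    r"pid (\d+) \(([^)]+)\), uid (\d+): exited on signal (\d+)"
)


def tally_signals(log_text):
    """Count (process name, signal name) pairs in kernel exit messages."""
    counts = Counter()
    for m in LINE_RE.finditer(log_text):
        name = m.group(2)
        sig = int(m.group(4))
        counts[(name, FREEBSD_SIGNALS.get(sig, "signal %d" % sig))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "pid 93248 (httpd), uid 80: exited on signal 10\n"
        "pid 95624 (httpd), uid 80: exited on signal 10\n"
    )
    for (name, sig), n in tally_signals(sample).items():
        print(name, sig, n)
```

The table is kept explicit because signal numbers differ between operating 
systems, so the mapping has to match the host that wrote the log.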

After that, the log repeats the first block of da3 errors many more times. The 
server was down for about 10 minutes and then recovered on its own. It's odd: 
the Apache child processes all seem to get killed by the SIGBUS but the parent 
process doesn't, so once the problem clears, Apache continues operating as 
normal without me having to restart the daemon or anything.

The management blade in the server chassis is reporting that all the hardware 
is fine. We have a second blade that boots off of a second partition in Volume 
1 and it doesn't have any problems at all.

I'm at a loss here!


Ryan Merrell


_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"